Search (206 results, page 1 of 11)

Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.03

0.029035995 = product of:
  0.10162598 = sum of:
    0.057740733 = weight(_text_:g in 3065) [ClassicSimilarity], result of:
      0.057740733 = score(doc=3065,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.4149775 = fieldWeight in 3065, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.078125 = fieldNorm(doc=3065)
    0.043885246 = weight(_text_:u in 3065) [ClassicSimilarity], result of:
      0.043885246 = score(doc=3065,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.3617784 = fieldWeight in 3065, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.078125 = fieldNorm(doc=3065)
  0.2857143 = coord(2/7)
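
The explain tree above can be reproduced by hand. The following is a minimal Python sketch, assuming the classic Lucene TF-IDF formulas that the printed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are illustrative and not part of any library; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed classic formula; it reproduces idf(docFreq=2809, maxDocs=44218) = 3.7559474.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the "product of" lines of the listing.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.03704574  # copied from the listing
FIELD_NORM = 0.078125    # fieldNorm(doc=3065), copied from the listing

score_g = term_score(2.0, 2809, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_u = term_score(2.0, 4547, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/7): two of the seven query clauses matched this record.
total = (score_g + score_u) * (2 / 7)

print(f"g: {score_g:.9f}  u: {score_u:.9f}  total: {total:.9f}")
```

The three values agree with the listed 0.057740733, 0.043885246 and 0.029035995 up to single-precision rounding.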

Greiner, G.: Intellektuelles und automatisches Klassifizieren (1981) 0.03

0.028883414 = product of:
  0.10109194 = sum of:
    0.09238517 = weight(_text_:g in 1103) [ClassicSimilarity], result of:
      0.09238517 = score(doc=1103,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.663964 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.125 = fieldNorm(doc=1103)
    0.008706774 = weight(_text_:a in 1103) [ClassicSimilarity], result of:
      0.008706774 = score(doc=1103,freq=2.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.20383182 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.125 = fieldNorm(doc=1103)
  0.2857143 = coord(2/7)

Type: a

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.03

0.028164204 = product of:
  0.065716475 = sum of:
    0.04412884 = product of:
      0.17651536 = sum of:
        0.17651536 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.17651536 = score(doc=562,freq=2.0), product of:
            0.3140742 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03704574 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.25 = coord(1/4)
    0.006530081 = weight(_text_:a in 562) [ClassicSimilarity], result of:
      0.006530081 = score(doc=562,freq=8.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.15287387 = fieldWeight in 562, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.015057558 = product of:
      0.030115116 = sum of:
        0.030115116 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.030115116 = score(doc=562,freq=2.0), product of:
            0.12972787 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03704574 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.42857143 = coord(3/7)
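
Nested trees such as the one above, where a sub-clause carries its own coord factor, are easier to check mechanically. The sketch below is one hypothetical way to do that in Python: it parses the indented explain text for the `_text_:22` clause (dedented, otherwise copied from the listing) and checks that every "product of", "sum of", "result of" and tf node matches its children. The helper names `parse` and `check` are made up for this illustration, and the tolerance is an assumption chosen to absorb single-precision rounding.

```python
import math

EXPLAIN = """\
0.015057558 = product of:
  0.030115116 = sum of:
    0.030115116 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
      0.030115116 = score(doc=562,freq=2.0), product of:
        0.12972787 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.03704574 = queryNorm
        0.23214069 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
  0.5 = coord(1/2)
"""


def parse(text):
    """Turn indented explain lines into nested (value, label, children) nodes."""
    roots = []
    stack = [(-1, roots)]  # (indent of owner, list that receives its children)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        value, label = line.strip().split(" = ", 1)
        node = (float(value), label, [])
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node[2]))
    return roots


def check(node, rel_tol=1e-5):
    """Return (label, listed, recomputed) for every node that disagrees with its children."""
    value, label, children = node
    problems = []
    if children:
        nums = [child[0] for child in children]
        expected = None
        if label.startswith("tf("):
            expected = math.sqrt(nums[0])  # tf is the square root of the term frequency
        elif label.endswith("product of:"):
            expected = math.prod(nums)
        elif label.endswith("sum of:"):
            expected = sum(nums)
        elif label.endswith("result of:"):
            expected = nums[0]
        if expected is not None and not math.isclose(value, expected, rel_tol=rel_tol):
            problems.append((label, value, expected))
        for child in children:
            problems.extend(check(child, rel_tol))
    return problems


problems = [p for root in parse(EXPLAIN) for p in check(root)]
print(problems)
```

An empty list means every intermediate value in the clause is consistent with the lines beneath it; here the inner coord(1/2) halves the term score before the clause is added to the outer sum.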

Abstract: Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Type: a

Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.02

0.02436761 = product of:
  0.056857757 = sum of:
    0.018519659 = product of:
      0.037039317 = sum of:
        0.037039317 = weight(_text_:p in 1595) [ClassicSimilarity], result of:
          0.037039317 = score(doc=1595,freq=2.0), product of:
            0.13319843 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.03704574 = queryNorm
            0.27807623 = fieldWeight in 1595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
      0.5 = coord(1/2)
    0.030719671 = weight(_text_:u in 1595) [ClassicSimilarity], result of:
      0.030719671 = score(doc=1595,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.25324488 = fieldWeight in 1595, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1595)
    0.0076184273 = weight(_text_:a in 1595) [ClassicSimilarity], result of:
      0.0076184273 = score(doc=1595,freq=8.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.17835285 = fieldWeight in 1595, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1595)
  0.42857143 = coord(3/7)

Abstract: This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks, as the machine learning algorithm, that learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with traditional Rocchio's algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
Source: Advances in classification research, vol.10: proceedings of the 10th ASIS SIG/CR Classification Research Workshop. Ed.: Albrechtsen, H. u. J.E. Mai
Type: a

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02

0.01970891 = product of:
  0.06898118 = sum of:
    0.043885246 = weight(_text_:u in 611) [ClassicSimilarity], result of:
      0.043885246 = score(doc=611,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.3617784 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.025095932 = product of:
      0.050191864 = sum of:
        0.050191864 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.050191864 = score(doc=611,freq=2.0), product of:
            0.12972787 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03704574 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)

Date: 22. 8.2009 12:54:24

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.02
```
0.018948475 = product of:
  0.04421311 = sum of:
    0.030404592 = weight(_text_:u in 3284) [ClassicSimilarity], result of:
      0.030404592 = score(doc=3284,freq=6.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.25064746 = fieldWeight in 3284, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.003770144 = weight(_text_:a in 3284) [ClassicSimilarity], result of:
      0.003770144 = score(doc=3284,freq=6.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.088261776 = fieldWeight in 3284, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.010038373 = product of:
      0.020076746 = sum of:
        0.020076746 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.020076746 = score(doc=3284,freq=2.0), product of:
            0.12972787 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03704574 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.42857143 = coord(3/7)
```
Abstract: Classifying objects (e.g., fauna, flora, texts) is a process that relies on human intelligence. Computer science, in particular the field of artificial intelligence (AI), investigates among other things to what extent processes that require human intelligence can be automated. It has turned out that solving everyday problems is a greater challenge than solving specialized problems such as building a chess computer. For example, "Rybka" has been the reigning computer chess world champion since June 2007. To what extent everyday problems can be solved with AI methods is, in the general case, still an open question. Natural language processing, e.g., language understanding, plays an essential role in solving everyday problems. Realizing "common sense" as a machine (in the Cyc knowledge base, in the form of facts and rules) has been Lenat's goal since 1984. Regarding the AI showcase project "Cyc", there are Cyc optimists and Cyc pessimists. Understanding natural language (e.g., work title, abstract, preface, contents) is also necessary in the manual (intellectual) classification of bibliographic title records or online publications in order to classify these text objects correctly. Since 2007, the Deutsche Nationalbibliothek has manually classified nearly all publications using the Dewey Decimal Classification (DDC).
The volume of publications to be classified has been growing, at the latest since the advent of the World Wide Web, faster than it can be subject-indexed manually. Methods are therefore sought to automate the classification of text objects, or at least to support manual classification. Methods for automatic document classification (information retrieval, IR) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As more and more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has included work on automatic DDC and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous operation. The VZG project Colibri/DDC has also been working on automatic DDC classification, among other things, since 2006. The related investigations and developments serve to answer the research question: "Is it possible to automatically achieve a DDC title classification of all GVK-PLUS title records that is coherent in content?"
Date: 22. 1.2010 14:41:24
Type: a

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.02

0.018696144 = product of:
  0.065436505 = sum of:
    0.057740733 = weight(_text_:g in 2533) [ClassicSimilarity], result of:
      0.057740733 = score(doc=2533,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.4149775 = fieldWeight in 2533, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
    0.007695774 = weight(_text_:a in 2533) [ClassicSimilarity], result of:
      0.007695774 = score(doc=2533,freq=4.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.18016359 = fieldWeight in 2533, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
  0.2857143 = coord(2/7)

Type: a

Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.02

0.018052135 = product of:
  0.063182466 = sum of:
    0.057740733 = weight(_text_:g in 494) [ClassicSimilarity], result of:
      0.057740733 = score(doc=494,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.4149775 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
    0.0054417336 = weight(_text_:a in 494) [ClassicSimilarity], result of:
      0.0054417336 = score(doc=494,freq=2.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.12739488 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
  0.2857143 = coord(2/7)

Type: a

Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.02
```
0.016637018 = product of:
  0.058229562 = sum of:
    0.048994634 = weight(_text_:g in 1998) [ClassicSimilarity], result of:
      0.048994634 = score(doc=1998,freq=4.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.35212007 = fieldWeight in 1998, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.046875 = fieldNorm(doc=1998)
    0.0092349285 = weight(_text_:a in 1998) [ClassicSimilarity], result of:
      0.0092349285 = score(doc=1998,freq=16.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.2161963 = fieldWeight in 1998, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1998)
  0.2857143 = coord(2/7)
```
Abstract: Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
Type: a

Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.01

0.014517997 = product of:
  0.05081299 = sum of:
    0.028870367 = weight(_text_:g in 5997) [ClassicSimilarity], result of:
      0.028870367 = score(doc=5997,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 5997, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
    0.021942623 = weight(_text_:u in 5997) [ClassicSimilarity], result of:
      0.021942623 = score(doc=5997,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.1808892 = fieldWeight in 5997, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
  0.2857143 = coord(2/7)

Editor: Gaul, W. u. G. Ritter

Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.01

0.013433219 = product of:
  0.047016263 = sum of:
    0.040418513 = weight(_text_:g in 2113) [ClassicSimilarity], result of:
      0.040418513 = score(doc=2113,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.29048425 = fieldWeight in 2113, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2113)
    0.0065977518 = weight(_text_:a in 2113) [ClassicSimilarity], result of:
      0.0065977518 = score(doc=2113,freq=6.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.1544581 = fieldWeight in 2113, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2113)
  0.2857143 = coord(2/7)

Abstract: Document clustering is an important tool for document collection organization and browsing. In real applications, some limited knowledge about cluster membership of a small number of documents is often available, such as some pairs of documents belonging to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum of squared Euclidean distance function of K-means. Then, the combined criterion function is transformed into trace maximization, which is further optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method can achieve better performance, compared to three existing methods.
Type: a

Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: ¬A comparison of text-classification techniques applied to Arabic text (2009) 0.01

0.011514188 = product of:
  0.040299654 = sum of:
    0.03464444 = weight(_text_:g in 3096) [ClassicSimilarity], result of:
      0.03464444 = score(doc=3096,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.24898648 = fieldWeight in 3096, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.046875 = fieldNorm(doc=3096)
    0.005655216 = weight(_text_:a in 3096) [ClassicSimilarity], result of:
      0.005655216 = score(doc=3096,freq=6.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.13239266 = fieldWeight in 3096, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=3096)
  0.2857143 = coord(2/7)

Abstract: Many algorithms have been implemented for the problem of text classification. Most of the work in this area was carried out for English text. Very little research has been carried out on Arabic text. The nature of Arabic text is different than that of English text, and preprocessing of Arabic text is more challenging. This paper presents an implementation of three automatic text-classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories has been automatically classified using the kNN, Rocchio, and naïve Bayes algorithms. The research results reveal that Naïve Bayes was the best performer, followed by kNN and Rocchio.
Type: a

Schulze, U.: Erfahrungen bei der Anwendung automatischer Klassifizierungsverfahren zur Inhaltsanalyse einer Dokumentenmenge (1978) 0.01

0.011274738 = product of:
  0.039461583 = sum of:
    0.035108197 = weight(_text_:u in 83) [ClassicSimilarity], result of:
      0.035108197 = score(doc=83,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.28942272 = fieldWeight in 83, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.0625 = fieldNorm(doc=83)
    0.004353387 = weight(_text_:a in 83) [ClassicSimilarity], result of:
      0.004353387 = score(doc=83,freq=2.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.10191591 = fieldWeight in 83, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=83)
  0.2857143 = coord(2/7)

Type: a

Pfister, J.: Clustering von Patent-Dokumenten am Beispiel der Datenbanken des Fachinformationszentrums Karlsruhe (2006) 0.01

0.011274738 = product of:
  0.039461583 = sum of:
    0.035108197 = weight(_text_:u in 5976) [ClassicSimilarity], result of:
      0.035108197 = score(doc=5976,freq=2.0), product of:
        0.121304214 = queryWeight, product of:
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.03704574 = queryNorm
        0.28942272 = fieldWeight in 5976, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2744443 = idf(docFreq=4547, maxDocs=44218)
          0.0625 = fieldNorm(doc=5976)
    0.004353387 = weight(_text_:a in 5976) [ClassicSimilarity], result of:
      0.004353387 = score(doc=5976,freq=2.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.10191591 = fieldWeight in 5976, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=5976)
  0.2857143 = coord(2/7)

Source: Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Hrsg.: T. Mandl u. C. Womser-Hacker
Type: a

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.010470057 = product of:
  0.036645196 = sum of:
    0.006530081 = weight(_text_:a in 1046) [ClassicSimilarity], result of:
      0.006530081 = score(doc=1046,freq=2.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.15287387 = fieldWeight in 1046, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.09375 = fieldNorm(doc=1046)
    0.030115116 = product of:
      0.060230233 = sum of:
        0.060230233 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.060230233 = score(doc=1046,freq=2.0), product of:
            0.12972787 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03704574 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)

Date: 5. 5.2003 14:17:22
Type: a

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
```
0.010305459 = product of:
  0.036069103 = sum of:
    0.028870367 = weight(_text_:g in 3311) [ClassicSimilarity], result of:
      0.028870367 = score(doc=3311,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.0071987375 = weight(_text_:a in 3311) [ClassicSimilarity], result of:
      0.0071987375 = score(doc=3311,freq=14.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.1685276 = fieldWeight in 3311, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
  0.2857143 = coord(2/7)
```
Abstract: Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
Type: a

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.010152887 = product of:
  0.0355351 = sum of:
    0.028870367 = weight(_text_:g in 967) [ClassicSimilarity], result of:
      0.028870367 = score(doc=967,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 967, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
    0.0066647357 = weight(_text_:a in 967) [ClassicSimilarity], result of:
      0.0066647357 = score(doc=967,freq=12.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.15602624 = fieldWeight in 967, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
  0.2857143 = coord(2/7)
```
Abstract: Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Type: a

Chae, G.; Park, J.; Park, J.; Yeo, W.S.; Shi, C.: Linking and clustering artworks using social tags : revitalizing crowd-sourced information on cultural collections (2016) 0.01
```
0.010152887 = product of:
  0.0355351 = sum of:
    0.028870367 = weight(_text_:g in 2852) [ClassicSimilarity], result of:
      0.028870367 = score(doc=2852,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 2852, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2852)
    0.0066647357 = weight(_text_:a in 2852) [ClassicSimilarity], result of:
      0.0066647357 = score(doc=2852,freq=12.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.15602624 = fieldWeight in 2852, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2852)
  0.2857143 = coord(2/7)
```
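The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`) and taking `queryNorm` and `fieldNorm` directly from the listing rather than deriving them. The function and constant names are illustrative, not part of Lucene's API.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one query term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explain output for doc 2852 (assumed as given).
MAX_DOCS = 44218
QUERY_NORM = 0.03704574
FIELD_NORM = 0.0390625

score_g = term_score(2.0, 2809, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_a = term_score(12.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/7): two of the seven query terms matched this document.
total = (score_g + score_a) * (2 / 7)

print(f"g: {score_g:.9f}  a: {score_a:.9f}  total: {total:.9f}")
```

Lucene computes these values in single precision, so the last digits may differ slightly from the listing.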
Abstract

Social tagging is one of the most popular methods for collecting crowd-sourced information in galleries, libraries, archives, and museums (GLAMs). However, when the number of social tags grows rapidly, using them becomes problematic and, as a result, they are often left as simply big data that cannot be used for practical purposes. To revitalize the use of this crowd-sourced information, we propose using social tags to link and cluster artworks based on an experimental study using an online collection at the Gyeonggi Museum of Modern Art (GMoMA). We view social tagging as a folksonomy, where artworks are classified by keywords of the crowd's various interpretations and one artwork can belong to several different categories simultaneously. To leverage this strength of social tags, we used a clustering method called "link communities" to detect overlapping communities in a network of artworks constructed by computing similarities between all artwork pairs. We used this framework to identify semantic relationships and clusters of similar artworks. By comparing the clustering results with curators' manual classification results, we demonstrated the potential of social tagging data for automatically clustering artworks in a way that reflects the dynamic perspectives of crowds.

Type

a
Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
```
0.009986974 = product of:
  0.03495441 = sum of:
    0.028870367 = weight(_text_:g in 237) [ClassicSimilarity], result of:
      0.028870367 = score(doc=237,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 237, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=237)
    0.0060840435 = weight(_text_:a in 237) [ClassicSimilarity], result of:
      0.0060840435 = score(doc=237,freq=10.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.14243183 = fieldWeight in 237, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=237)
  0.2857143 = coord(2/7)
```
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.

Type

a
Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.01
```
0.009803457 = product of:
  0.0343121 = sum of:
    0.028870367 = weight(_text_:g in 744) [ClassicSimilarity], result of:
      0.028870367 = score(doc=744,freq=2.0), product of:
        0.13914184 = queryWeight, product of:
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.03704574 = queryNorm
        0.20748875 = fieldWeight in 744, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.7559474 = idf(docFreq=2809, maxDocs=44218)
          0.0390625 = fieldNorm(doc=744)
    0.0054417336 = weight(_text_:a in 744) [ClassicSimilarity], result of:
      0.0054417336 = score(doc=744,freq=8.0), product of:
        0.04271548 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03704574 = queryNorm
        0.12739488 = fieldWeight in 744, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=744)
  0.2857143 = coord(2/7)
```
Abstract

Classifying papers according to the fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may neither correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and the KeyWords Plus) and the help of experts to identify the existing subfields and the journals exclusive to each subfield. These "exclusive journals" are critical to obtain, through a pattern detection procedure that uses machine learning techniques (from the NVivo software), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can help to support scientific policy-makers, funding, and research institutions (via more accurate academic performance evaluations), to support editors in their tasks to redefine the scopes of journals, and to support popular databases in their processes of refining categories.

Type

a
