Search (54 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.21
    0.20824818 = product of:
      0.3644343 = sum of:
        0.051311683 = product of:
          0.15393505 = sum of:
            0.15393505 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.15393505 = score(doc=562,freq=2.0), product of:
                0.273897 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03230675 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.15393505 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.15393505 = score(doc=562,freq=2.0), product of:
            0.273897 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03230675 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.15393505 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.15393505 = score(doc=562,freq=2.0), product of:
            0.273897 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03230675 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.005252542 = product of:
          0.02626271 = sum of:
            0.02626271 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.02626271 = score(doc=562,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.2 = coord(1/5)
      0.5714286 = coord(4/7)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
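     The score breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output: each leaf weight is the product of queryWeight (idf * queryNorm) and fieldWeight (sqrt(termFreq) * idf * fieldNorm), and the record score is the sum of those weights multiplied by the coordination factor, here coord(4/7) = 0.5714286. A minimal Python sketch that recomputes one leaf from the figures shown above; the helper name is ours, not part of Lucene:
       import math

       def classic_similarity_weight(raw_tf, idf, query_norm, field_norm):
           # Recomputes the leaf for term "3a" in doc 562 from the explain tree above.
           query_weight = idf * query_norm                       # 8.478011 * 0.03230675 = 0.273897
           field_weight = math.sqrt(raw_tf) * idf * field_norm   # 1.4142135 * 8.478011 * 0.046875 = 0.56201804
           return query_weight * field_weight                    # 0.273897 * 0.56201804 = 0.15393505

       print(classic_similarity_weight(2.0, 8.478011, 0.03230675, 0.046875))  # ~0.15393505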
  2. Wätjen, H.-J.: Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web : das DFG-Projekt GERHARD (1998) 0.01
    0.012230963 = product of:
      0.08561674 = sum of:
        0.08561674 = product of:
          0.17123348 = sum of:
            0.17123348 = weight(_text_:wissenschaftlich in 3066) [ClassicSimilarity], result of:
              0.17123348 = score(doc=3066,freq=2.0), product of:
                0.2237631 = queryWeight, product of:
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.03230675 = queryNorm
                0.7652445 = fieldWeight in 3066, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3066)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
  3. Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.01
    0.012108037 = product of:
      0.084756255 = sum of:
        0.084756255 = product of:
          0.16951251 = sum of:
            0.16951251 = weight(_text_:wissenschaftlich in 3064) [ClassicSimilarity], result of:
              0.16951251 = score(doc=3064,freq=4.0), product of:
                0.2237631 = queryWeight, product of:
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.03230675 = queryNorm
                0.75755346 = fieldWeight in 3064, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3064)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Abstract
     The intellectual indexing of the Internet is in a crisis. Yahoo and other services cannot keep pace with the growth of the Web. GERHARD is currently the only search and navigation service worldwide that also fully and automatically classifies the Internet resources gathered by a robot, using computational-linguistic and statistical methods. Well over one million HTML documents from academically relevant servers in Germany can be searched in the database as with other search engines, but they can also be explored by navigating the trilingual Universal Decimal Classification (ETH Library Zurich).
  4. Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.01
    0.0061154817 = product of:
      0.04280837 = sum of:
        0.04280837 = product of:
          0.08561674 = sum of:
            0.08561674 = weight(_text_:wissenschaftlich in 1777) [ClassicSimilarity], result of:
              0.08561674 = score(doc=1777,freq=2.0), product of:
                0.2237631 = queryWeight, product of:
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.03230675 = queryNorm
                0.38262224 = fieldWeight in 1777, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.926203 = idf(docFreq=117, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1777)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Abstract
     This thesis contains a description and evaluation of the WWW search service GERHARD (German Harvest Automated Retrieval and Directory). GERHARD is a search and navigation system for the German World Wide Web that collects exclusively academically relevant documents and classifies them automatically, on the basis of computational-linguistic and statistical methods, with the help of a library classification system. The DFG project GERHARD was an attempt to develop, with a World Wide Web service based on an automatic classification procedure, an alternative to conventional methods of Internet indexing. GERHARD is the only directory of Internet resources in the German-speaking area that is created and updated fully automatically (i.e. by machine), and it restricts itself to covering documents on academic WWW servers. The basic idea was to replace cost-intensive intellectual indexing and classification of Internet pages with computational-linguistic and statistical methods, and in this way to map the covered Internet resources automatically onto the vocabulary of a library classification system. GERHARD stands for German Harvest Automated Retrieval and Directory; its WWW address (URL) is http://www.gerhard.de. This diploma thesis describes the service with a particular focus on the underlying indexing and classification system, and then uses a small retrieval test to check GERHARD's effectiveness.
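     The basic idea described above - mapping resources automatically onto the vocabulary of a library classification system - can be illustrated with a small sketch: match a document's terms against the caption vocabulary of candidate classes and rank the classes by weighted term overlap. The notations, captions and weighting below are hypothetical illustrations, not GERHARD's actual algorithm:
       from collections import Counter

       # Tiny stand-in for a classification scheme's caption vocabulary (made up).
       udc_captions = {
           "004": "informatik computer datenverarbeitung software",
           "025.4": "klassifikation klassifizieren erschliessung indexierung",
           "81": "sprache linguistik grammatik",
       }

       def classify(text, captions, top_n=2):
           terms = Counter(text.lower().split())
           scores = {}
           for notation, caption in captions.items():
               caption_terms = set(caption.split())
               # score a class by how often its caption terms occur in the document
               scores[notation] = sum(freq for term, freq in terms.items() if term in caption_terms)
           ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
           return [(notation, score) for notation, score in ranked[:top_n] if score > 0]

       print(classify("automatische klassifikation und indexierung von software", udc_captions))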
  5. Oberhauser, O.: Automatisches Klassifizieren : Entwicklungsstand - Methodik - Anwendungsbereiche (2005) 0.01
    0.0060712276 = product of:
      0.042498592 = sum of:
        0.042498592 = weight(_text_:fortschritt in 38) [ClassicSimilarity], result of:
          0.042498592 = score(doc=38,freq=2.0), product of:
            0.22295201 = queryWeight, product of:
              6.901097 = idf(docFreq=120, maxDocs=44218)
              0.03230675 = queryNorm
            0.19061767 = fieldWeight in 38, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.901097 = idf(docFreq=120, maxDocs=44218)
              0.01953125 = fieldNorm(doc=38)
      0.14285715 = coord(1/7)
    
    Footnote
     On the content: a short introductory section is followed by an introduction to the basic methodology of automatic classification. Oberhauser explains terms such as single versus multiple classification and class- versus document-centred approaches, and then covers the main applications of automatic classification of text documents, machine learning methods, and techniques of dimensionality reduction in indexing. Two further subchapters are devoted to the construction of classifiers and to methods for their evaluation. The chapter is rounded off by a short list of software products for automatic classification, covering both commercial software and open-source projects. The main part of the book is devoted to the large projects on automatic indexing of web documents carried out by OCLC (Scorpion) and at the universities of Lund (Nordic WAIS/WWW, DESIRE II), Wolverhampton (WWLib-TOS, WWLib-TNG, Old ACE, ACE) and Oldenburg (GERHARD, GERHARD II). The author describes in great detail - with the level of detail varying according to what can be gleaned from the project documentation - each project's objectives, the classification used, the methodological approach, and the evaluation methods and results. Where cross-references to other projects exist, these are discussed as well. The author examines important aspects such as vocabulary construction, text preparation and weighting very closely, so that the reader gains a good idea of the approaches and of how each project might be developed further. A further chapter deals with some smaller projects devoted to the topic, of particular interest to libraries, of automatically classifying books, as well as to patent literature, media documentation and use in information services. The presentation is complemented by a bibliography of more than 250 titles on the individual projects and by lists of abbreviations and figures. The concluding discussion of the projects described addresses, on the one hand, the significance of the individual projects for methodological progress, but on the other hand also voices some criticism, above all regarding the poor evaluation of project results and the lack of usable documentation. The project pages of GERHARD (www.gerhard.de/), for example, had been frozen at their 1998 state and are currently [11.07.06] no longer reachable at all. With some astonishment Oberhauser also notes that, apart from Larsen's almost 15-year-old study, "no significant studies or applications from the library field exist" (p. 139). As the author himself adds, however, this is probably because bibliographic metadata, owing to their small amount of text, are poorly suited to automatic classification, and because, as earlier results have shown, the usual TF/IDF approach is not suitable for catalogue records (ibid.).
  6. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    0.0041130767 = product of:
      0.028791536 = sum of:
        0.028791536 = product of:
          0.07197884 = sum of:
            0.036644034 = weight(_text_:28 in 2655) [ClassicSimilarity], result of:
              0.036644034 = score(doc=2655,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.31663033 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2655)
            0.035334807 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
              0.035334807 = score(doc=2655,freq=2.0), product of:
                0.11364504 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03230675 = queryNorm
                0.31092256 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2655)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
  7. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00
    0.0035989422 = product of:
      0.025192594 = sum of:
        0.025192594 = product of:
          0.06298149 = sum of:
            0.03206353 = weight(_text_:28 in 2219) [ClassicSimilarity], result of:
              0.03206353 = score(doc=2219,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.27705154 = fieldWeight in 2219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2219)
            0.030917956 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
              0.030917956 = score(doc=2219,freq=2.0), product of:
                0.11364504 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03230675 = queryNorm
                0.27205724 = fieldWeight in 2219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2219)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Date
    23. 7.1996 10:28:09
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  8. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.00
    0.003583049 = product of:
      0.025081342 = sum of:
        0.025081342 = product of:
          0.062703356 = sum of:
            0.03206353 = weight(_text_:28 in 2560) [ClassicSimilarity], result of:
              0.03206353 = score(doc=2560,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.27705154 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
            0.03063983 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.03063983 = score(doc=2560,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Date
    28. 9.2003 11:42:17
    22. 9.2008 18:31:54
  9. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.00
    0.003071185 = product of:
      0.021498295 = sum of:
        0.021498295 = product of:
          0.053745735 = sum of:
            0.027483026 = weight(_text_:28 in 3051) [ClassicSimilarity], result of:
              0.027483026 = score(doc=3051,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.23747274 = fieldWeight in 3051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3051)
            0.02626271 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
              0.02626271 = score(doc=3051,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.23214069 = fieldWeight in 3051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3051)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Date
    22. 8.2009 19:51:28
  10. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.00
    0.0025706731 = product of:
      0.017994711 = sum of:
        0.017994711 = product of:
          0.044986777 = sum of:
            0.022902522 = weight(_text_:28 in 1853) [ClassicSimilarity], result of:
              0.022902522 = score(doc=1853,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.19789396 = fieldWeight in 1853, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
            0.022084255 = weight(_text_:29 in 1853) [ClassicSimilarity], result of:
              0.022084255 = score(doc=1853,freq=2.0), product of:
                0.11364504 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03230675 = queryNorm
                0.19432661 = fieldWeight in 1853, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Date
    6. 1.1997 18:30:28
    Source
    Knowledge organization. 29(2002) nos.3/4, S.181-197
  11. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.00
    0.0025706731 = product of:
      0.017994711 = sum of:
        0.017994711 = product of:
          0.044986777 = sum of:
            0.022902522 = weight(_text_:28 in 5172) [ClassicSimilarity], result of:
              0.022902522 = score(doc=5172,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.19789396 = fieldWeight in 5172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5172)
            0.022084255 = weight(_text_:29 in 5172) [ClassicSimilarity], result of:
              0.022084255 = score(doc=5172,freq=2.0), product of:
                0.11364504 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03230675 = queryNorm
                0.19432661 = fieldWeight in 5172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5172)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
     In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers that learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as a test set. The dictionary approach is outperformed by 18% for RAINBOW and by 17% for BSVM, while the standard deviation of the results is reduced by 28% and 34%, respectively, compared with the dictionary approach.
    Date
    9. 7.2006 10:29:12
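     A minimal sketch of the supervised survey-coding setup described in the abstract above, with scikit-learn's MultinomialNB and LinearSVC standing in for the RAINBOW naive Bayes and BSVM learners used in the paper (an assumption for illustration; the toy answers and category codes are invented):
       from sklearn.feature_extraction.text import CountVectorizer
       from sklearn.naive_bayes import MultinomialNB
       from sklearn.pipeline import make_pipeline
       from sklearn.svm import LinearSVC

       # Manually coded training answers take the place of a hand-built dictionary.
       train_answers = ["lack of money", "no time because of my job", "too expensive for me"]
       train_codes = ["FINANCIAL", "TIME", "FINANCIAL"]
       test_answers = ["my salary is too low", "work leaves no spare time"]

       for learner in (MultinomialNB(), LinearSVC()):
           model = make_pipeline(CountVectorizer(), learner)
           model.fit(train_answers, train_codes)          # learn category codes from coded answers
           print(type(learner).__name__, model.predict(test_answers))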
  12. Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.00
    0.0020474568 = product of:
      0.014332197 = sum of:
        0.014332197 = product of:
          0.03583049 = sum of:
            0.018322017 = weight(_text_:28 in 2741) [ClassicSimilarity], result of:
              0.018322017 = score(doc=2741,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.15831517 = fieldWeight in 2741, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2741)
            0.017508473 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
              0.017508473 = score(doc=2741,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.15476047 = fieldWeight in 2741, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2741)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Date
    6. 1.1997 18:30:28
    12. 9.2004 9:56:22
  13. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.00
    0.002019132 = product of:
      0.014133923 = sum of:
        0.014133923 = product of:
          0.070669614 = sum of:
            0.070669614 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
              0.070669614 = score(doc=5169,freq=2.0), product of:
                0.11364504 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03230675 = queryNorm
                0.6218451 = fieldWeight in 5169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.125 = fieldNorm(doc=5169)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Source
    Nachrichten für Dokumentation. 29(1978), S.92-96
  14. Kleinoeder, H.H.; Puzicha, J.: Automatische Katalogisierung am Beispiel einer Pilotanwendung (2002) 0.00
    0.0018322017 = product of:
      0.012825412 = sum of:
        0.012825412 = product of:
          0.06412706 = sum of:
            0.06412706 = weight(_text_:28 in 1154) [ClassicSimilarity], result of:
              0.06412706 = score(doc=1154,freq=2.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.5541031 = fieldWeight in 1154, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1154)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    11. 7.2003 13:27:28
  15. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00
    0.0015007263 = product of:
      0.010505084 = sum of:
        0.010505084 = product of:
          0.05252542 = sum of:
            0.05252542 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.05252542 = score(doc=1046,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    5. 5.2003 14:17:22
  16. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00
    0.0012506054 = product of:
      0.008754238 = sum of:
        0.008754238 = product of:
          0.043771185 = sum of:
            0.043771185 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.043771185 = score(doc=611,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    22. 8.2009 12:54:24
  17. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    0.0012506054 = product of:
      0.008754238 = sum of:
        0.008754238 = product of:
          0.043771185 = sum of:
            0.043771185 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.043771185 = score(doc=2748,freq=2.0), product of:
                0.11313273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03230675 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    1. 2.2016 18:25:22
  18. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.00
    0.001110482 = product of:
      0.007773374 = sum of:
        0.007773374 = product of:
          0.03886687 = sum of:
            0.03886687 = weight(_text_:28 in 1461) [ClassicSimilarity], result of:
              0.03886687 = score(doc=1461,freq=4.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.3358372 = fieldWeight in 1461, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1461)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    6. 1.1997 18:30:28
    28. 2.2008 14:21:51
  19. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.00
    0.001110482 = product of:
      0.007773374 = sum of:
        0.007773374 = product of:
          0.03886687 = sum of:
            0.03886687 = weight(_text_:28 in 1071) [ClassicSimilarity], result of:
              0.03886687 = score(doc=1071,freq=4.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.3358372 = fieldWeight in 1071, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1071)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Date
    6. 1.1997 18:30:28
    19. 9.2013 19:28:15
  20. Aphinyanaphongs, Y.; Fu, L.D.; Li, Z.; Peskin, E.R.; Efstathiadis, E.; Aliferis, C.F.; Statnikov, A.: ¬A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization (2014) 0.00
    0.001110482 = product of:
      0.007773374 = sum of:
        0.007773374 = product of:
          0.03886687 = sum of:
            0.03886687 = weight(_text_:28 in 1496) [ClassicSimilarity], result of:
              0.03886687 = score(doc=1496,freq=4.0), product of:
                0.115731284 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03230675 = queryNorm
                0.3358372 = fieldWeight in 1496, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1496)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
     An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for only a few supervised classification and feature selection methods, and with limited ways of optimizing them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization.
    Date
    26. 9.2014 18:28:57
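     A toy illustration of the kind of benchmark grid described above: cross a set of classifiers with a set of feature selection methods and report more than one performance metric per combination. The two classifiers, two selectors, toy corpus and metrics below are stand-ins, not the study's 28 classification methods, 19 feature selection methods and 4 metrics:
       from sklearn.feature_extraction.text import TfidfVectorizer
       from sklearn.feature_selection import SelectKBest, chi2, f_classif
       from sklearn.linear_model import LogisticRegression
       from sklearn.metrics import accuracy_score, f1_score
       from sklearn.pipeline import make_pipeline
       from sklearn.svm import LinearSVC

       # Invented two-class corpus, just to make the grid runnable.
       train_docs = ["planets orbit the sun", "telescopes observe distant galaxies",
                     "rockets launch satellites", "antibiotics treat bacterial infection",
                     "doctors diagnose patients", "vaccines prevent disease"]
       train_y = ["space", "space", "space", "med", "med", "med"]
       test_docs = ["galaxies and planets", "patients receive vaccines"]
       test_y = ["space", "med"]

       classifiers = {"logreg": LogisticRegression(max_iter=1000), "linear_svm": LinearSVC()}
       selectors = {"chi2": SelectKBest(chi2, k=5), "anova_f": SelectKBest(f_classif, k=5)}

       for clf_name, clf in classifiers.items():
           for sel_name, sel in selectors.items():
               pipe = make_pipeline(TfidfVectorizer(), sel, clf)
               pipe.fit(train_docs, train_y)
               pred = pipe.predict(test_docs)
               print(clf_name, sel_name,
                     accuracy_score(test_y, pred),
                     f1_score(test_y, pred, pos_label="space"))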

Languages

  • e 41
  • d 13

Types

  • a 46
  • el 4
  • x 3
  • m 2
  • s 1