Search (38 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.22

0.22118558 = product of:
  0.5308454 = sum of:
    0.051759936 = product of:
      0.1552798 = sum of:
        0.1552798 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.1552798 = score(doc=562,freq=2.0), product of:
            0.2762897 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.032588977 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.1552798 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1552798 = score(doc=562,freq=2.0), product of:
        0.2762897 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.032588977 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.1552798 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1552798 = score(doc=562,freq=2.0), product of:
        0.2762897 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.032588977 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.1552798 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1552798 = score(doc=562,freq=2.0), product of:
        0.2762897 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.032588977 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.01324607 = product of:
      0.02649214 = sum of:
        0.02649214 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.02649214 = score(doc=562,freq=2.0), product of:
            0.11412105 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032588977 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.41666666 = coord(5/12)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
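The explain tree for doc 562 follows Lucene's ClassicSimilarity (TF-IDF) scoring: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, and each clause sum is scaled by coord(matched/total). The Python sketch below recomputes the 0.22118558 score from the printed statistics. It assumes a query boost of 1 and takes queryNorm and fieldNorm as reported (fieldNorm is a lossy one-byte encoding of the length norm, which the sketch does not reproduce). Note that the `_text_:2f` clause occurs three times in the tree, and all three copies are needed to reach the sum 0.5308454.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    w = idf(doc_freq, max_docs)
    query_weight = w * query_norm               # assumes query boost = 1
    field_weight = tf(freq) * w * field_norm    # fieldNorm taken as reported
    return query_weight * field_weight

def coord(matched: int, total: int) -> float:
    # Fraction of query clauses that matched the document
    return matched / total

MAX_DOCS = 44218
QUERY_NORM = 0.032588977
FIELD_NORM_562 = 0.046875

rare = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM_562)     # _text_:3a and _text_:2f
common = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM_562)  # _text_:22

clause_sum = (rare * coord(1, 3)       # nested clause: 1 of 3 sub-terms matched (3a)
              + 3 * rare               # _text_:2f appears three times
              + common * coord(1, 2))  # nested clause: 1 of 2 sub-terms matched (22)
score = clause_sum * coord(5, 12)      # 5 of 12 top-level clauses matched

print(f"{score:.6f}")  # 0.221186, vs. the reported 0.22118558
```

The small residual difference comes from Lucene computing in 32-bit floats while Python uses 64-bit floats.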

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.02
```
0.021892602 = product of:
  0.08757041 = sum of:
    0.037497066 = weight(_text_:informatik in 3284) [ClassicSimilarity], result of:
      0.037497066 = score(doc=3284,freq=2.0), product of:
        0.1662844 = queryWeight, product of:
          5.1024737 = idf(docFreq=730, maxDocs=44218)
          0.032588977 = queryNorm
        0.2254996 = fieldWeight in 3284, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.1024737 = idf(docFreq=730, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.04124263 = weight(_text_:systeme in 3284) [ClassicSimilarity], result of:
      0.04124263 = score(doc=3284,freq=2.0), product of:
        0.17439179 = queryWeight, product of:
          5.3512506 = idf(docFreq=569, maxDocs=44218)
          0.032588977 = queryNorm
        0.2364941 = fieldWeight in 3284, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.3512506 = idf(docFreq=569, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.008830713 = product of:
      0.017661426 = sum of:
        0.017661426 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.017661426 = score(doc=3284,freq=2.0), product of:
            0.11412105 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032588977 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.25 = coord(3/12)
```
Abstract

The classification of objects (e.g. fauna, flora, texts) is a process based on human intelligence. Computer science, and artificial intelligence (AI) in particular, investigates among other things the extent to which processes that require human intelligence can be automated. It has turned out that solving everyday problems is a greater challenge than solving specialized problems such as building a chess computer. For example, "Rybka" has been the reigning World Computer Chess Champion since June 2007. Whether everyday problems can be solved with methods of artificial intelligence remains, for the general case, an open question. Natural language processing, e.g. understanding, plays an essential role in solving everyday problems. Lenat's goal since 1984 has been to realize "common sense" as a machine (in the Cyc knowledge base, in the form of facts and rules). Regarding the flagship AI project "Cyc", there are Cyc optimists and Cyc pessimists. Understanding natural language (e.g. work title, summary, preface, contents) is also necessary for the manual classification of bibliographic title records or online publications, so that these text objects can be classified correctly. Since 2007, the Deutsche Nationalbibliothek has manually classified nearly all publications with the Dewey Decimal Classification (DDC).
The volume of publications to be classified has been growing faster than it can be subject-indexed manually, at least since the advent of the World Wide Web. Methods are therefore sought to automate the classification of text objects, or at least to support manual classification. Methods for automatic document classification (information retrieval, IR for short) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As more and more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has also included work on the automatic DDC or RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous operation. The VZG project Colibri/DDC has also been working on automatic DDC classification, among other topics, since 2006. The related investigations and developments serve to answer the research question: "Is it possible to automatically achieve a content-consistent DDC title classification of all GVK-PLUS title records?"

Date

22. 1.2010 14:41:24

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.010691734 = product of:
  0.0641504 = sum of:
    0.03765826 = weight(_text_:internet in 1046) [ClassicSimilarity], result of:
      0.03765826 = score(doc=1046,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3914154 = fieldWeight in 1046, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=1046)
    0.02649214 = product of:
      0.05298428 = sum of:
        0.05298428 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.05298428 = score(doc=1046,freq=2.0), product of:
            0.11412105 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032588977 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.16666667 = coord(2/12)

Date: 5. 5.2003 14:17:22
Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

0.0066457465 = product of:
  0.03987448 = sum of:
    0.02662841 = weight(_text_:internet in 2158) [ClassicSimilarity], result of:
      0.02662841 = score(doc=2158,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.27677247 = fieldWeight in 2158, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2158)
    0.01324607 = product of:
      0.02649214 = sum of:
        0.02649214 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.02649214 = score(doc=2158,freq=2.0), product of:
            0.11412105 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032588977 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.16666667 = coord(2/12)

Abstract: This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
Date: 4. 8.2015 19:22:04
Theme: Internet

Shafer, K.E.: Evaluating Scorpion Results (2001) 0.00

0.004529585 = product of:
  0.054355018 = sum of:
    0.054355018 = weight(_text_:internet in 4085) [ClassicSimilarity], result of:
      0.054355018 = score(doc=4085,freq=6.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.56495947 = fieldWeight in 4085, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=4085)
  0.083333336 = coord(1/12)

Abstract: Using DDC for automatic indexing and classifying of Internet resources
Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II
Theme: Internet
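Every explain tree in this listing can be checked for internal consistency: a "product of" node should equal the product of its children, a "sum of" node their sum, a "result of" node its single child, and ClassicSimilarity's tf node the square root of the term frequency. The sketch below is a hypothetical helper (`parse`, `check` and `EXPLAIN_4085` are names made up for illustration, not a Lucene API) that reads the indented text format shown above, using the doc 4085 tree from the Shafer entry, and reports nodes whose printed value disagrees with their children. It assumes nesting is expressed only by indentation and that values are plain decimals without exponent notation.

```python
import math
import re

# One explain line: leading indent, numeric value, " = ", description
LINE = re.compile(r"^(\s*)([0-9.]+) = (.*)$")

# Explain output for doc 4085 (Shafer 2001), copied from the listing
EXPLAIN_4085 = """\
0.004529585 = product of:
  0.054355018 = sum of:
    0.054355018 = weight(_text_:internet in 4085) [ClassicSimilarity], result of:
      0.054355018 = score(doc=4085,freq=6.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.56495947 = fieldWeight in 4085, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=4085)
  0.083333336 = coord(1/12)
"""

def parse(text):
    """Build nested (value, description, children) tuples from indentation."""
    root = []
    stack = [(-1, root)]  # (indent, child list) pairs
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        indent = len(m.group(1))
        node = (float(m.group(2)), m.group(3), [])
        while stack[-1][0] >= indent:  # climb back to this line's parent
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node[2]))
    return root[0]

def check(node, rel_tol=1e-5, problems=None):
    """Recompute every inner node from its children and collect mismatches."""
    if problems is None:
        problems = []
    value, desc, children = node
    if children:
        kids = [child[0] for child in children]
        if "product of" in desc:
            expected = math.prod(kids)
        elif "sum of" in desc:
            expected = math.fsum(kids)
        elif "with freq of" in desc:
            expected = math.sqrt(kids[0])  # ClassicSimilarity tf = sqrt(freq)
        else:
            expected = kids[0]             # "result of:" wraps a single child
        if not math.isclose(value, expected, rel_tol=rel_tol):
            problems.append((desc, value, expected))
    for child in children:
        check(child, rel_tol, problems)
    return problems

print(check(parse(EXPLAIN_4085)))  # [] when the tree is internally consistent
```

The relative tolerance absorbs the rounding of the eight or nine significant digits Lucene prints.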

Shafer, K.E.: Automatic Subject Assignment via the Scorpion System (2001) 0.00

0.0044380687 = product of:
  0.05325682 = sum of:
    0.05325682 = weight(_text_:internet in 1043) [ClassicSimilarity], result of:
      0.05325682 = score(doc=1043,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.55354494 = fieldWeight in 1043, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=1043)
  0.083333336 = coord(1/12)

Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part I
Theme: Internet

Panyr, J.: Automatische thematische Textklassifikation und ihre Interpretation in der Dokumentengrobrecherche (1980) 0.00
```
0.0040856022 = product of:
  0.049027227 = sum of:
    0.049027227 = product of:
      0.098054454 = sum of:
        0.098054454 = weight(_text_:vernetzung in 100) [ClassicSimilarity], result of:
          0.098054454 = score(doc=100,freq=2.0), product of:
            0.20326729 = queryWeight, product of:
              6.237302 = idf(docFreq=234, maxDocs=44218)
              0.032588977 = queryNorm
            0.48239172 = fieldWeight in 100, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.237302 = idf(docFreq=234, maxDocs=44218)
              0.0546875 = fieldNorm(doc=100)
      0.5 = coord(1/2)
  0.083333336 = coord(1/12)
```
Abstract

For the automatic indexing of natural-language documents in an information system, a method was developed for the automatic thematic hierarchical classification of texts. The resulting ordering structure (concept network) is offered as a search aid during retrieval. Classification proceeds in four stages: text indexing, formation of priority classes, linking of concepts, and interlinking of the priority classes. The resulting levels of importance form the hierarchy levels of the classification. The concept and document groupings produced during the clustering process form the nodes of the classification network. The links between nodes of adjacent priority classes represent the paths in this network. Mapping the query onto this concept network is used to assess the relevance of the retrieved texts.

Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.00

0.0036983904 = product of:
  0.044380683 = sum of:
    0.044380683 = weight(_text_:internet in 4180) [ClassicSimilarity], result of:
      0.044380683 = score(doc=4180,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.46128747 = fieldWeight in 4180, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=4180)
  0.083333336 = coord(1/12)

Abstract: Presentation of various projects to improve the subject indexing of the Internet with the help of the DDC
Theme: Internet

GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.00

0.0031381883 = product of:
  0.03765826 = sum of:
    0.03765826 = weight(_text_:internet in 381) [ClassicSimilarity], result of:
      0.03765826 = score(doc=381,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3914154 = fieldWeight in 381, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=381)
  0.083333336 = coord(1/12)

Theme: Internet

Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.00

0.0031381883 = product of:
  0.03765826 = sum of:
    0.03765826 = weight(_text_:internet in 382) [ClassicSimilarity], result of:
      0.03765826 = score(doc=382,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3914154 = fieldWeight in 382, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=382)
  0.083333336 = coord(1/12)

Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.00

0.0031381883 = product of:
  0.03765826 = sum of:
    0.03765826 = weight(_text_:internet in 507) [ClassicSimilarity], result of:
      0.03765826 = score(doc=507,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3914154 = fieldWeight in 507, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=507)
  0.083333336 = coord(1/12)

Theme: Internet

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.00

0.002615157 = product of:
  0.031381883 = sum of:
    0.031381883 = weight(_text_:internet in 2533) [ClassicSimilarity], result of:
      0.031381883 = score(doc=2533,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3261795 = fieldWeight in 2533, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
  0.083333336 = coord(1/12)

Theme: Internet

Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.00

0.002615157 = product of:
  0.031381883 = sum of:
    0.031381883 = weight(_text_:internet in 494) [ClassicSimilarity], result of:
      0.031381883 = score(doc=494,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.3261795 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
  0.083333336 = coord(1/12)

Theme: Internet

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.00

0.0025888733 = product of:
  0.03106648 = sum of:
    0.03106648 = weight(_text_:internet in 7209) [ClassicSimilarity], result of:
      0.03106648 = score(doc=7209,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.32290122 = fieldWeight in 7209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.083333336 = coord(1/12)

Source: Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994
Theme: Internet

Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.00
```
0.0025888733 = product of:
  0.03106648 = sum of:
    0.03106648 = weight(_text_:internet in 3064) [ClassicSimilarity], result of:
      0.03106648 = score(doc=3064,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.32290122 = fieldWeight in 3064, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3064)
  0.083333336 = coord(1/12)
```
Abstract

The manual subject indexing of the Internet is in crisis. Yahoo and other services cannot keep pace with the growth of the Web. GERHARD is currently the only search and navigation service worldwide that also classifies the Internet resources collected by a robot fully automatically, using computational-linguistic and statistical methods. Well over a million HTML documents from scientifically relevant servers in Germany can be searched in the database, as with other search engines, but can also be explored by navigating the trilingual Universal Decimal Classification (ETH-Bibliothek Zürich).

Theme

Internet

Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.00
```
0.0022190344 = product of:
  0.02662841 = sum of:
    0.02662841 = weight(_text_:internet in 2555) [ClassicSimilarity], result of:
      0.02662841 = score(doc=2555,freq=4.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.27677247 = fieldWeight in 2555, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2555)
  0.083333336 = coord(1/12)
```
Abstract

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.

Theme

Internet
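The single-path search in the Choi and Peng abstract can be illustrated with a greedy descent through a category tree: at each level only the children of the current node are compared, so a balanced tree with branching factor b and n leaf categories needs about b·log_b(n) similarity computations instead of n. This is a minimal sketch, not the authors' system: the `Category` class, the cosine similarity over term-weight dictionaries and the toy centroids are all assumptions, and the dynamic category expansion is not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    centroid: dict[str, float]  # hypothetical term weights summarizing the category
    children: list[Category] = field(default_factory=list)

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc: dict[str, float], root: Category) -> list[str]:
    """Follow one root-to-leaf path, picking the most similar child at each level."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: cosine(doc, c.centroid))
        path.append(node.name)
    return path

tree = Category("Web", {}, [
    Category("Science", {"physics": 1.0, "biology": 1.0}, [
        Category("Physics", {"physics": 1.0, "quantum": 1.0}),
        Category("Biology", {"biology": 1.0, "cell": 1.0}),
    ]),
    Category("Sports", {"football": 1.0, "tennis": 1.0}),
])

print(classify({"quantum": 1.0, "physics": 2.0}, tree))  # ['Web', 'Science', 'Physics']
```

The trade-off of a greedy descent is that a wrong choice near the root cannot be corrected further down, since sibling branches are never revisited.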

Koch, T.: Nutzung von Klassifikationssystemen zur verbesserten Beschreibung, Organisation und Suche von Internetressourcen (1998) 0.00

0.0020921256 = product of:
  0.025105506 = sum of:
    0.025105506 = weight(_text_:internet in 1030) [ClassicSimilarity], result of:
      0.025105506 = score(doc=1030,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.2609436 = fieldWeight in 1030, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0625 = fieldNorm(doc=1030)
  0.083333336 = coord(1/12)

Theme: Internet

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00

0.001839732 = product of:
  0.022076784 = sum of:
    0.022076784 = product of:
      0.044153567 = sum of:
        0.044153567 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.044153567 = score(doc=2748,freq=2.0), product of:
            0.11412105 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032588977 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.083333336 = coord(1/12)

Date: 1. 2.2016 18:25:22

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
```
0.0015690941 = product of:
  0.01882913 = sum of:
    0.01882913 = weight(_text_:internet in 316) [ClassicSimilarity], result of:
      0.01882913 = score(doc=316,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.1957077 = fieldWeight in 316, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=316)
  0.083333336 = coord(1/12)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
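One way to read the collection-level use of a classification scheme described by Dolin et al. is to summarize each collection by the distribution of classes assigned to its documents, and to rank collections by how much of that distribution falls into the classes relevant to a query. The sketch below assumes the documents have already been classified automatically. The collection names, the use of LCC main-class letters (Q Science, T Technology, P Language and literature, H Social sciences) as labels, and the additive scoring are illustrative assumptions, not the authors' method.

```python
from __future__ import annotations

from collections import Counter

def summarize(labels: list[str]) -> dict[str, float]:
    """Relative frequency of each class among a collection's classified documents."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def select(summaries: dict[str, dict[str, float]],
           query_classes: set[str], k: int = 2) -> list[str]:
    """Rank collections by the share of their documents in the query's classes."""
    scores = {name: sum(s.get(c, 0.0) for c in query_classes)
              for name, s in summaries.items()}
    # Highest share first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]

# Hypothetical collections, each with the LCC main classes assigned to its documents
collections = {
    "ftp_archive": ["Q", "Q", "T", "T", "T"],
    "web_site": ["P", "P", "Q"],
    "database": ["H", "H", "T"],
}
summaries = {name: summarize(labels) for name, labels in collections.items()}
print(select(summaries, {"T"}))  # ['ftp_archive', 'database']
```

Because the summary depends only on assigned classes, it is independent of whether a collection holds structured records, Web pages or FTP files, which is the property the abstract emphasizes.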

Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.00

0.0015690941 = product of:
  0.01882913 = sum of:
    0.01882913 = weight(_text_:internet in 1566) [ClassicSimilarity], result of:
      0.01882913 = score(doc=1566,freq=2.0), product of:
        0.09621047 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.032588977 = queryNorm
        0.1957077 = fieldWeight in 1566, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=1566)
  0.083333336 = coord(1/12)

Theme: Internet
