Search (38 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.29

```
0.29195872 = product of:
  0.43793806 = sum of:
    0.06107404 = product of:
      0.18322212 = sum of:
        0.18322212 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.18322212 = score(doc=562,freq=2.0), product of:
            0.32600754 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038453303 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.18322212 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.18322212 = score(doc=562,freq=2.0), product of:
        0.32600754 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.038453303 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.18322212 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.18322212 = score(doc=562,freq=2.0), product of:
        0.32600754 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.038453303 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.010419784 = product of:
      0.03125935 = sum of:
        0.03125935 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.03125935 = score(doc=562,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.6666667 = coord(4/6)
```

Content: See: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
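Each relevance figure in this listing is a Lucene `ClassicSimilarity` (TF-IDF) explain tree. For every matching term, score = queryWeight × fieldWeight, with queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, where tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)) (the variant that reproduces the idf values shown). Term scores are summed and multiplied by coord, the fraction of query clauses that matched. The minimal sketch below recomputes the score for doc 562 from the numbers in its tree; it assumes a query boost of 1 and takes queryNorm and fieldNorm as given rather than deriving them. The printed value should match 0.29195872 up to single-precision rounding.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Reproduces the idf values in these trees: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm              # query boost assumed to be 1
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.038453303   # taken from the explain tree, not recomputed
FIELD_NORM = 0.046875      # fieldNorm(doc=562)

s_3a = term_score(2, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)    # nested coord(1/3)
s_2f = term_score(2, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
s_22 = term_score(2, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)  # nested coord(1/3)

# The tree lists the 2f clause twice; 4 of the 6 top-level clauses matched
total = (s_3a + 2 * s_2f + s_22) * (4 / 6)
print(f"{total:.8f}")
```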

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02

```
0.02175812 = product of:
  0.06527436 = sum of:
    0.044434793 = weight(_text_:internet in 1046) [ClassicSimilarity], result of:
      0.044434793 = score(doc=1046,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.3914154 = fieldWeight in 1046, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=1046)
    0.020839568 = product of:
      0.0625187 = sum of:
        0.0625187 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.0625187 = score(doc=1046,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```

Date: 5. 5.2003 14:17:22
Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen [Automatic assignment of RVK notations by case-based reasoning] (2009) 0.02
```
0.017795376 = product of:
  0.05338613 = sum of:
    0.042966347 = weight(_text_:bibliothek in 3051) [ClassicSimilarity], result of:
      0.042966347 = score(doc=3051,freq=2.0), product of:
        0.1578712 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.038453303 = queryNorm
        0.27216077 = fieldWeight in 3051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.046875 = fieldNorm(doc=3051)
    0.010419784 = product of:
      0.03125935 = sum of:
        0.03125935 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.03125935 = score(doc=3051,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract: Classifying bibliographic units is indispensable for systematic access to a library's holdings and for their shelf arrangement. Until now, this task has been carried out manually by subject experts, either individually according to a locally developed scheme or cooperatively according to a shared scheme. This work presents a method for automating the classification process. It uses case-based reasoning, a technique developed in artificial intelligence research. For every work for which bibliographic data are available, the method returns one or more candidate classifications. In experiments, the results of automatic classification are compared with those of subject experts. These experiments demonstrate the high quality of the automatic classification and show that the method can significantly reduce the classification workload of subject experts. Even an almost complete reclassification of a library catalogue is possible, with certain limitations.

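Case-based reasoning follows a retrieve-and-reuse cycle: find the already-classified records (cases) most similar to a new record and reuse their notations. The sketch below is a generic illustration of that cycle, not Pfeffer's implementation; the feature sets (for example keywords or subject headings), the Jaccard similarity, the parameter `k`, and the notation labels are all hypothetical. It returns a ranked list, in line with the abstract's point that the method can propose one or more candidate classifications.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Share of features two records have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_notations(new_record: set, case_base: list, k: int = 3) -> list:
    """Retrieve the k most similar solved cases and reuse their notations.

    case_base is a list of (feature_set, notation) pairs for records that
    subject experts have already classified. Returns (notation, score)
    pairs, best candidate first.
    """
    # Retrieve: rank stored cases by similarity to the new record
    ranked = sorted(case_base, key=lambda case: jaccard(new_record, case[0]), reverse=True)
    votes = defaultdict(float)
    for features, notation in ranked[:k]:
        similarity = jaccard(new_record, features)
        if similarity > 0:
            # Reuse: each retrieved case supports its notation, weighted by similarity
            votes[notation] += similarity
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical case base; the notation labels are placeholders, not real RVK classes
cases = [
    ({"classification", "library", "automation"}, "notation_A"),
    ({"classification", "machine learning"}, "notation_A"),
    ({"history", "middle ages"}, "notation_B"),
]
print(suggest_notations({"classification", "library"}, cases))
```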
Date: 22. 8.2009 19:51:28

Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.01

```
0.010910588 = product of:
  0.032731764 = sum of:
    0.022217397 = weight(_text_:internet in 1566) [ClassicSimilarity], result of:
      0.022217397 = score(doc=1566,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 1566, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=1566)
    0.010514366 = product of:
      0.0315431 = sum of:
        0.0315431 = weight(_text_:29 in 1566) [ClassicSimilarity], result of:
          0.0315431 = score(doc=1566,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.23319192 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```

Source: Journal of information science. 29(2003) no.2, pp.117-126
Theme: Internet

Shafer, K.E.: Evaluating Scorpion Results (2001) 0.01

```
0.010689351 = product of:
  0.0641361 = sum of:
    0.0641361 = weight(_text_:internet in 4085) [ClassicSimilarity], result of:
      0.0641361 = score(doc=4085,freq=6.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.56495947 = fieldWeight in 4085, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.078125 = fieldNorm(doc=4085)
  0.16666667 = coord(1/6)
```

Abstract: Using DDC for automatic indexing and classification of Internet resources
Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II
Theme: Internet

Shafer, K.E.: Automatic Subject Assignment via the Scorpion System (2001) 0.01

```
0.010473382 = product of:
  0.06284029 = sum of:
    0.06284029 = weight(_text_:internet in 1043) [ClassicSimilarity], result of:
      0.06284029 = score(doc=1043,freq=4.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.55354494 = fieldWeight in 1043, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=1043)
  0.16666667 = coord(1/6)
```

Footnote: Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part I
Theme: Internet

Oberhauser, O.: Automatisches Klassifizieren : Entwicklungsstand - Methodik - Anwendungsbereiche [Automatic classification: state of development, methodology, areas of application] (2005) 0.01
```
0.009053299 = product of:
  0.027159896 = sum of:
    0.017902646 = weight(_text_:bibliothek in 38) [ClassicSimilarity], result of:
      0.017902646 = score(doc=38,freq=2.0), product of:
        0.1578712 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.038453303 = queryNorm
        0.113400325 = fieldWeight in 38, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.01953125 = fieldNorm(doc=38)
    0.0092572495 = weight(_text_:internet in 38) [ClassicSimilarity], result of:
      0.0092572495 = score(doc=38,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.081544876 = fieldWeight in 38, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.01953125 = fieldNorm(doc=38)
  0.33333334 = coord(2/6)
```
Footnote: Reviewed in: VÖB-Mitteilungen 58(2005) no.3, pp.102-104 (R.F. Müller); ZfBB 53(2006) no.5, pp.282-283 (L. Svensson): "Collecting and cataloguing electronic resources has long been routine in academic libraries. At the same time, a paradigm shift in finding aids is emerging: to offer efficient, user-oriented access to mixed collections, some library service providers, such as the hbz (http://suchen.hbz-nrw.de/dreilaender/), the library of North Carolina State University (www.lib.ncsu.edu/), and soon vascoda (www.vascoda.de/) and the Librarians' Internet Index (www.lii.org/), are increasingly experimenting with search engine technology. The aim is to offer not only a fully inverted search index but also browsing through a hierarchically ordered classification. However, only a small part of the data in the German union catalogues has been classified so far. Records imported from the Anglo-American world are often classified with LCC and/or DDC, although the Library of Congress concentrates its DDC classification on titles that are mainly of interest to public libraries. From 2007, the Deutsche Nationalbibliothek will classify print media and university theses comprehensively with DDC. It is already evident, however, that especially for electronic publications the volume of incoming documents cannot be indexed intellectually with ever scarcer staff resources, and that new methods must be developed. This is where Oberhauser's book comes at just the right time. Since the early 1990s, several projects on automatic classification have been carried out.

Anyone who wanted to become familiar with this topic, or who was interested in the results of the larger projects, could until now not turn to an overview, but had to rely on a multitude of individual studies and project documentation. Oberhauser's account, based on a wealth of published and grey literature, closes this gap. The author brilliantly achieves his own goal of conveying a clear and understandable overview of the current state of knowledge and of the results of the relevant projects. It should be noted, however, that he assumes basic library knowledge and at least a basic understanding of fundamental concepts and questions of information science; here, some pointers to introductory works for newcomers would have been welcome."

Oberhauser, O.: Automatisches Klassifizieren und Bibliothekskataloge [Automatic classification and library catalogues] (2005) 0.01

```
0.008354569 = product of:
  0.05012741 = sum of:
    0.05012741 = weight(_text_:bibliothek in 4099) [ClassicSimilarity], result of:
      0.05012741 = score(doc=4099,freq=2.0), product of:
        0.1578712 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.038453303 = queryNorm
        0.31752092 = fieldWeight in 4099, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4099)
  0.16666667 = coord(1/6)
```

Source: Bibliothek Technik Recht. Festschrift für Peter Kubalek zum 60. Geburtstag [Library, Technology, Law: Festschrift for Peter Kubalek on his 60th birthday]. Ed.: H. Hrusa

Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.01

```
0.007405799 = product of:
  0.044434793 = sum of:
    0.044434793 = weight(_text_:internet in 507) [ClassicSimilarity], result of:
      0.044434793 = score(doc=507,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.3914154 = fieldWeight in 507, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.09375 = fieldNorm(doc=507)
  0.16666667 = coord(1/6)
```

Theme: Internet

Pfeffer, M.: Automatische Vergabe von RVK-Notationen anhand von bibliografischen Daten mittels fallbasiertem Schließen [Automatic assignment of RVK notations from bibliographic data by case-based reasoning] (2007) 0.01
```
0.007161058 = product of:
  0.042966347 = sum of:
    0.042966347 = weight(_text_:bibliothek in 558) [ClassicSimilarity], result of:
      0.042966347 = score(doc=558,freq=2.0), product of:
        0.1578712 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.038453303 = queryNorm
        0.27216077 = fieldWeight in 558, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.046875 = fieldNorm(doc=558)
  0.16666667 = coord(1/6)
```
Abstract: Classifying bibliographic units is indispensable for systematic access to a library's holdings and for their shelf arrangement. Until now, this task has been carried out manually by subject experts, either individually according to a locally developed scheme or cooperatively according to a shared scheme. This work presents a method for automating the classification process. It uses case-based reasoning, a technique developed in artificial intelligence research. For every work for which bibliographic data are available, the method returns one or more candidate classifications. In experiments, the results of automatic classification are compared with those of subject experts. These experiments demonstrate the high quality of the automatic classification and show that the method can significantly reduce the classification workload of subject experts. Even an almost complete reclassification of a library catalogue is possible, with certain limitations.

Walther, R.: Möglichkeiten und Grenzen automatischer Klassifikationen von Web-Dokumenten [Possibilities and limits of automatic classification of Web documents] (2001) 0.01
```
0.0061094724 = product of:
  0.036656834 = sum of:
    0.036656834 = weight(_text_:internet in 1562) [ClassicSimilarity], result of:
      0.036656834 = score(doc=1562,freq=4.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.32290122 = fieldWeight in 1562, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1562)
  0.16666667 = coord(1/6)
```
Abstract: Automatic classification of Web and other text documents makes it possible to provide organized access to company-internal and external information. Research on automatic classification has intensified in recent years, producing a variety of methods that are now used in practice, either individually or in combination. This licentiate thesis examines general principles and several methods of automatic classification in detail and discusses their possibilities and limitations. It also presents the results of a survey of vendors of software for the automatic classification of text documents. The findings are intended to serve myax internet AG as the basis for developing its own classification product.

Theme: Internet

Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.01
```
0.005236691 = product of:
  0.031420145 = sum of:
    0.031420145 = weight(_text_:internet in 2555) [ClassicSimilarity], result of:
      0.031420145 = score(doc=2555,freq=4.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.27677247 = fieldWeight in 2555, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2555)
  0.16666667 = coord(1/6)
```
Abstract: Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added to the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log n). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfactory results when adding new categories to the system as required.

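The single-path idea can be illustrated with a category tree in which each node carries a term-weight centroid: at each level the document is compared only with the children of the current node, so a balanced tree with branching factor b and n leaf categories needs about b · log_b(n) comparisons instead of n. This is a minimal sketch under those assumptions (the `Category` structure, cosine similarity, and the toy tree are hypothetical), not the authors' system; it does not model the dynamic creation of new categories.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    centroid: dict                                  # term -> weight
    children: list = field(default_factory=list)


def cosine(a: dict, b: dict) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_single_path(doc: dict, root: Category) -> list:
    """Follow only the most similar child at each level; return the visited path."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: cosine(doc, child.centroid))
        path.append(node.name)
    return path


# Hypothetical two-level category tree
root = Category("Web", {}, [
    Category("Science", {"physics": 1, "biology": 1}, [
        Category("Physics", {"physics": 1, "quantum": 1}),
        Category("Biology", {"biology": 1, "cell": 1}),
    ]),
    Category("Sports", {"football": 1, "tennis": 1}, [
        Category("Football", {"football": 1, "goal": 1}),
        Category("Tennis", {"tennis": 1, "serve": 1}),
    ]),
])
print(classify_single_path({"quantum": 2, "physics": 1}, root))
```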
Theme: Internet

Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.00

```
0.0049371994 = product of:
  0.029623196 = sum of:
    0.029623196 = weight(_text_:internet in 1667) [ClassicSimilarity], result of:
      0.029623196 = score(doc=1667,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.2609436 = fieldWeight in 1667, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0625 = fieldNorm(doc=1667)
  0.16666667 = coord(1/6)
```

Theme: Internet

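Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System [Development of a user-assisted automated classification of Web documents: a survey of current methods for automated document classification and implementation of a prototype for improved information retrieval in the xFIND system] (2002) 0.00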
```
0.00427574 = product of:
  0.02565444 = sum of:
    0.02565444 = weight(_text_:internet in 4197) [ClassicSimilarity], result of:
      0.02565444 = score(doc=4197,freq=6.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.22598378 = fieldWeight in 4197, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.03125 = fieldNorm(doc=4197)
  0.16666667 = coord(1/6)
```
Abstract: The vast and constantly growing amount of information on the Internet makes it impossible for people to grasp its content or to search it in a targeted way. One way to improve information discovery is to categorize or classify information according to its thematic content. Such thematic classification can be done manually (intellectually) or by automated methods. However, neither approach on its own has so far fully met the expectations placed on it. This thesis therefore investigates the obvious approach of combining the two methods in a meaningful way. The first part, the survey section, begins by explaining the problem of information overload in our society and shows why categorizing or classifying this information makes sense, particularly on the Internet. It describes the basic options for assigning topics to documents in order to improve knowledge management and knowledge discovery, presenting, among other things, various classification schemes, Topic Maps, and semantic networks. The survey section focuses on automated methods of topic assignment. Besides an overview of the most common classification algorithms, it presents commercial systems as well as research approaches and freely available modules for automatic classification. Systems that at least partly support the combination of manual and automatic methods mentioned above are also considered. Finally, it outlines the problems that arise when classifying documents on the Internet.

The insights gained in the survey section feed into the development of a module for user-assisted automatic document classification within the xFIND system (extended Framework for Information Discovery). This framework, designed at Graz University of Technology, serves as the basis for many new ideas for improving information retrieval. The solution developed in the design section first uses documents, servers, or server areas that have already been classified manually in the system as the basis for automatic classification. After automatic classification, authors and administrators can then adjust the results as part of the user assistance. Collective user behavior can also be taken into account through voting, that is, by approving or rejecting classification results. In this way, the knowledge of subject experts and users ultimately contributes to improving the automatic classification. The design section describes the basic concepts, structure, and operation of the module and presents a number of proposals and ideas for further developing user-assisted automatic document classification.

Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
```
0.0037028994 = product of:
  0.022217397 = sum of:
    0.022217397 = weight(_text_:internet in 2563) [ClassicSimilarity], result of:
      0.022217397 = score(doc=2563,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 2563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2563)
  0.16666667 = coord(1/6)
```
Abstract: Topic discovery is an important means for marketing, e-Business and social science studies. It can also be applied to various purposes, such as identifying a group with certain properties and observing the emergence and decline of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of the usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without manual examination. Given a query, ATD returns topics associated with the query and the top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
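The iterative principal-eigenvector step mentioned above is in the spirit of Kleinberg's hub/authority (HITS) computation. The sketch below shows plain HITS power iteration on a link adjacency matrix; it is not the ATD method itself (base set construction and clustering are omitted), and the function name and toy graph are hypothetical. It assumes the graph contains at least one link; otherwise the normalization would divide by zero.

```python
import numpy as np


def hits(adjacency: np.ndarray, iterations: int = 100, tol: float = 1e-10):
    """Hub and authority scores by power iteration.

    adjacency[i, j] = 1 if page i links to page j. Authority scores converge
    to the principal eigenvector of A^T A, hub scores to that of A A^T.
    """
    n = adjacency.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iterations):
        new_auths = adjacency.T @ hubs          # pages pointed to by good hubs
        new_auths /= np.linalg.norm(new_auths)
        new_hubs = adjacency @ new_auths        # pages pointing to good authorities
        new_hubs /= np.linalg.norm(new_hubs)
        converged = np.allclose(new_auths, auths, atol=tol) and np.allclose(new_hubs, hubs, atol=tol)
        hubs, auths = new_hubs, new_auths
        if converged:
            break
    return hubs, auths


# Toy graph: pages 0, 1, 2 link to page 3; page 0 also links to page 4
A = np.zeros((5, 5))
A[0, 3] = A[1, 3] = A[2, 3] = A[0, 4] = 1
hub_scores, authority_scores = hits(A)
print(np.round(authority_scores, 3))
```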

Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.00
```
0.0037028994 = product of:
  0.022217397 = sum of:
    0.022217397 = weight(_text_:internet in 2100) [ClassicSimilarity], result of:
      0.022217397 = score(doc=2100,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 2100, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
  0.16666667 = coord(1/6)
```
Abstract: This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.

Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.00
```
0.00308575 = product of:
  0.018514499 = sum of:
    0.018514499 = weight(_text_:internet in 5997) [ClassicSimilarity], result of:
      0.018514499 = score(doc=5997,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.16308975 = fieldWeight in 5997, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
  0.16666667 = coord(1/6)
```
Abstract: Given the huge amount of information on the Internet and in practically every domain of knowledge that we face today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.

Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008) 0.00
```
0.00308575 = product of:
  0.018514499 = sum of:
    0.018514499 = weight(_text_:internet in 2119) [ClassicSimilarity], result of:
      0.018514499 = score(doc=2119,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.16308975 = fieldWeight in 2119, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2119)
  0.16666667 = coord(1/6)
```
Abstract: Text categorization is an important research area and has received much attention due to the growth of online information and of the Internet. Automated text categorization is generally cast as a multi-class classification problem. Much previous work focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplanes cannot easily be extended to multi-class text classification. In addition, training time and scaling are also important concerns. On the other hand, other techniques that extend naturally to multi-class classification are generally not as accurate as SVMs. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity in the data. While most previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. Using the generalized singular value decomposition (GSVD), it identifies a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
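One common way to use the GSVD for multi-class discriminant analysis is the LDA/GSVD construction: stack a between-class factor Hb and a within-class factor Hw into one matrix K, take its SVD, then take an SVD of the top k rows of the left factor. The resulting directions simultaneously diagonalize the between-class and within-class scatter, and the first k - 1 of them are the most discriminative. The NumPy sketch below follows that construction and classifies by nearest projected centroid; it illustrates the general technique under these assumptions, not necessarily the exact formulation of Li, Zhu and Ogihara, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def lda_gsvd(X: np.ndarray, y: np.ndarray):
    """Discriminant transformation G (features x (k - 1)) via the LDA/GSVD construction."""
    classes = np.unique(y)
    k = len(classes)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == cls].mean(axis=0) for cls in classes])
    counts = np.array([(y == cls).sum() for cls in classes])
    # Between-class factor Hb (features x k): centroid offsets scaled by sqrt(class size)
    Hb = ((centroids - overall) * np.sqrt(counts)[:, None]).T
    # Within-class factor Hw (features x n): each sample minus its class centroid
    Hw = (X - centroids[np.searchsorted(classes, y)]).T
    K = np.vstack([Hb.T, Hw.T])                          # (k + n) x features
    P, s, Qt = np.linalg.svd(K, full_matrices=False)
    t = int((s > s[0] * 1e-10).sum())                    # numerical rank of K
    # SVD of the top k x t block of P orders directions by between-class weight
    _, _, Wt = np.linalg.svd(P[:k, :t])
    transform = Qt[:t].T @ np.diag(1.0 / s[:t]) @ Wt.T   # features x t
    G = transform[:, : k - 1]
    return G, classes, centroids @ G


def predict(X: np.ndarray, G: np.ndarray, classes: np.ndarray, projected_centroids: np.ndarray):
    """Assign each sample to the nearest class centroid in the reduced space."""
    Z = X @ G
    dists = ((Z[:, None, :] - projected_centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic data: three well-separated classes in four dimensions
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
X = np.vstack([rng.normal(m, 0.3, size=(20, 4)) for m in means])
y = np.repeat(np.array([0, 1, 2]), 20)
G, classes, projected = lda_gsvd(X, y)
print(G.shape, (predict(X, G, classes, projected) == y).mean())
```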

Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: A comparative study of two automatic document classification methods in a library setting (2008) 0.00
```
0.00308575 = product of:
  0.018514499 = sum of:
    0.018514499 = weight(_text_:internet in 2532) [ClassicSimilarity], result of:
      0.018514499 = score(doc=2532,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.16308975 = fieldWeight in 2532, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2532)
  0.16666667 = coord(1/6)
```
Abstract: In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper describes the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.

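A k-nearest-neighbour (KNN) classifier labels a new item with the majority class among the k most similar training items. The sketch below uses bag-of-words term counts and cosine similarity; it is not the authors' system, and the toy training titles and their LCC main-class letters (Z, QA) are illustrative assumptions. Ties in the vote go to the label encountered first, i.e. the one with the most similar neighbour.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text: str, training: list, k: int = 3) -> str:
    """Return the majority label among the k training documents most similar to text."""
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training titles with illustrative LCC main-class letters
training = [
    ("cataloguing rules for library collections", "Z"),
    ("library catalog subject indexing", "Z"),
    ("metadata for library catalogues", "Z"),
    ("linear algebra matrix decomposition", "QA"),
    ("probability theory and statistics", "QA"),
    ("matrix computations numerical algebra", "QA"),
]
print(knn_classify("subject indexing in the library catalog", training))
```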
Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen [Automatic DDC classification of bibliographic title records] (2009) 0.00

```
0.0028943846 = product of:
  0.017366307 = sum of:
    0.017366307 = product of:
      0.05209892 = sum of:
        0.05209892 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.05209892 = score(doc=611,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 22. 8.2009 12:54:24
