Search (20 results, page 1 of 1)

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01

0.008647365 = product of:
  0.06485523 = sum of:
    0.015727427 = weight(_text_:und in 611) [ClassicSimilarity], result of:
      0.015727427 = score(doc=611,freq=2.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.24487628 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.04912781 = sum of:
      0.009866543 = weight(_text_:information in 611) [ClassicSimilarity], result of:
        0.009866543 = score(doc=611,freq=2.0), product of:
          0.050870337 = queryWeight, product of:
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.028978055 = queryNorm
          0.19395474 = fieldWeight in 611, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.078125 = fieldNorm(doc=611)
      0.039261267 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
        0.039261267 = score(doc=611,freq=2.0), product of:
          0.101476215 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.028978055 = queryNorm
          0.38690117 = fieldWeight in 611, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=611)
  0.13333334 = coord(2/15)

Content: Präsentation zum Vortrag anlässlich des 98. Deutscher Bibliothekartag in Erfurt: Ein neuer Blick auf Bibliotheken; TK10: Information erschließen und recherchieren Inhalte erschließen - mit neuen Tools
Date: 22. 8.2009 12:54:24

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.00
```
0.0046747727 = product of:
  0.035060793 = sum of:
    0.015409667 = weight(_text_:und in 3284) [ClassicSimilarity], result of:
      0.015409667 = score(doc=3284,freq=12.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.23992877 = fieldWeight in 3284, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.019651124 = sum of:
      0.0039466172 = weight(_text_:information in 3284) [ClassicSimilarity], result of:
        0.0039466172 = score(doc=3284,freq=2.0), product of:
          0.050870337 = queryWeight, product of:
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.028978055 = queryNorm
          0.0775819 = fieldWeight in 3284, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.03125 = fieldNorm(doc=3284)
      0.015704507 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
        0.015704507 = score(doc=3284,freq=2.0), product of:
          0.101476215 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.028978055 = queryNorm
          0.15476047 = fieldWeight in 3284, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=3284)
  0.13333334 = coord(2/15)
```
Abstract

Das Klassifizieren von Objekten (z. B. Fauna, Flora, Texte) ist ein Verfahren, das auf menschlicher Intelligenz basiert. In der Informatik - insbesondere im Gebiet der Künstlichen Intelligenz (KI) - wird u. a. untersucht, inweit Verfahren, die menschliche Intelligenz benötigen, automatisiert werden können. Hierbei hat sich herausgestellt, dass die Lösung von Alltagsproblemen eine größere Herausforderung darstellt, als die Lösung von Spezialproblemen, wie z. B. das Erstellen eines Schachcomputers. So ist "Rybka" der seit Juni 2007 amtierende Computerschach-Weltmeistern. Inwieweit Alltagsprobleme mit Methoden der Künstlichen Intelligenz gelöst werden können, ist eine - für den allgemeinen Fall - noch offene Frage. Beim Lösen von Alltagsproblemen spielt die Verarbeitung der natürlichen Sprache, wie z. B. das Verstehen, eine wesentliche Rolle. Den "gesunden Menschenverstand" als Maschine (in der Cyc-Wissensbasis in Form von Fakten und Regeln) zu realisieren, ist Lenat's Ziel seit 1984. Bezüglich des KI-Paradeprojektes "Cyc" gibt es CycOptimisten und Cyc-Pessimisten. Das Verstehen der natürlichen Sprache (z. B. Werktitel, Zusammenfassung, Vorwort, Inhalt) ist auch beim intellektuellen Klassifizieren von bibliografischen Titeldatensätzen oder Netzpublikationen notwendig, um diese Textobjekte korrekt klassifizieren zu können. Seit dem Jahr 2007 werden von der Deutschen Nationalbibliothek nahezu alle Veröffentlichungen mit der Dewey Dezimalklassifikation (DDC) intellektuell klassifiziert.
Die Menge der zu klassifizierenden Veröffentlichungen steigt spätestens seit der Existenz des World Wide Web schneller an, als sie intellektuell sachlich erschlossen werden kann. Daher werden Verfahren gesucht, um die Klassifizierung von Textobjekten zu automatisieren oder die intellektuelle Klassifizierung zumindest zu unterstützen. Seit 1968 gibt es Verfahren zur automatischen Dokumentenklassifizierung (Information Retrieval, kurz: IR) und seit 1992 zur automatischen Textklassifizierung (ATC: Automated Text Categorization). Seit immer mehr digitale Objekte im World Wide Web zur Verfügung stehen, haben Arbeiten zur automatischen Textklassifizierung seit ca. 1998 verstärkt zugenommen. Dazu gehören seit 1996 auch Arbeiten zur automatischen DDC-Klassifizierung bzw. RVK-Klassifizierung von bibliografischen Titeldatensätzen und Volltextdokumenten. Bei den Entwicklungen handelt es sich unseres Wissens bislang um experimentelle und keine im ständigen Betrieb befindlichen Systeme. Auch das VZG-Projekt Colibri/DDC ist seit 2006 u. a. mit der automatischen DDC-Klassifizierung befasst. Die diesbezüglichen Untersuchungen und Entwicklungen dienen zur Beantwortung der Forschungsfrage: "Ist es möglich, eine inhaltlich stimmige DDC-Titelklassifikation aller GVK-PLUS-Titeldatensätze automatisch zu erzielen?"

Date

22. 1.2010 14:41:24
Reiner, U.: VZG-Projekt Colibri : Bewertung von automatisch DDC-klassifizierten Titeldatensätzen der Deutschen Nationalbibliothek (DNB) (2009) 0.00
```
0.0017387326 = product of:
  0.026080986 = sum of:
    0.026080986 = weight(_text_:und in 2675) [ClassicSimilarity], result of:
      0.026080986 = score(doc=2675,freq=22.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.40608138 = fieldWeight in 2675, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2675)
  0.06666667 = coord(1/15)
```
Abstract

Das VZG-Projekt Colibri/DDC beschäftigt sich seit 2003 mit automatischen Verfahren zur Dewey-Dezimalklassifikation (Dewey Decimal Classification, kurz DDC). Ziel des Projektes ist eine einheitliche DDC-Erschließung von bibliografischen Titeldatensätzen und eine Unterstützung der DDC-Expert(inn)en und DDC-Laien, z. B. bei der Analyse und Synthese von DDC-Notationen und deren Qualitätskontrolle und der DDC-basierten Suche. Der vorliegende Bericht konzentriert sich auf die erste größere automatische DDC-Klassifizierung und erste automatische und intellektuelle Bewertung mit der Klassifizierungskomponente vc_dcl1. Grundlage hierfür waren die von der Deutschen Nationabibliothek (DNB) im November 2007 zur Verfügung gestellten 25.653 Titeldatensätze (12 Wochen-/Monatslieferungen) der Deutschen Nationalbibliografie der Reihen A, B und H. Nach Erläuterung der automatischen DDC-Klassifizierung und automatischen Bewertung in Kapitel 2 wird in Kapitel 3 auf den DNB-Bericht "Colibri_Auswertung_DDC_Endbericht_Sommer_2008" eingegangen. Es werden Sachverhalte geklärt und Fragen gestellt, deren Antworten die Weichen für den Verlauf der weiteren Klassifizierungstests stellen werden. Über das Kapitel 3 hinaus führende weitergehende Betrachtungen und Gedanken zur Fortführung der automatischen DDC-Klassifizierung werden in Kapitel 4 angestellt. Der Bericht dient dem vertieften Verständnis für die automatischen Verfahren.

Frobese, D.T.: Klassifikationsaufgaben mit der SENTRAX : Konkreter Fall: Automatische Detektion von SPAM (2006) 0.00

0.0014528375 = product of:
  0.02179256 = sum of:
    0.02179256 = weight(_text_:und in 5980) [ClassicSimilarity], result of:
      0.02179256 = score(doc=5980,freq=6.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.33931053 = fieldWeight in 5980, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0625 = fieldNorm(doc=5980)
  0.06666667 = coord(1/15)

Abstract: Die Suchfunktionen des SENTRAX-Verfahrens werden für die Klassifizierung von Mails und im Besonderen für die Detektion von SPAM eingesetzt. Die Eigenschaften einer kontextähnlichen Suche und die Fehlertoleranz sollen genutzt werden, um SPAM Nachrichten treffsicher aufzuspüren.
Footnote: Beitrag der Proceedings des Fünften Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2006), Hildesheim, xx.x.2006.

Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.00
```
0.0012581941 = product of:
  0.018872911 = sum of:
    0.018872911 = weight(_text_:und in 942) [ClassicSimilarity], result of:
      0.018872911 = score(doc=942,freq=8.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.29385152 = fieldWeight in 942, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=942)
  0.06666667 = coord(1/15)
```
Abstract

Der Workshop gibt einen Einblick in die aktuelle Forschung und Entwicklung zur Wissensorganisation in digitalen Bibliotheken. Diane Vizine-Goetz vom OCLC Office of Research in Dublin, Ohio, stellt die Forschungsprojekte von OCLC zur Anpassung und Weiterentwicklung der Dewey Decimal Classification als Wissensorganisationsinstrument fuer grosse digitale Dokumentensammlungen vor. Traugott Koch, NetLab, Universität Lund in Schweden, demonstriert die Ansätze und Lösungen des EU-Projekts DESIRE zum Einsatz von intellektueller und vor allem automatischer Klassifikation in Fachinformationsdiensten im Internet.

Wätjen, H.-J.: Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web : das DFG-Projekt GERHARD (1998) 0.00

0.0010484952 = product of:
  0.015727427 = sum of:
    0.015727427 = weight(_text_:und in 3066) [ClassicSimilarity], result of:
      0.015727427 = score(doc=3066,freq=2.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.24487628 = fieldWeight in 3066, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.078125 = fieldNorm(doc=3066)
  0.06666667 = coord(1/15)

Automatic classification research at OCLC (2002) 0.00

9.1609627E-4 = product of:
  0.013741443 = sum of:
    0.013741443 = product of:
      0.027482886 = sum of:
        0.027482886 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.027482886 = score(doc=1563,freq=2.0), product of:
            0.101476215 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.028978055 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Date: 5. 5.2003 9:22:09

Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.00
```
8.387961E-4 = product of:
  0.012581941 = sum of:
    0.012581941 = weight(_text_:und in 162) [ClassicSimilarity], result of:
      0.012581941 = score(doc=162,freq=2.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.19590102 = fieldWeight in 162, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0625 = fieldNorm(doc=162)
  0.06666667 = coord(1/15)
```
Abstract

Zu den großen Herausforderungen einer sinnvollen Suche im WWW gehören die riesige Menge des Verfügbaren und die Sparchbarrieren. Verfahren, die die Web-Ressourcen im Hinblick auf ein effizienteres Retrieval inhaltlich strukturieren, werden daher ebenso dringend benötigt wie Programme, die mit der Sprachvielfalt umgehen können. Im folgenden Vortrag werden wir einige Ansätze diskutieren, die zur Bewältigung der beiden Probleme derzeit unternommen werden

Lindholm, J.; Schönthal, T.; Jansson , K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.00

8.387961E-4 = product of:
  0.012581941 = sum of:
    0.012581941 = weight(_text_:und in 4088) [ClassicSimilarity], result of:
      0.012581941 = score(doc=4088,freq=2.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.19590102 = fieldWeight in 4088, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0625 = fieldNorm(doc=4088)
  0.06666667 = coord(1/15)

Footnote: Auch unter: http://www.ariadne.ac.uk/issue37/lindholm/ und http://engine-e.lub.lu.se/

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
```
4.4124527E-4 = product of:
  0.0066186786 = sum of:
    0.0066186786 = product of:
      0.013237357 = sum of:
        0.013237357 = weight(_text_:information in 316) [ClassicSimilarity], result of:
          0.013237357 = score(doc=316,freq=10.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.2602176 = fieldWeight in 316, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).

Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.00

3.7209064E-4 = product of:
  0.0055813594 = sum of:
    0.0055813594 = product of:
      0.011162719 = sum of:
        0.011162719 = weight(_text_:information in 5810) [ClassicSimilarity], result of:
          0.011162719 = score(doc=5810,freq=4.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.21943474 = fieldWeight in 5810, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=5810)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Abstract: A major library classification scheme has long been standard classification framework for information sources in traditional library environment, and text classification (TC) becomes a popular and attractive tool of organizing digital information. This paper gives an overview of previous projects and studies on TC using major library classification schemes, and summarizes a discussion of TC research challenges.

Shafer, K.E.: Evaluating Scorpion results (1998) 0.00

3.2888478E-4 = product of:
  0.0049332716 = sum of:
    0.0049332716 = product of:
      0.009866543 = sum of:
        0.009866543 = weight(_text_:information in 1569) [ClassicSimilarity], result of:
          0.009866543 = score(doc=1569,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.19395474 = fieldWeight in 1569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Abstract: Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
```
2.9599632E-4 = product of:
  0.0044399444 = sum of:
    0.0044399444 = product of:
      0.008879889 = sum of:
        0.008879889 = weight(_text_:information in 1253) [ClassicSimilarity], result of:
          0.008879889 = score(doc=1253,freq=18.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.17455927 = fieldWeight in 1253, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.

Autonomy, Inc.: Automatic classification (o.J.) 0.00

2.6310782E-4 = product of:
  0.0039466172 = sum of:
    0.0039466172 = product of:
      0.0078932345 = sum of:
        0.0078932345 = weight(_text_:information in 1666) [ClassicSimilarity], result of:
          0.0078932345 = score(doc=1666,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.1551638 = fieldWeight in 1666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1666)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Abstract: Autonomy's Classification solutions remove the necessity for organizations to rely on human intervention or manual processing of information, such as manual tagging, typically required to make most other e-business applications work. Autonomy's ability to consistently and accurately classify data automatically is a unique infrastructure solution that overcomes the predicaments surrounding the exponential growth of unstructured data.

Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.00
```
2.2785808E-4 = product of:
  0.003417871 = sum of:
    0.003417871 = product of:
      0.006835742 = sum of:
        0.006835742 = weight(_text_:information in 1669) [ClassicSimilarity], result of:
          0.006835742 = score(doc=1669,freq=6.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.1343758 = fieldWeight in 1669, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval and several important services on the net have taken this fact into consideration, the emphasis of this report lays on the general retrieval tools for the whole of Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services it is necessary to begin with organizing the existing varieties in a typological/ taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues and simultaneous or collected search services respectively. It will however for different reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common for all attempts of indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.

Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.00

1.9733087E-4 = product of:
  0.002959963 = sum of:
    0.002959963 = product of:
      0.005919926 = sum of:
        0.005919926 = weight(_text_:information in 3383) [ClassicSimilarity], result of:
          0.005919926 = score(doc=3383,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.116372846 = fieldWeight in 3383, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3383)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Content: Beitrag bei: The Third International Conference on Web Information Systems Engineering (WISE'00) Dec., 12-14, 2002, Singapore, S.182.

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.00
```
1.9733087E-4 = product of:
  0.002959963 = sum of:
    0.002959963 = product of:
      0.005919926 = sum of:
        0.005919926 = weight(_text_:information in 1168) [ClassicSimilarity], result of:
          0.005919926 = score(doc=1168,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.116372846 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
```
1.8604532E-4 = product of:
  0.0027906797 = sum of:
    0.0027906797 = product of:
      0.0055813594 = sum of:
        0.0055813594 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
          0.0055813594 = score(doc=2596,freq=4.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.10971737 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Content

Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access Session One: Updates and a twelve month perspective Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers Session Two: Today's search engines and beyond Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER) Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology Session Three: Panel discussion: Human v automated categorization and editing Ev Brenner (New York, NY)- Chairman James Callan (University of Massachusetts, MA) Marc Krellenstein (Northern Light Technology, Cambridge, MA) Dan Miller (Ask Jeeves, Berkeley, CA) Session Four: Updates and a twelve month perspective Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software Ellen Voorhees (NIST, Gaithersburg, MD) TREC update Session Five: Search engines now and beyond Intelligent Agents John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents Text summarization Therese Firmin, (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems Cross language searching Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval. Video search and retrieval Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio Speech recognition Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents Visualization James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion? Text mining David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support
Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 1665) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=1665,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software , Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 472) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=472,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measure. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.

Search (20 results, page 1 of 1)

Authors

Years

Languages

Types

Themes