Search (57 results, page 2 of 3)

Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.00

```
0.004673052 = product of:
  0.028038312 = sum of:
    0.028038312 = product of:
      0.08411493 = sum of:
        0.08411493 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
          0.08411493 = score(doc=5169,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.6218451 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.125 = fieldNorm(doc=5169)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```
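The explanation tree for doc 5169 can be re-derived by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity structure visible in the tree (queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = √freq); the function name `classic_term_score` is hypothetical and is not part of Lucene.

```python
import math


def classic_term_score(freq, idf, query_norm, field_norm):
    """Recompute one term's score from the factors listed in the explain tree."""
    query_weight = idf * query_norm                     # 0.13526669 in the tree
    field_weight = math.sqrt(freq) * idf * field_norm   # 0.6218451 in the tree
    return query_weight * field_weight


# Factors copied from the explanation for doc 5169 (term "29").
term_score = classic_term_score(
    freq=2.0, idf=3.5176873, query_norm=0.038453303, field_norm=0.125
)

# Two coordination factors are applied on the way up the tree:
# coord(1/3) on the inner clause and coord(1/6) on the outer query.
final_score = term_score * (1 / 3) * (1 / 6)

print(f"term score  = {term_score:.8f}")
print(f"final score = {final_score:.9f}")
```

The printed values should agree with the tree up to single-precision rounding, since Lucene computes these factors as 32-bit floats.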

Source: Nachrichten für Dokumentation. 29(1978), S.92-96

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
```
0.0037028994 = product of:
  0.022217397 = sum of:
    0.022217397 = weight(_text_:internet in 316) [ClassicSimilarity], result of:
      0.022217397 = score(doc=316,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 316, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=316)
  0.16666667 = coord(1/6)
```
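The idf and tf factors in these explanations follow from the raw statistics shown in parentheses. The sketch below assumes the ClassicSimilarity definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = √freq; the helper names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


# Statistics copied from the explanations on this page.
idf_internet = classic_idf(doc_freq=6276, max_docs=44218)  # listed as 2.9522398
idf_29 = classic_idf(doc_freq=3565, max_docs=44218)        # listed as 3.5176873
tf_2 = classic_tf(2.0)                                     # listed as 1.4142135

print(idf_internet, idf_29, tf_2)
```

The rarer term ("29", in 3565 documents) receives a higher idf than the more common one ("internet", in 6276 documents), which is why it contributes a larger per-term weight before the coordination factors are applied.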
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
```
0.0037028994 = product of:
  0.022217397 = sum of:
    0.022217397 = weight(_text_:internet in 2563) [ClassicSimilarity], result of:
      0.022217397 = score(doc=2563,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 2563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2563)
  0.16666667 = coord(1/6)
```
Abstract

Topic discovery is an important means for marketing, e-Business and social science studies. It can also be applied to various purposes, such as identifying a group with certain properties and observing the emergence and decline of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of the usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without manual examination. Given a query, ATD returns the topics associated with the query and the top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.00
```
0.0037028994 = product of:
  0.022217397 = sum of:
    0.022217397 = weight(_text_:internet in 2100) [ClassicSimilarity], result of:
      0.022217397 = score(doc=2100,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1957077 = fieldWeight in 2100, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
  0.16666667 = coord(1/6)
```
Abstract

This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used it. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008) 0.00
```
0.00308575 = product of:
  0.018514499 = sum of:
    0.018514499 = weight(_text_:internet in 2119) [ClassicSimilarity], result of:
      0.018514499 = score(doc=2119,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.16308975 = fieldWeight in 2119, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2119)
  0.16666667 = coord(1/6)
```
Abstract

Text categorization is an important research area and has been receiving much attention due to the growth of online information and of the Internet. Automated text categorization is generally cast as a multi-class classification problem. Much previous work focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to multi-class text classification. In addition, training time and scaling are also important concerns. On the other hand, other techniques that extend naturally to multi-class classification are generally not as accurate as SVMs. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity in the data. While most of the previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.00
```
0.00308575 = product of:
  0.018514499 = sum of:
    0.018514499 = weight(_text_:internet in 2532) [ClassicSimilarity], result of:
      0.018514499 = score(doc=2532,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.16308975 = fieldWeight in 2532, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2532)
  0.16666667 = coord(1/6)
```
Abstract

In current library practice, trained human experts usually carry out document cataloguing and indexing manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine-learning-based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made on how to apply the KNN algorithm in practice to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00

```
0.0028943846 = product of:
  0.017366307 = sum of:
    0.017366307 = product of:
      0.05209892 = sum of:
        0.05209892 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.05209892 = score(doc=2748,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 1. 2.2016 18:25:22

Schek, M.: Automatische Klassifizierung in Erschließung und Recherche eines Pressearchivs (2006) 0.00
```
0.0024685997 = product of:
  0.014811598 = sum of:
    0.014811598 = weight(_text_:internet in 6043) [ClassicSimilarity], result of:
      0.014811598 = score(doc=6043,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.1304718 = fieldWeight in 6043, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.03125 = fieldNorm(doc=6043)
  0.16666667 = coord(1/6)
```
Abstract

The Süddeutsche Zeitung (SZ) has maintained a press archive since its founding in 1945, which documents the texts of its own editors and of numerous national and international publications and makes them available for research purposes. The DIZ press database (www.medienport.de) enables browser-based research for editors and external customers on the intranet and the Internet, as well as customer-specific content feeds for publishers, broadcasters, and portals. The DIZ press database currently contains 7.8 million articles, each retrievable as HTML or PDF. About 3,500 articles are added daily, of which about 1,000 are subject-indexed by documentalists. At DIZ, indexing is done not by assigning subject headings to the document but by linking articles to "virtual folders", the dossiers. In total, the DIZ press database contains about 90,000 dossiers, which are linked to one another to form the "DIZ-Wissensnetz" (DIZ knowledge network). DIZ defines the knowledge network as its unique selling point and devotes considerable staff resources to updating and quality-assuring the dossiers. In the course of the media crisis, DIZ had to face the challenge of maintaining the quality of indexing on the input side despite shrinking indexing capacity. On the output side, the task is to supply a demanding target group, including the editors of the Süddeutsche Zeitung, precisely and promptly with the information they need for their daily work.
Based on the initial situation in the documentation department of the Süddeutsche Zeitung, DIZ identified three starting points for optimizing effort on the input side (indexing) while marketing the knowledge network better on the output side (research): (1) (semi-)automatic classification of press texts (a suggestion system); (2) visualization of the knowledge network; (3) new retrieval options (similarity search, clustering). For visualization, DIZ relies on the Net-Navigator from intelligent views, an interactive visualization of general graphs based on a physical model. For automatic classification, similarity search, and clustering, DIZ chose the product nextBot from the company Brainbot.

Savic, D.: Designing an expert system for classifying office documents (1994) 0.00

```
0.002336526 = product of:
  0.014019156 = sum of:
    0.014019156 = product of:
      0.042057466 = sum of:
        0.042057466 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
          0.042057466 = score(doc=2655,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.31092256 = fieldWeight in 2655, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Source: Records management quarterly. 28(1994) no.3, S.20-29

Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005) 0.00
```
0.0021600248 = product of:
  0.012960148 = sum of:
    0.012960148 = weight(_text_:internet in 4884) [ClassicSimilarity], result of:
      0.012960148 = score(doc=4884,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.11416282 = fieldWeight in 4884, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4884)
  0.16666667 = coord(1/6)
```
Abstract

The Süddeutsche Zeitung (SZ) has maintained a press archive since its founding in 1945, which documents the texts of its own editors and of numerous national and international publications and makes them available on request for research purposes. The introduction of electronic data processing began in the early 1990s with digital storage, initially of the SZ data. Further technical development from the mid-1990s served two goals: (1) the complete switch from paper filing to digital storage and (2) the transformation from an in-house documentation and information office into an information service provider that is also present on the market. To spread the resulting costs and at the same time to tap synergies between archives with related content, the Süddeutscher Verlag and Bayerischer Rundfunk founded the Dokumentations- und Informationszentrum (DIZ) München GmbH in 1998, in which the press archives of the two shareholders and the picture archive of the Süddeutscher Verlag were merged. The jointly developed press database enabled cross-site indexing (Lektorat), browser-based research for editors and external customers on the intranet and the Internet, and customer-specific content feeds for publishers, broadcasters, and portals. The DIZ press database currently contains 6.9 million articles, each retrievable as HTML or PDF. About 3,500 articles are added daily, of which about 1,000 are indexed. At DIZ, indexing is done not by assigning subject headings to the document but by linking articles to "virtual folders", the dossiers. These are the electronic representation of a paper folder and are the central object of indexing. In contrast to static classification systems, the dossier structure is dynamic and volume-driven, i.e. new dossiers are created mainly on the basis of current news coverage.
In total, the DIZ press database contains about 90,000 dossiers, of which 68,000 are subject topics (Topics), persons, and institutions. The dossiers are linked to one another to form the "DIZ-Wissensnetz" (DIZ knowledge network).

Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00

```
0.0020444603 = product of:
  0.012266762 = sum of:
    0.012266762 = product of:
      0.036800284 = sum of:
        0.036800284 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
          0.036800284 = score(doc=2219,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.27205724 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Source: Records management quarterly. 29(1995) no.4, S.3-18

Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00

```
0.0020444603 = product of:
  0.012266762 = sum of:
    0.012266762 = product of:
      0.036800284 = sum of:
        0.036800284 = weight(_text_:29 in 1661) [ClassicSimilarity], result of:
          0.036800284 = score(doc=1661,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.27205724 = fieldWeight in 1661, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 29. 7.1998 17:45:02

Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.00

```
0.0020444603 = product of:
  0.012266762 = sum of:
    0.012266762 = product of:
      0.036800284 = sum of:
        0.036800284 = weight(_text_:29 in 1595) [ClassicSimilarity], result of:
          0.036800284 = score(doc=1595,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.27205724 = fieldWeight in 1595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 11. 5.2003 18:29:44

Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.00

```
0.0020260692 = product of:
  0.012156415 = sum of:
    0.012156415 = product of:
      0.036469243 = sum of:
        0.036469243 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
          0.036469243 = score(doc=141,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.2708308 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Pages: S.1-22

Dubin, D.: Dimensions and discriminability (1998) 0.00

```
0.0020260692 = product of:
  0.012156415 = sum of:
    0.012156415 = product of:
      0.036469243 = sum of:
        0.036469243 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.036469243 = score(doc=2338,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 22. 9.1997 19:16:05

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00

```
0.0020260692 = product of:
  0.012156415 = sum of:
    0.012156415 = product of:
      0.036469243 = sum of:
        0.036469243 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.036469243 = score(doc=1673,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 1. 8.1996 22:08:06

Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.00

```
0.0020260692 = product of:
  0.012156415 = sum of:
    0.012156415 = product of:
      0.036469243 = sum of:
        0.036469243 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.036469243 = score(doc=5273,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 22. 7.2006 16:24:52

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.00

```
0.0020260692 = product of:
  0.012156415 = sum of:
    0.012156415 = product of:
      0.036469243 = sum of:
        0.036469243 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.036469243 = score(doc=2560,freq=2.0), product of:
            0.13465692 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038453303 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Date: 22. 9.2008 18:31:54

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
```
0.0018514497 = product of:
  0.011108698 = sum of:
    0.011108698 = weight(_text_:internet in 1253) [ClassicSimilarity], result of:
      0.011108698 = score(doc=1253,freq=2.0), product of:
        0.11352337 = queryWeight, product of:
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.038453303 = queryNorm
        0.09785385 = fieldWeight in 1253, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.9522398 = idf(docFreq=6276, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
  0.16666667 = coord(1/6)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.

Drori, O.; Alon, N.: Using document classification for displaying search results (2003) 0.00

```
0.0017523945 = product of:
  0.010514366 = sum of:
    0.010514366 = product of:
      0.0315431 = sum of:
        0.0315431 = weight(_text_:29 in 1565) [ClassicSimilarity], result of:
          0.0315431 = score(doc=1565,freq=2.0), product of:
            0.13526669 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.038453303 = queryNorm
            0.23319192 = fieldWeight in 1565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1565)
      0.33333334 = coord(1/3)
  0.16666667 = coord(1/6)
```

Source: Journal of information science. 29(2003) no.2, S.97-106
