Search (174 results, page 1 of 9)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.31

0.3089268 = product of:
  0.6178536 = sum of:
    0.023380058 = product of:
      0.11690029 = sum of:
        0.11690029 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.11690029 = score(doc=562,freq=2.0), product of:
            0.20800096 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.02453417 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.2 = coord(1/5)
    0.11690029 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.11690029 = score(doc=562,freq=2.0), product of:
        0.20800096 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.02453417 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.11690029 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.11690029 = score(doc=562,freq=2.0), product of:
        0.20800096 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.02453417 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.11690029 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.11690029 = score(doc=562,freq=2.0), product of:
        0.20800096 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.02453417 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.11690029 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.11690029 = score(doc=562,freq=2.0), product of:
        0.20800096 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.02453417 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.11690029 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.11690029 = score(doc=562,freq=2.0), product of:
        0.20800096 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.02453417 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.009972124 = product of:
      0.019944249 = sum of:
        0.019944249 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.019944249 = score(doc=562,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5 = coord(7/14)

Content: Cf. http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
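The explain trees in these results use Lucene's classic TF-IDF scoring (ClassicSimilarity). The factors shown (a coord factor, a queryNorm, and idf printed with maxDocs) suggest a Lucene version before 7.0, which dropped coord and queryNorm. The Python sketch below recomputes the top score for doc 562 from those factors, assuming tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)); both assumptions reproduce the printed tf and idf values. queryNorm and fieldNorm are copied from the explain output rather than derived, because they depend on the whole query and on index-time norm encoding.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.02453417  # copied from the explain output; depends on the whole query


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Classic Lucene idf (assumed form): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, field_norm: float) -> float:
    """Weight of one term clause: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of a boolean query's clauses that matched the document."""
    return matched / total


# Document 562 (Hotho & Bloehdorn); fieldNorm is 0.046875 for every term.
norm = 0.046875
w_3a = term_weight(2.0, 24, norm)    # sits inside a sub-query scaled by coord(1/5)
w_2f = term_weight(2.0, 24, norm)    # this clause occurs five times in the query
w_22 = term_weight(2.0, 3622, norm)  # sits inside a sub-query scaled by coord(1/2)

clause_sum = w_3a * coord(1, 5) + 5 * w_2f + w_22 * coord(1, 2)
score = clause_sum * coord(7, 14)

print(f"idf(24) = {idf(24):.6f}")
print(f"sum     = {clause_sum:.6f}")
print(f"score   = {score:.6f}")
```

Because Lucene computes in 32-bit floats, the recomputed values may differ from the explain output in the last digit; the score should agree with 0.3089268 to about six decimal places. The sum has seven matching clauses (the `3a` sub-query, five `2f` clauses, and the `22` sub-query), which is where coord(7/14) comes from.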

Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01

0.011812743 = product of:
  0.055126134 = sum of:
    0.037644558 = weight(_text_:system in 5273) [ClassicSimilarity], result of:
      0.037644558 = score(doc=5273,freq=8.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.4871716 = fieldWeight in 5273, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5273)
    0.0058474317 = weight(_text_:information in 5273) [ClassicSimilarity], result of:
      0.0058474317 = score(doc=5273,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.13576832 = fieldWeight in 5273, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5273)
    0.011634145 = product of:
      0.02326829 = sum of:
        0.02326829 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.02326829 = score(doc=5273,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.21428572 = coord(3/14)

Abstract: In text categorization tasks, classification using certain class hierarchies can yield better results than classification without a hierarchy. Because large numbers of documents are now commonly divided into subgroups arranged in a hierarchy, a hierarchical classification method is often appropriate. However, there is no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for constructing classifiers applied to hierarchy trees with many levels.
Date: 22. 7.2006 16:24:52
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp. 431-442
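A hierarchical classification system of the kind this abstract describes is commonly run top-down: a classifier at each internal node picks one child, and the document descends until it reaches a leaf. The sketch below shows only that routing. The keyword-overlap scorer, the toy hierarchy, and all names (`Node`, `keyword_score`, `classify_top_down`) are hypothetical stand-ins, and the code does not implement the authors' evaluation scheme for internal node classifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in the hierarchy; leaves have no children."""
    name: str
    keywords: set[str] = field(default_factory=set)  # hypothetical node profile
    children: list["Node"] = field(default_factory=list)


def keyword_score(node: Node, tokens: set[str]) -> int:
    """Toy stand-in for a trained classifier: count keyword overlaps with the document."""
    return len(node.keywords & tokens)


def classify_top_down(root: Node, text: str) -> list[str]:
    """Descend from the root, letting each internal node choose its best child.

    Returns the path of category names from the root to the chosen leaf.
    """
    tokens = set(text.lower().split())
    path = [root.name]
    node = root
    while node.children:
        # Each internal node acts as a flat classifier over its own children only;
        # on ties, max() keeps the first child listed.
        node = max(node.children, key=lambda child: keyword_score(child, tokens))
        path.append(node.name)
    return path


hierarchy = Node("root", children=[
    Node("computing", {"software", "algorithm", "database"}, children=[
        Node("databases", {"database", "query", "index"}),
        Node("software engineering", {"software", "reuse", "component"}),
    ]),
    Node("biology", {"cell", "gene", "protein"}, children=[
        Node("genetics", {"gene", "dna"}),
        Node("cell biology", {"cell", "membrane"}),
    ]),
])

print(classify_top_down(hierarchy, "Reuse of software components in large repositories"))
# → ['root', 'computing', 'software engineering']
```

One consequence of routing top-down is that a wrong decision at an upper node cannot be corrected further down, which is one reason the quality of the internal node classifiers matters so much in deep hierarchies.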

AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.01

0.011750447 = product of:
  0.054835416 = sum of:
    0.030062785 = weight(_text_:system in 2836) [ClassicSimilarity], result of:
      0.030062785 = score(doc=2836,freq=10.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.38905317 = fieldWeight in 2836, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.0072343214 = weight(_text_:information in 2836) [ClassicSimilarity], result of:
      0.0072343214 = score(doc=2836,freq=6.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.16796975 = fieldWeight in 2836, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.017538311 = weight(_text_:retrieval in 2836) [ClassicSimilarity], result of:
      0.017538311 = score(doc=2836,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.23632148 = fieldWeight in 2836, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
  0.21428572 = coord(3/14)

Abstract: Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
Object: Computing Classification System
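The abstract tests word diffusion with both cosine and Euclidean distance, and the two measures can disagree. The sketch below illustrates why: cosine distance depends only on the angle between vectors, so a document and a longer document with the same term proportions look identical, whereas Euclidean distance also reflects vector length. The three vectors are made up for illustration; in the study, the documents were 50-dimensional SVD representations.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; sensitive to vector length as well as direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical term-weight vectors: doc_b has doc_a's proportions, tripled in length.
doc_a = [1.0, 2.0, 0.0]
doc_b = [3.0, 6.0, 0.0]
doc_c = [0.0, 1.0, 2.0]

print(cosine_distance(doc_a, doc_b))     # ~0: same direction
print(cosine_distance(doc_a, doc_c))     # about 0.6
print(euclidean_distance(doc_a, doc_b))  # about 4.47
print(euclidean_distance(doc_a, doc_c))  # about 2.45
```

Under cosine distance doc_b is doc_a's nearest neighbour; under Euclidean distance doc_c is. Which behaviour is preferable depends on whether document length should count; in latent semantic spaces, cosine is the more usual choice.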

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen [Automatic DDC classification of bibliographic title records] (2009) 0.01

0.010666414 = product of:
  0.0497766 = sum of:
    0.008353474 = weight(_text_:information in 611) [ClassicSimilarity], result of:
      0.008353474 = score(doc=611,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.19395474 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.024802918 = weight(_text_:retrieval in 611) [ClassicSimilarity], result of:
      0.024802918 = score(doc=611,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.33420905 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.016620208 = product of:
      0.033240415 = sum of:
        0.033240415 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.033240415 = score(doc=611,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.21428572 = coord(3/14)

Content: Slides for a talk given at the 98th Deutscher Bibliothekartag in Erfurt: A new look at libraries; TK10: Indexing and searching information; Indexing content with new tools
Date: 22. 8.2009 12:54:24
Theme: Classification systems in online retrieval

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01

0.010368582 = product of:
  0.048386715 = sum of:
    0.016133383 = weight(_text_:system in 316) [ClassicSimilarity], result of:
      0.016133383 = score(doc=316,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.20878783 = fieldWeight in 316, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=316)
    0.011207362 = weight(_text_:information in 316) [ClassicSimilarity], result of:
      0.011207362 = score(doc=316,freq=10.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.2602176 = fieldWeight in 316, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=316)
    0.021045974 = weight(_text_:retrieval in 316) [ClassicSimilarity], result of:
      0.021045974 = score(doc=316,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.2835858 = fieldWeight in 316, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=316)
  0.21428572 = coord(3/14)

Abstract: Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
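One way to read the idea of representing a collection by its content within a classification scheme: classify each document, keep the per-class counts as the collection's summary, and rank collections by how much of their content falls under the class a query maps to. This is a hypothetical sketch, not the authors' system. The LCC-style class letters, the collection names, and the helpers `summarize` and `rank_collections` are illustrative only, and the per-document classifier is assumed to run upstream.

```python
from collections import Counter


def summarize(doc_classes: list[str]) -> Counter:
    """Summarize a collection as the number of its documents in each class."""
    return Counter(doc_classes)


def rank_collections(summaries: dict[str, Counter], query_class: str) -> list[tuple[str, float]]:
    """Rank collections by the fraction of their documents assigned to query_class."""
    scored = []
    for name, counts in summaries.items():
        total = sum(counts.values())
        # Counter returns 0 for classes the collection never uses.
        share = counts[query_class] / total if total else 0.0
        scored.append((name, share))
    # Highest share first; sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical collections whose documents were already classified into
# LCC-style top-level classes (Z: library science, QA: mathematics, R: medicine).
summaries = {
    "ftp-archive": summarize(["QA", "QA", "QA", "Z"]),
    "library-site": summarize(["Z", "Z", "QA"]),
    "hospital-db": summarize(["R", "R", "R"]),
}

print(rank_collections(summaries, "Z"))
```

The summary is independent of how each collection stores its data, which matches the abstract's point that a collection can be represented "irrespective of the structure of the actual data or documents".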

Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.01
```
0.01028422 = product of:
  0.047993027 = sum of:
    0.019013375 = weight(_text_:system in 1382) [ClassicSimilarity], result of:
      0.019013375 = score(doc=1382,freq=4.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.24605882 = fieldWeight in 1382, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.004176737 = weight(_text_:information in 1382) [ClassicSimilarity], result of:
      0.004176737 = score(doc=1382,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.09697737 = fieldWeight in 1382, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.024802918 = weight(_text_:retrieval in 1382) [ClassicSimilarity], result of:
      0.024802918 = score(doc=1382,freq=8.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.33420905 = fieldWeight in 1382, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
  0.21428572 = coord(3/14)
```
Abstract: The authors describe their research, which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural language understanding of user queries were employed. The semantic representations of the user queries were then matched against the semantic representations of the software components to find the components that best match each query. A proof-of-concept system was developed to test the authors' approach. Its results were compared to those of human experts, and statistical analysis was performed on the collected experimental data. The results demonstrate that this automated, semantics-based approach to reusable software component classification and retrieval is successful when compared to the labor-intensive results from the experts, showing that it can significantly benefit software reuse classification and retrieval.
Source: Journal of the American Society for Information Science and Technology. 59(2008) no.4, pp. 613-627

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.01

0.0095258225 = product of:
  0.044453837 = sum of:
    0.018822279 = weight(_text_:system in 7209) [ClassicSimilarity], result of:
      0.018822279 = score(doc=7209,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.2435858 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.008269517 = weight(_text_:information in 7209) [ClassicSimilarity], result of:
      0.008269517 = score(doc=7209,freq=4.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.1920054 = fieldWeight in 7209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.017362041 = weight(_text_:retrieval in 7209) [ClassicSimilarity], result of:
      0.017362041 = score(doc=7209,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.23394634 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.21428572 = coord(3/14)

Abstract: The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools, Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources.

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01
```
0.009513018 = product of:
  0.044394083 = sum of:
    0.008353474 = weight(_text_:information in 2765) [ClassicSimilarity], result of:
      0.008353474 = score(doc=2765,freq=8.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.19395474 = fieldWeight in 2765, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2765)
    0.027730504 = weight(_text_:retrieval in 2765) [ClassicSimilarity], result of:
      0.027730504 = score(doc=2765,freq=10.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.37365708 = fieldWeight in 2765, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2765)
    0.008310104 = product of:
      0.016620208 = sum of:
        0.016620208 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.016620208 = score(doc=2765,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.21428572 = coord(3/14)
```
Abstract: Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of documents. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks, a statistically significant improvement (99% confidence). Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and, finally, training-data category distribution on passage-detection accuracy.
Date: 22. 3.2009 19:14:43
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp. 814-825
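The abstract distinguishes passage retrieval (scoring passages against a query) from passage detection (labeling passages with predetermined categories and flagging documents that contain a hidden passage). The sketch below illustrates detection with the simplest document-splitting baseline, fixed-size non-overlapping word windows, and a keyword-count labeler. It is not the authors' keyword-based dynamic passage (KDP) approach; the window size, the category keywords, the "infected" criterion, and the function names are all assumptions.

```python
def split_fixed(text: str, size: int) -> list[list[str]]:
    """Split a document into consecutive, non-overlapping passages of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def label_passage(passage: list[str], categories: dict[str, set[str]], min_hits: int = 2) -> set[str]:
    """Label a passage with every category whose keywords occur at least `min_hits` times in it."""
    return {
        name
        for name, keywords in categories.items()
        if sum(word in keywords for word in passage) >= min_hits
    }


def is_infected(text: str, categories: dict[str, set[str]], expected: str, size: int = 8) -> bool:
    """Assumed criterion: a document is 'infected' if any passage is labeled
    with a category other than the one the document is expected to have."""
    return any(label_passage(p, categories) - {expected} for p in split_fixed(text, size))


# Hypothetical keyword profiles for two predetermined categories.
categories = {
    "sports": {"match", "goal", "team", "season"},
    "finance": {"merger", "shares", "acquisition", "quarterly"},
}

# A sports report with a finance passage appended to it.
doc = ("the team scored a late goal to win the match and end the season on a high note "
       "insiders expect the merger to lift shares before the quarterly report")

for passage in split_fixed(doc, 8):
    print(sorted(label_passage(passage, categories)), " ".join(passage))
print(is_infected(doc, categories, expected="sports"))  # True
```

With 8-word windows, the appended text is split across two passages; the first of them contains only one finance keyword and gets no label, so only the last passage triggers detection. Fixed windows can dilute a hidden passage in this way, which is one motivation for choosing passage boundaries dynamically.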

Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.01

0.009006804 = product of:
  0.04203175 = sum of:
    0.018822279 = weight(_text_:system in 7694) [ClassicSimilarity], result of:
      0.018822279 = score(doc=7694,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.2435858 = fieldWeight in 7694, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7694)
    0.0058474317 = weight(_text_:information in 7694) [ClassicSimilarity], result of:
      0.0058474317 = score(doc=7694,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.13576832 = fieldWeight in 7694, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7694)
    0.017362041 = weight(_text_:retrieval in 7694) [ClassicSimilarity], result of:
      0.017362041 = score(doc=7694,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.23394634 = fieldWeight in 7694, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7694)
  0.21428572 = coord(3/14)

Abstract: It is shown that the cognitive paradigm may serve as an orientation mark for automatic classification. On the basis of research in Artificial Intelligence, the cognitive paradigm, as opposed to the behavioristic paradigm, was developed as a multiplicity of competing world-views. This is the thesis of DeMey in his book "The cognitive paradigm". Multiplicity in a loosely coupled network of cognitive knots is also the principle of dynamic restlessness. In competition with cognitive views, a classification system that follows various models may learn through concrete information retrieval. Through these actions, the user implicitly builds a new classification order.

Sebastiani, F.: Classification of text, automatic (2006) 0.01

0.009006804 = product of:
  0.04203175 = sum of:
    0.018822279 = weight(_text_:system in 5003) [ClassicSimilarity], result of:
      0.018822279 = score(doc=5003,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.2435858 = fieldWeight in 5003, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
    0.0058474317 = weight(_text_:information in 5003) [ClassicSimilarity], result of:
      0.0058474317 = score(doc=5003,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.13576832 = fieldWeight in 5003, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
    0.017362041 = weight(_text_:retrieval in 5003) [ClassicSimilarity], result of:
      0.017362041 = score(doc=5003,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.23394634 = fieldWeight in 5003, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
  0.21428572 = coord(3/14)

Abstract: Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.
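The machine-learning approach described here, in which a learner builds a classifier by generalizing from pre-classified training texts, can be illustrated with multinomial Naive Bayes, one of the simplest such learners. This is a minimal sketch with add-one (Laplace) smoothing and whitespace tokenization; the training texts and the `train`/`classify` helpers are invented for illustration, and Naive Bayes is only one learner among many, not one the abstract singles out.

```python
import math
from collections import Counter, defaultdict


def train(examples: list[tuple[str, str]]):
    """Learn per-class document counts and per-class word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts: dict[str, Counter] = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_docs[label] += 1
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify(model, text: str) -> str:
    """Return the class with the highest log posterior under multinomial Naive Bayes."""
    class_docs, word_counts, vocabulary = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / n_docs)  # log prior
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps an unseen (word, class) pair from zeroing the product.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("draft of the project report", "ham"),
]
model = train(training)
print(classify(model, "buy cheap money"))         # spam
print(classify(model, "project meeting monday"))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied for longer texts.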

Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.01

0.008686019 = product of:
  0.060802132 = sum of:
    0.011694863 = weight(_text_:information in 4846) [ClassicSimilarity], result of:
      0.011694863 = score(doc=4846,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.27153665 = fieldWeight in 4846, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=4846)
    0.04910727 = weight(_text_:retrieval in 4846) [ClassicSimilarity], result of:
      0.04910727 = score(doc=4846,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.6617001 = fieldWeight in 4846, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.109375 = fieldNorm(doc=4846)
  0.14285715 = coord(2/14)

Source: Information storage and retrieval. 6(1971), pp. 417-435

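Panyr, J.: Automatische Klassifikation und Information Retrieval : Anwendung und Entwicklung komplexer Verfahren in Information-Retrieval-Systemen und ihre Evaluierung [Automatic classification and information retrieval: application and development of complex methods in information retrieval systems and their evaluation] (1986) 0.01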

0.008493474 = product of:
  0.059454314 = sum of:
    0.01736237 = weight(_text_:information in 32) [ClassicSimilarity], result of:
      0.01736237 = score(doc=32,freq=6.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.40312737 = fieldWeight in 32, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.09375 = fieldNorm(doc=32)
    0.042091947 = weight(_text_:retrieval in 32) [ClassicSimilarity], result of:
      0.042091947 = score(doc=32,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.5671716 = fieldWeight in 32, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.09375 = fieldNorm(doc=32)
  0.14285715 = coord(2/14)

Series: Sprache und Information; vol. 12

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20, 1999, Boston, Mass. (1999) 0.01
```
0.008071164 = product of:
  0.03766543 = sum of:
    0.010755588 = weight(_text_:system in 2596) [ClassicSimilarity], result of:
      0.010755588 = score(doc=2596,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.13919188 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.0047254385 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
      0.0047254385 = score(doc=2596,freq=4.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.10971737 = fieldWeight in 2596, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.022184404 = weight(_text_:retrieval in 2596) [ClassicSimilarity], result of:
      0.022184404 = score(doc=2596,freq=10.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.29892567 = fieldWeight in 2596, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.21428572 = coord(3/14)
```
Content

Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access

Session One: Updates and a twelve month perspective
  Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
  Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers

Session Two: Today's search engines and beyond
  Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
  Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
  Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
  Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
  Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
  Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology

Session Three: Panel discussion: Human v automated categorization and editing
  Ev Brenner (New York, NY), Chairman
  James Callan (University of Massachusetts, MA)
  Marc Krellenstein (Northern Light Technology, Cambridge, MA)
  Dan Miller (Ask Jeeves, Berkeley, CA)

Session Four: Updates and a twelve month perspective
  Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
  Ellen Voorhees (NIST, Gaithersburg, MD): TREC update

Session Five: Search engines now and beyond
  Intelligent Agents
    John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
  Text summarization
    Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
  Cross language searching
    Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
  Video search and retrieval
    Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
  Speech recognition
    Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
  Visualization
    James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
  Text mining
    David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01

0.007985508 = product of:
  0.037265703 = sum of:
    0.008269517 = weight(_text_:information in 1673) [ClassicSimilarity], result of:
      0.008269517 = score(doc=1673,freq=4.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.1920054 = fieldWeight in 1673, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.017362041 = weight(_text_:retrieval in 1673) [ClassicSimilarity], result of:
      0.017362041 = score(doc=1673,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.23394634 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.011634145 = product of:
      0.02326829 = sum of:
        0.02326829 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.02326829 = score(doc=1673,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.21428572 = coord(3/14)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
Date: 1. 8.1996 22:08:06
Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)
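The explain tree of the Jenkins record (doc 1673) shows how clause scores combine. Each Boolean level sums the scores of its matching clauses and multiplies the sum by coord(matched/total), and the nested clause carries its own coord(1/2). Below is a minimal sketch of that bottom-up evaluation. It assumes the leaf weights are copied from the explain output, and the representation is chosen here for illustration: a list of clauses, with None marking a clause that did not match. This coord step belongs to the older Lucene Boolean scoring shown in these records.

```python
from typing import List, Optional, Union

# A node is either a leaf weight (a float copied from a weight(...) line) or a
# list of clauses; None marks a query clause that did not match the document.
Node = Union[float, List[Optional["Node"]]]


def evaluate(node: Node) -> float:
    """Sum the matching clause scores and scale by coord = matched / total."""
    if isinstance(node, float):
        return node
    matched = [clause for clause in node if clause is not None]
    coord = len(matched) / len(node)
    return coord * sum(evaluate(clause) for clause in matched)


# Doc 1673: "information" and "retrieval" match directly; the third matching
# clause is a nested query in which only the "22" term matched (coord 1/2).
nested = [0.02326829, None]
top_level = [0.008269517, 0.017362041, nested] + [None] * 11  # 3 of 14 clauses
score = evaluate(top_level)
print(f"{score:.9f}")
```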

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
```
0.007731168 = product of:
  0.036078785 = sum of:
    0.018037671 = weight(_text_:system in 1253) [ClassicSimilarity], result of:
      0.018037671 = score(doc=1253,freq=10.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.23343189 = fieldWeight in 1253, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.007518126 = weight(_text_:information in 1253) [ClassicSimilarity], result of:
      0.007518126 = score(doc=1253,freq=18.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.17455927 = fieldWeight in 1253, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.010522987 = weight(_text_:retrieval in 1253) [ClassicSimilarity], result of:
      0.010522987 = score(doc=1253,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.1417929 = fieldWeight in 1253, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
  0.21428572 = coord(3/14)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it lets information sources automatically extract the collection metadata that must be distributed.
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image-feature classification. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices.
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
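Pharos associates terms from catalog records with LCC categories, ranks categories for a query, and then maps a chosen category to newsgroups. The prototype does the term-category matching with Latent Semantic Indexing. The sketch below swaps in a much simpler weighted term-to-category index, with no LSI, only to show that data flow. The class labels, training snippets, newsgroup names, and helper functions are hypothetical illustrations. Only the cutoff of 15 categories comes from the description above.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each term to {LCC category: weight} from (category, record text) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for category, text in records:
        counts[category].update(text.lower().split())
    # Number of categories each term appears in, for an idf-style discount.
    spread = Counter(term for term_counts in counts.values() for term in term_counts)
    n = len(counts)
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for category, term_counts in counts.items():
        for term, tf in term_counts.items():
            index[term][category] = tf * (1.0 + math.log(n / spread[term]))
    return index


def rank_categories(index: Dict[str, Dict[str, float]], query: str, k: int = 15) -> List[str]:
    """Return up to k categories ordered by summed term weight."""
    scores: Counter = Counter()
    for term in query.lower().split():
        for category, weight in index.get(term, {}).items():
            scores[category] += weight
    return [category for category, _ in scores.most_common(k)]


def newsgroups_for(category: str, group_categories: Dict[str, Set[str]]) -> List[str]:
    """Newsgroups previously associated with the chosen category."""
    return sorted(g for g, cats in group_categories.items() if category in cats)


records = [  # illustrative (category, catalog record) pairs
    ("RC", "prostate cancer oncology"),
    ("RC", "cancer treatment radiotherapy"),
    ("G70", "remote sensing satellite imagery"),
    ("HG", "investment banking finance"),
]
index = build_index(records)
print(rank_categories(index, "banking cancer"))
print(newsgroups_for("RC", {"sci.med": {"RC"}, "misc.invest": {"HG"}}))
```

Terms the index has never seen contribute nothing, which mirrors the prototype's caveat that it can only help when it recognizes at least one query term.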

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.01

0.0077201175 = product of:
  0.036027215 = sum of:
    0.016133383 = weight(_text_:system in 1168) [ClassicSimilarity], result of:
      0.016133383 = score(doc=1168,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.20878783 = fieldWeight in 1168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=1168)
    0.0050120843 = weight(_text_:information in 1168) [ClassicSimilarity], result of:
      0.0050120843 = score(doc=1168,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.116372846 = fieldWeight in 1168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1168)
    0.014881751 = weight(_text_:retrieval in 1168) [ClassicSimilarity], result of:
      0.014881751 = score(doc=1168,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.20052543 = fieldWeight in 1168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1168)
  0.21428572 = coord(3/14)

Abstract: The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.

Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.01

0.0075786044 = product of:
  0.053050227 = sum of:
    0.013365558 = weight(_text_:information in 2412) [ClassicSimilarity], result of:
      0.013365558 = score(doc=2412,freq=2.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.3103276 = fieldWeight in 2412, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.125 = fieldNorm(doc=2412)
    0.03968467 = weight(_text_:retrieval in 2412) [ClassicSimilarity], result of:
      0.03968467 = score(doc=2412,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.5347345 = fieldWeight in 2412, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.125 = fieldNorm(doc=2412)
  0.14285715 = coord(2/14)

Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.01
```
0.007506172 = product of:
  0.035028804 = sum of:
    0.015210699 = weight(_text_:system in 4197) [ClassicSimilarity], result of:
      0.015210699 = score(doc=4197,freq=4.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.19684705 = fieldWeight in 4197, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.03125 = fieldNorm(doc=4197)
    0.005787457 = weight(_text_:information in 4197) [ClassicSimilarity], result of:
      0.005787457 = score(doc=4197,freq=6.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.1343758 = fieldWeight in 4197, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=4197)
    0.014030648 = weight(_text_:retrieval in 4197) [ClassicSimilarity], result of:
      0.014030648 = score(doc=4197,freq=4.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.18905719 = fieldWeight in 4197, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=4197)
  0.21428572 = coord(3/14)
```
Abstract

The vast and constantly growing supply of information on the Internet no longer allows people to grasp its content or to search for information in a targeted way. One way to improve information retrieval is to categorize or classify the information on the basis of its thematic content. This thematic classification can be performed with manual (intellectual) methods as well as with automated procedures. To date, however, neither approach on its own has adequately met the expectations placed on it. This thesis therefore investigates the obvious approach of combining the two methods in a sensible way. The first part of the thesis, the investigation section, opens by explaining the problem of information overload in our society and shows that categorizing or classifying this information appears worthwhile, particularly on the Internet. The basic options for assigning topics to documents in order to improve knowledge management and knowledge discovery are described. Among other things, various classification schemes, Topic Maps, and semantic networks are presented. The focus of the investigation section is on describing automated methods for topic assignment. Besides an overview of the most common classification algorithms, it presents commercially available systems as well as research approaches and freely available modules for automatic classification. Systems that at least partly support the approach mentioned above, combining manual and automatic methods, are also covered. The problems that arise when classifying documents on the Internet are also identified.
The findings of the investigation section feed into the development of a module for user-supported automatic document classification within the xFIND system (extended Framework for Information Discovery). This framework, designed at Graz University of Technology, provides the basis for many new ideas for improving information retrieval. The solution developed in the design section first uses documents, servers, or server areas that already exist in the system and have been classified manually as the basis for automatic classification. Once automatic classification is complete, authors and administrators can adjust the results in a subsequent step, as a form of user support. Collective user behavior can also have an influence through voting, that is, by approving or rejecting classification results. The knowledge of subject experts and users thus ultimately helps improve automatic classification. The design section describes the basic concepts, structure, and operation of the module and presents a number of proposals and ideas for further developing user-supported automatic document classification.
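The workflow in this thesis has three steps. Automatic suggestions are derived from documents that were already classified manually. Authors or administrators may then correct a suggestion, and users vote to approve or reject it. Below is a minimal sketch of that loop under stated assumptions. The word-overlap classifier is a stand-in, not the thesis's algorithm. The `Suggestion` class and its `min_votes` and `threshold` values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


def suggest_category(text: str, labelled: List[Tuple[str, str]]) -> str:
    """Suggest the category whose manually classified examples share the most words."""
    words = set(text.lower().split())
    overlap: Counter = Counter()
    for category, example in labelled:
        overlap[category] += len(words & set(example.lower().split()))
    return overlap.most_common(1)[0][0]


@dataclass
class Suggestion:
    doc_id: str
    category: str
    approvals: int = 0
    rejections: int = 0

    def vote(self, approve: bool) -> None:
        # Collective user feedback: each vote approves or rejects the suggestion.
        if approve:
            self.approvals += 1
        else:
            self.rejections += 1

    def override(self, category: str) -> None:
        # An author or administrator corrects the category; earlier votes are reset.
        self.category = category
        self.approvals = self.rejections = 0

    def status(self, min_votes: int = 3, threshold: float = 0.5) -> str:
        total = self.approvals + self.rejections
        if total < min_votes:
            return "pending"
        return "accepted" if self.approvals / total > threshold else "rejected"


labelled = [  # illustrative manually classified examples
    ("Information Retrieval", "search engine ranking index query"),
    ("Networking", "tcp routing protocol packet"),
]
s = Suggestion("doc-1", suggest_category("ranking a search engine query", labelled))
for approve in (True, True, False):
    s.vote(approve)
print(s.category, s.status())
```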

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01

0.0074542915 = product of:
  0.034786694 = sum of:
    0.016133383 = weight(_text_:system in 2760) [ClassicSimilarity], result of:
      0.016133383 = score(doc=2760,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.20878783 = fieldWeight in 2760, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=2760)
    0.008681185 = weight(_text_:information in 2760) [ClassicSimilarity], result of:
      0.008681185 = score(doc=2760,freq=6.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.20156369 = fieldWeight in 2760, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2760)
    0.009972124 = product of:
      0.019944249 = sum of:
        0.019944249 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.019944249 = score(doc=2760,freq=2.0), product of:
            0.085914485 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02453417 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.21428572 = coord(3/14)

Abstract: Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
Date: 22. 3.2009 19:11:54
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp. 803-813
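In CRHTC, a document may enter a category only if it matches that category's context of discussion, and that context is governed by the category's ancestors. The toy sketch below imitates only this gating idea, using hand-written keyword profiles. A category is considered only when all of its ancestors already fit the document, so a document can end up in zero, one, or several categories. This is not the CRHTC algorithm. The hierarchy, the profiles, and the `min_hits` threshold are hypothetical, and unlike CRHTC the sketch does rely on a manually set parameter.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy: category -> (parent category, profile terms).
Hierarchy = Dict[str, Tuple[Optional[str], Set[str]]]

TOY: Hierarchy = {
    "Science": (None, {"research", "experiment", "theory"}),
    "Science/Physics": ("Science", {"quantum", "particle", "energy"}),
    "Science/Biology": ("Science", {"cell", "gene", "protein"}),
    "Sports": (None, {"match", "team", "league"}),
}


def classify(text: str, hierarchy: Hierarchy, min_hits: int = 1) -> List[str]:
    words = set(text.lower().split())
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for cat, (parent, _) in hierarchy.items():
        children[parent].append(cat)

    def fits(cat: str) -> bool:
        return len(words & hierarchy[cat][1]) >= min_hits

    assigned: List[str] = []

    def descend(cat: str) -> None:
        # Only reached when cat and all its ancestors fit the document, so each
        # child is judged within the context set by its ancestors.
        deeper = [c for c in children[cat] if fits(c)]
        if deeper:
            for child in deeper:
                descend(child)
        else:
            assigned.append(cat)  # deepest category whose whole context chain matched

    for root in children[None]:
        if fits(root):
            descend(root)
    return sorted(assigned)


print(classify("energy drink sponsorship for the team", TOY))
```

Here "energy" alone does not pull the document into Science/Physics, because the document does not fit the Science context.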

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
```
0.007088628 = product of:
  0.033080265 = sum of:
    0.013444485 = weight(_text_:system in 3300) [ClassicSimilarity], result of:
      0.013444485 = score(doc=3300,freq=2.0), product of:
        0.07727166 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.02453417 = queryNorm
        0.17398985 = fieldWeight in 3300, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
    0.0072343214 = weight(_text_:information in 3300) [ClassicSimilarity], result of:
      0.0072343214 = score(doc=3300,freq=6.0), product of:
        0.04306919 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02453417 = queryNorm
        0.16796975 = fieldWeight in 3300, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
    0.012401459 = weight(_text_:retrieval in 3300) [ClassicSimilarity], result of:
      0.012401459 = score(doc=3300,freq=2.0), product of:
        0.07421378 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02453417 = queryNorm
        0.16710453 = fieldWeight in 3300, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
  0.21428572 = coord(3/14)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of human-assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable on five of the measures and that JDI is superior on the sixth. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF could be combined, and whether an evaluation would show them to be complementary.

Source

Journal of the American Society for Information Science and Technology. 60(2009) no.12, pp. 2530-2539
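JDI ranks journal descriptors for a document using statistical associations between words and JDs. These associations are learned from documents whose journals were categorized by hand. The sketch below is a simplified guess at that general idea, not NLM's actual JDI formulas. It assumes each word's association is the share of its training documents that carry each JD, and that a document's score is the plain average over its known words. The descriptor names and training snippets are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def train(training: Iterable[Tuple[str, Set[str]]]) -> Dict[str, Dict[str, float]]:
    """For each word, the share of its training documents that carry each JD.

    training: (document text, JDs assigned to the journal the document came from).
    """
    docs_with_word: Counter = Counter()
    jd_counts: Dict[str, Counter] = defaultdict(Counter)
    for text, jds in training:
        for word in set(text.lower().split()):
            docs_with_word[word] += 1
            jd_counts[word].update(jds)
    return {
        word: {jd: n / docs_with_word[word] for jd, n in counts.items()}
        for word, counts in jd_counts.items()
    }


def rank_jds(text: str, model: Dict[str, Dict[str, float]], k: int = 3) -> List[str]:
    """Average the JD vectors of the document's known words and rank the JDs."""
    words = [w for w in set(text.lower().split()) if w in model]
    if not words:
        return []
    totals: Counter = Counter()
    for word in words:
        for jd, share in model[word].items():
            totals[jd] += share / len(words)
    return [jd for jd, _ in totals.most_common(k)]


model = train([
    ("tumor chemotherapy dosage", {"Medical Oncology", "Pharmacology"}),
    ("tumor imaging scan", {"Medical Oncology", "Radiology"}),
    ("scan contrast agent", {"Radiology"}),
])
print(rank_jds("tumor chemotherapy", model))
```

Because the word-JD table is learned from journal-level labels, no per-document rule base has to be maintained, which is the trade-off the abstract weighs against CISMeF.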
