Search (102 results, page 2 of 6)

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02

0.016956761 = product of:
  0.033913523 = sum of:
    0.011652809 = product of:
      0.046611235 = sum of:
        0.046611235 = weight(_text_:based in 2560) [ClassicSimilarity], result of:
          0.046611235 = score(doc=2560,freq=4.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.3295462 = fieldWeight in 2560, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.25 = coord(1/4)
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.04452143 = score(doc=2560,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
Date: 22. 9.2008 18:31:54

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02

0.015250247 = product of:
  0.030500494 = sum of:
    0.00823978 = product of:
      0.03295912 = sum of:
        0.03295912 = weight(_text_:based in 1673) [ClassicSimilarity], result of:
          0.03295912 = score(doc=1673,freq=2.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.23302436 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.25 = coord(1/4)
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.04452143 = score(doc=1673,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK based information. The experimental version developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
Date: 1. 8.1996 22:08:06

Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.01

0.014115169 = product of:
  0.056460675 = sum of:
    0.056460675 = weight(_text_:term in 1853) [ClassicSimilarity], result of:
      0.056460675 = score(doc=1853,freq=2.0), product of:
        0.21904005 = queryWeight, product of:
          4.66603 = idf(docFreq=1130, maxDocs=44218)
          0.04694356 = queryNorm
        0.25776416 = fieldWeight in 1853, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.66603 = idf(docFreq=1130, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1853)
  0.25 = coord(1/4)

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.01

0.013071639 = product of:
  0.026143279 = sum of:
    0.0070626684 = product of:
      0.028250674 = sum of:
        0.028250674 = weight(_text_:based in 3051) [ClassicSimilarity], result of:
          0.028250674 = score(doc=3051,freq=2.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.19973516 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.25 = coord(1/4)
    0.019080611 = product of:
      0.038161222 = sum of:
        0.038161222 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.038161222 = score(doc=3051,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 22. 8.2009 19:51:28
Theme: Case Based Reasoning

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

0.013071639 = product of:
  0.026143279 = sum of:
    0.0070626684 = product of:
      0.028250674 = sum of:
        0.028250674 = weight(_text_:based in 2158) [ClassicSimilarity], result of:
          0.028250674 = score(doc=2158,freq=2.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.19973516 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.25 = coord(1/4)
    0.019080611 = product of:
      0.038161222 = sum of:
        0.038161222 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.038161222 = score(doc=2158,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 4. 8.2015 19:22:04

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01
```
0.010893034 = product of:
  0.021786068 = sum of:
    0.005885557 = product of:
      0.023542227 = sum of:
        0.023542227 = weight(_text_:based in 2765) [ClassicSimilarity], result of:
          0.023542227 = score(doc=2765,freq=2.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.16644597 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.25 = coord(1/4)
    0.015900511 = product of:
      0.031801023 = sum of:
        0.031801023 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.031801023 = score(doc=2765,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy.

Date

22. 3.2009 19:14:43
Barthel, S.; Tönnies, S.; Balke, W.-T.: Large-scale experiments for mathematical document classification (2013) 0.01
```
0.009880973 = product of:
  0.039523892 = sum of:
    0.039523892 = product of:
      0.079047784 = sum of:
        0.079047784 = weight(_text_:assessment in 1056) [ClassicSimilarity], result of:
          0.079047784 = score(doc=1056,freq=2.0), product of:
            0.25917634 = queryWeight, product of:
              5.52102 = idf(docFreq=480, maxDocs=44218)
              0.04694356 = queryNorm
            0.30499613 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.52102 = idf(docFreq=480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The ever increasing amount of digitally available information is curse and blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, the assessment and refinement of web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested into semantic enrichment of the provided documents e.g. by annotating their documents with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry CAS, medicine MeSH, or mathematics MSC. But due to the growing amount of data, this manual task gets more and more time consuming and expensive. The only solution for this problem seems to employ automated classification algorithms, but from evaluations done in previous research, conclusions to a real world scenario are difficult to make. We therefore conducted a large scale feasibility study on a real world data set from one of the biggest mathematical digital libraries, i.e. Zentralblatt MATH, with special focus on its practical applicability.
Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
```
0.009880973 = product of:
  0.039523892 = sum of:
    0.039523892 = product of:
      0.079047784 = sum of:
        0.079047784 = weight(_text_:assessment in 3311) [ClassicSimilarity], result of:
          0.079047784 = score(doc=3311,freq=2.0), product of:
            0.25917634 = queryWeight, product of:
              5.52102 = idf(docFreq=480, maxDocs=44218)
              0.04694356 = queryNorm
            0.30499613 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.52102 = idf(docFreq=480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.0095403055 = product of:
  0.038161222 = sum of:
    0.038161222 = product of:
      0.076322444 = sum of:
        0.076322444 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.076322444 = score(doc=1046,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 5. 5.2003 14:17:22

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01

0.007950256 = product of:
  0.031801023 = sum of:
    0.031801023 = product of:
      0.063602045 = sum of:
        0.063602045 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.063602045 = score(doc=611,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 8.2009 12:54:24

Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01

0.0055651786 = product of:
  0.022260714 = sum of:
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
          0.04452143 = score(doc=141,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Pages: S.1-22

Dubin, D.: Dimensions and discriminability (1998) 0.01

0.0055651786 = product of:
  0.022260714 = sum of:
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.04452143 = score(doc=2338,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 9.1997 19:16:05

Automatic classification research at OCLC (2002) 0.01

0.0055651786 = product of:
  0.022260714 = sum of:
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.04452143 = score(doc=1563,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 5. 5.2003 9:22:09

Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01

0.0055651786 = product of:
  0.022260714 = sum of:
    0.022260714 = product of:
      0.04452143 = sum of:
        0.04452143 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.04452143 = score(doc=5273,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 7.2006 16:24:52

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00

0.0047701527 = product of:
  0.019080611 = sum of:
    0.019080611 = product of:
      0.038161222 = sum of:
        0.038161222 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.038161222 = score(doc=2760,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 3.2009 19:11:54

Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.00
```
0.0035313342 = product of:
  0.014125337 = sum of:
    0.014125337 = product of:
      0.056501348 = sum of:
        0.056501348 = weight(_text_:based in 2166) [ClassicSimilarity], result of:
          0.056501348 = score(doc=2166,freq=8.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.39947033 = fieldWeight in 2166, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
      0.25 = coord(1/4)
  0.25 = coord(1/4)
```
Abstract

In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography are classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri that focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.

Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.00

0.0033293737 = product of:
  0.013317495 = sum of:
    0.013317495 = product of:
      0.05326998 = sum of:
        0.05326998 = weight(_text_:based in 2564) [ClassicSimilarity], result of:
          0.05326998 = score(doc=2564,freq=4.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.37662423 = fieldWeight in 2564, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.25 = coord(1/4)
  0.25 = coord(1/4)

Abstract: The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.

Zhang, X: Rough set theory based automatic text categorization (2005) 0.00

0.0033293737 = product of:
  0.013317495 = sum of:
    0.013317495 = product of:
      0.05326998 = sum of:
        0.05326998 = weight(_text_:based in 2822) [ClassicSimilarity], result of:
          0.05326998 = score(doc=2822,freq=4.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.37662423 = fieldWeight in 2822, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=2822)
      0.25 = coord(1/4)
  0.25 = coord(1/4)

Abstract: Der Forschungsbericht "Rough Set Theory Based Automatic Text Categorization and the Handling of Semantic Heterogeneity" von Xueying Zhang ist in Buchform auf Englisch erschienen. Zhang hat in ihrer Arbeit ein Verfahren basierend auf der Rough Set Theory entwickelt, das Beziehungen zwischen Schlagwörtern verschiedener Vokabulare herstellt. Sie war von 2003 bis 2005 Mitarbeiterin des IZ und ist seit Oktober 2005 Associate Professor an der Nanjing University of Science and Technology.

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.00
```
0.0032901266 = product of:
  0.013160506 = sum of:
    0.013160506 = product of:
      0.052642025 = sum of:
        0.052642025 = weight(_text_:based in 3300) [ClassicSimilarity], result of:
          0.052642025 = score(doc=3300,freq=10.0), product of:
            0.14144066 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.04694356 = queryNorm
            0.37218451 = fieldWeight in 3300, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.25 = coord(1/4)
  0.25 = coord(1/4)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including, Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated showing they are complementary to one another.

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.00

0.003180102 = product of:
  0.012720408 = sum of:
    0.012720408 = product of:
      0.025440816 = sum of:
        0.025440816 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.025440816 = score(doc=2741,freq=2.0), product of:
            0.16438834 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04694356 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 12. 9.2004 9:56:22

Search (102 results, page 2 of 6)

Authors

Years

Languages

Types

Themes