Search (31 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.06

0.06324223 = product of:
  0.09486334 = sum of:
    0.081037596 = product of:
      0.24311279 = sum of:
        0.24311279 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.24311279 = score(doc=562,freq=2.0), product of:
            0.43257114 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051022716 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.013825747 = product of:
      0.04147724 = sum of:
        0.04147724 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04147724 = score(doc=562,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Content: Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.04

0.03801625 = product of:
  0.05702437 = sum of:
    0.022522911 = weight(_text_:im in 3614) [ClassicSimilarity], result of:
      0.022522911 = score(doc=3614,freq=2.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.15615936 = fieldWeight in 3614, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3614)
    0.03450146 = product of:
      0.051752187 = sum of:
        0.025961377 = weight(_text_:online in 3614) [ClassicSimilarity], result of:
          0.025961377 = score(doc=3614,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.16765618 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
        0.025790809 = weight(_text_:retrieval in 3614) [ClassicSimilarity], result of:
          0.025790809 = score(doc=3614,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)

Theme: Klassifikationssysteme im Online-Retrieval

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
```
0.020496529 = product of:
  0.061489582 = sum of:
    0.061489582 = product of:
      0.09223437 = sum of:
        0.05767 = weight(_text_:retrieval in 2765) [ClassicSimilarity], result of:
          0.05767 = score(doc=2765,freq=10.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.37365708 = fieldWeight in 2765, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.03456437 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.03456437 = score(doc=2765,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract

Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy.

Date

22. 3.2009 19:14:43

Cui, H.; Heidorn, P.B.; Zhang, H.: ¬An approach to automatic classification of text for information retrieval (2002) 0.02

0.01942425 = product of:
  0.05827275 = sum of:
    0.05827275 = product of:
      0.087409124 = sum of:
        0.03634593 = weight(_text_:online in 174) [ClassicSimilarity], result of:
          0.03634593 = score(doc=174,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.23471867 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
        0.05106319 = weight(_text_:retrieval in 174) [ClassicSimilarity], result of:
          0.05106319 = score(doc=174,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33085006 = fieldWeight in 174, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Abstract: In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make those inherent structures explicit by XML markups. This marking up has great potentials in improving task performance in specimen identification and the usability of online flora and fauna.

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.01

0.013800584 = product of:
  0.04140175 = sum of:
    0.04140175 = product of:
      0.062102623 = sum of:
        0.031153653 = weight(_text_:online in 1168) [ClassicSimilarity], result of:
          0.031153653 = score(doc=1168,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.20118743 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
        0.03094897 = weight(_text_:retrieval in 1168) [ClassicSimilarity], result of:
          0.03094897 = score(doc=1168,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.20052543 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Abstract: The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.009217165 = product of:
  0.027651494 = sum of:
    0.027651494 = product of:
      0.08295448 = sum of:
        0.08295448 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.08295448 = score(doc=1046,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 5. 5.2003 14:17:22

Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01

0.008023808 = product of:
  0.024071421 = sum of:
    0.024071421 = product of:
      0.07221426 = sum of:
        0.07221426 = weight(_text_:retrieval in 2666) [ClassicSimilarity], result of:
          0.07221426 = score(doc=2666,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.46789268 = fieldWeight in 2666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=2666)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.01

0.006877549 = product of:
  0.020632647 = sum of:
    0.020632647 = product of:
      0.06189794 = sum of:
        0.06189794 = weight(_text_:retrieval in 4084) [ClassicSimilarity], result of:
          0.06189794 = score(doc=4084,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.40105087 = fieldWeight in 4084, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=4084)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference an Research and Development in Information Retrieval. Ed.: K. Järvelin, u.a

Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01

0.0064842156 = product of:
  0.019452646 = sum of:
    0.019452646 = product of:
      0.058357935 = sum of:
        0.058357935 = weight(_text_:retrieval in 2564) [ClassicSimilarity], result of:
          0.058357935 = score(doc=2564,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.37811437 = fieldWeight in 2564, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Abstract: The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.

Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.01

0.005731291 = product of:
  0.017193872 = sum of:
    0.017193872 = product of:
      0.051581617 = sum of:
        0.051581617 = weight(_text_:retrieval in 4132) [ClassicSimilarity], result of:
          0.051581617 = score(doc=4132,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33420905 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference an Research and Development in Information Retrieval. Ed.: K. Järvelin, u.a

Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.01
```
0.005731291 = product of:
  0.017193872 = sum of:
    0.017193872 = product of:
      0.051581617 = sum of:
        0.051581617 = weight(_text_:retrieval in 1382) [ClassicSimilarity], result of:
          0.051581617 = score(doc=1382,freq=8.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33420905 = fieldWeight in 1382, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1382)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

The authors describe their research which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components, while retaining good accuracy, and therefore improves the affordability of software reuse. A program understanding of software components and natural language understanding of user queries was employed. Then the software component descriptions were compared by matching the resulting semantic representations of the user queries to the semantic representations of the software components to search for software components that best match the user queries. A proof of concept system was developed to test the authors' approach. The results of this proof of concept system were compared to human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated semantic-based approach for software reusable component classification and retrieval is successful when compared to the labor-intensive results from the experts, thus showing that this approach can significantly benefit software reuse classification and retrieval.
Miyamoto, S.: Information clustering based an fuzzy multisets (2003) 0.01
```
0.005673688 = product of:
  0.017021064 = sum of:
    0.017021064 = product of:
      0.05106319 = sum of:
        0.05106319 = weight(_text_:retrieval in 1071) [ClassicSimilarity], result of:
          0.05106319 = score(doc=1071,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33085006 = fieldWeight in 1071, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1071)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering which means both term clustering and document clustering is considered. Three methods of the hard c-means, fuzzy c-means, and an agglomerative method using cluster centers are proposed. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.

Automatic classification research at OCLC (2002) 0.01

0.00537668 = product of:
  0.01613004 = sum of:
    0.01613004 = product of:
      0.048390117 = sum of:
        0.048390117 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.048390117 = score(doc=1563,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 5. 5.2003 9:22:09

Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01

0.00537668 = product of:
  0.01613004 = sum of:
    0.01613004 = product of:
      0.048390117 = sum of:
        0.048390117 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.048390117 = score(doc=5273,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 22. 7.2006 16:24:52

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

0.00537668 = product of:
  0.01613004 = sum of:
    0.01613004 = product of:
      0.048390117 = sum of:
        0.048390117 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.048390117 = score(doc=2560,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 22. 9.2008 18:31:54

Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.00
```
0.004895325 = product of:
  0.0146859735 = sum of:
    0.0146859735 = product of:
      0.04405792 = sum of:
        0.04405792 = weight(_text_:online in 1998) [ClassicSimilarity], result of:
          0.04405792 = score(doc=1998,freq=4.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.284522 = fieldWeight in 1998, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.046875 = fieldNorm(doc=1998)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabularly-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages was 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00

0.0046085827 = product of:
  0.013825747 = sum of:
    0.013825747 = product of:
      0.04147724 = sum of:
        0.04147724 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.04147724 = score(doc=2760,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 22. 3.2009 19:11:54

Godby, C. J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.00

0.004585033 = product of:
  0.013755098 = sum of:
    0.013755098 = product of:
      0.041265294 = sum of:
        0.041265294 = weight(_text_:retrieval in 1567) [ClassicSimilarity], result of:
          0.041265294 = score(doc=1567,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.26736724 = fieldWeight in 1567, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=1567)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Footnote: Paper, IFLA Preconference "Subject Retrieval in a Networked Environment", Dublin, OH, August 2001.

Ribeiro-Neto, B.; Laender, A.H.F.; Lima, L.R.S. de: ¬An experimental study in automatically categorizing medical documents (2001) 0.00
```
0.004052635 = product of:
  0.012157904 = sum of:
    0.012157904 = product of:
      0.03647371 = sum of:
        0.03647371 = weight(_text_:retrieval in 5702) [ClassicSimilarity], result of:
          0.03647371 = score(doc=5702,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.23632148 = fieldWeight in 5702, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5702)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

In this article, we evaluate the retrieval performance of an algorithm that automatically categorizes medical documents. The categorization, which consists in assigning an International Code of Disease (ICD) to the medical document under examination, is based on wellknown information retrieval techniques. The algorithm, which we proposed, operates in a fully automatic mode and requires no supervision or training data. Using a database of 20,569 documents, we verify that the algorithm attains levels of average precision in the 70-80% range for category coding and in the 60-70% range for subcategory coding. We also carefully analyze the case of those documents whose categorization is not in accordance with the one provided by the human specialists. The vast majority of them represent cases that can only be fully categorized with the assistance of a human subject (because, for instance, they require specific knowledge of a given pathology). For a slim fraction of all documents (0.77% for category coding and 1.4% for subcategory coding), the algorithm makes assignments that are clearly incorrect. However, this fraction corresponds to only one-fourth of the mistakes made by the human specialists

Godby, C.J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.00

0.004011904 = product of:
  0.012035711 = sum of:
    0.012035711 = product of:
      0.03610713 = sum of:
        0.03610713 = weight(_text_:retrieval in 3962) [ClassicSimilarity], result of:
          0.03610713 = score(doc=3962,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.23394634 = fieldWeight in 3962, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3962)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: Subject retrieval in a networked environment: Proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001 and sponsored by the IFLA Classification and Indexing Section, the IFLA Information Technology Section and OCLC. Ed.: I.C. McIlwaine

Search (31 results, page 1 of 2)

Authors

Types

Themes