Search (174 results, page 1 of 9)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.35
    Score breakdown (Lucene ClassicSimilarity explain output, abridged). The matched terms _text_:3a and _text_:2f are URL-encoding artifacts (%3A, %2F) tokenized out of the citation URL below; _text_:22 matches the record's date stamp:
      0.35057482 = 0.6310347 × coord(5/9), where 0.6310347 is the sum of
        0.06152886 = weight(_text_:3a in 562) = 0.18458658 × coord(1/3)
        0.18458658 = weight(_text_:2f in 562), counted three times
        0.01574607 = weight(_text_:22 in 562) = 0.03149214 × coord(1/2)
      Each term weight is queryWeight × fieldWeight; for _text_:2f, with tf = 1.4142135 (freq=2.0), idf = 8.478011 (docFreq=24, maxDocs=44218), queryNorm = 0.038739666 and fieldNorm = 0.046875:
        (8.478011 × 0.038739666) × (1.4142135 × 8.478011 × 0.046875) = 0.32843533 × 0.56201804 = 0.18458658
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
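     The breakdown above is ordinary Lucene ClassicSimilarity (tf-idf) arithmetic. A minimal sketch that reproduces the per-term weight, assuming Lucene's classic formulas idf = 1 + ln(maxDocs/(docFreq+1)) and tf = sqrt(freq); all constants are taken from the breakdown:
```python
import math

# Constants from the explain output for term _text_:2f in doc 562.
doc_freq, max_docs = 24, 44218
freq = 2.0
query_norm = 0.038739666   # query-level normalization constant
field_norm = 0.046875      # index-time length norm of the matched field

idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 8.478011
tf = math.sqrt(freq)                              # 1.4142135

query_weight = idf * query_norm                   # 0.32843533
field_weight = tf * idf * field_norm              # 0.56201804
print(query_weight * field_weight)                # ~0.18458658, the per-term weight
```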
  2. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.06
    Abstract
     Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, by contrast, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents; that is, to hide information, hidden text is injected into a document's passages. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the labeling of passages with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly (99% confidence) outperforms the other document-splitting approaches by 12% to 18% on the passage-detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
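     The detection pipeline this abstract describes (split a document into passages, classify each passage, and flag documents whose passages land in a disallowed category) can be sketched with standard tools. A minimal sketch, not the authors' KDP method (KDP forms passages dynamically around keywords); the fixed-length word windows, the scikit-learn classifier, and the category names are all stand-ins:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def passages(text, size=50):
    """Split a document into fixed-length word windows (stand-in for KDP)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Hypothetical training passages labeled with predetermined categories.
train_texts = ["quarterly revenue and merger plans", "weekend hiking and recipes"]
train_labels = ["restricted", "benign"]
clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(train_texts, train_labels)

def is_infected(document):
    """A document is 'infected' if any passage classifies into a disallowed category."""
    return any(lbl == "restricted" for lbl in clf.predict(passages(document)))

print(is_infected("notes on the weekend trip ... merger plans and revenue figures"))
```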
  3. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.05
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2265-2277
  4. Shafer, K.E.: Evaluating Scorpion results (1998) 0.05
    Abstract
    Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.
  5. Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008) 0.04
    Abstract
     Text categorization is an important research area that has been receiving much attention due to the growth of on-line information and of the Internet. Automated text categorization is generally cast as a multi-class classification problem. Much previous work focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to multi-class text classification. In addition, the training time and scaling are also important concerns. On the other hand, techniques that extend naturally to multi-class classification are generally not as accurate as SVM. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity in the data. While most previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
    Source
    Information processing and management. 44(2008) no.5, S.1684-1697
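     The GSVD construction in the abstract has no off-the-shelf implementation; as a rough stand-in, scikit-learn's LinearDiscriminantAnalysis shows the same overall move of casting multi-class text categorization as discriminant analysis over a term matrix. A sketch under that substitution, with invented toy documents and labels:
```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["stock markets fell sharply", "the team wins the final",
        "parliament passes the bill", "the striker scores twice",
        "bonds rally on a rate cut", "a new tax law is debated"]
labels = ["finance", "sports", "politics", "sports", "finance", "politics"]

X = TfidfVectorizer().fit_transform(docs).toarray()   # LDA needs a dense matrix

# One direct multi-class model in a low-dimensional discriminant space,
# instead of decomposing the task into independent binary classifiers.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, labels)
print(lda.predict(X))
```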
  6. AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.04
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
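     The pipeline described here (weighted term-frequency matrix, SVD projection, then supervised and unsupervised analysis of the same vectors) maps directly onto standard tooling. A minimal sketch assuming scikit-learn, with toy documents and ACM-style labels in place of the 1,026 papers and the 50-dimensional space:
```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

docs = ["content analysis and indexing methods", "search and retrieval models",
        "online information services and portals", "systems and software design",
        "evaluation of retrieval effectiveness"]
classes = ["H.3.1", "H.3.3", "H.3.5", "H.3.4", "H.3.3"]  # author-assigned classes

tfidf = TfidfVectorizer().fit_transform(docs)
vecs = TruncatedSVD(n_components=4).fit_transform(tfidf)  # the study used 50 dims

tree = DecisionTreeClassifier().fit(vecs, classes)            # supervised view
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(vecs)  # unsupervised view
print(tree.predict(vecs), clusters)
```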
  7. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.04
    Abstract
     Examines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future development of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and the structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar.
  8. Cui, H.; Heidorn, P.B.; Zhang, H.: ¬An approach to automatic classification of text for information retrieval (2002) 0.04
    Abstract
     In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make those inherent structures explicit by XML markup. This markup has great potential to improve task performance in specimen identification and the usability of online flora and fauna.
  9. Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.03
    Abstract
     The authors describe their research, which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components, while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural-language understanding of user queries were employed. The software component descriptions were then compared by matching the resulting semantic representations of the user queries to the semantic representations of the software components, in order to find the components that best match the user queries. A proof-of-concept system was developed to test the authors' approach. The results of this system were compared to human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated semantic-based approach to classifying and retrieving reusable software components is successful when compared to the labor-intensive results from the experts, thus showing that this approach can significantly benefit software reuse classification and retrieval.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.4, S.613-627
  10. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.03
    Abstract
     A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain-analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  11. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.03
    Abstract
     This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial-match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial-match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
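     A minimal sketch of the partial-match idea: pool terms from previously classified records into per-class clusters, score a new record's terms against each cluster, and take the best match. The clusters and term weights below are hypothetical, and the study's four match methods, five query types, and three term representations are collapsed into one naive overlap score:
```python
from collections import Counter

# Hypothetical clusters: term frequencies pooled from the titles and subject
# headings of previously classified MARC records, keyed by LCC number.
clusters = {
    "QA76": Counter({"computer": 9, "programming": 6, "software": 4}),
    "Z699": Counter({"information": 8, "retrieval": 7, "indexing": 3}),
}

def classify(record_terms):
    """Rank classification clusters by a naive partial-match score."""
    def score(cluster):
        return sum(cluster[term] for term in record_terms)
    return max(clusters, key=lambda lcc: score(clusters[lcc]))

print(classify(["automatic", "indexing", "retrieval"]))   # -> Z699
```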
  12. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.03
    Content
     Presentation accompanying the talk at the 98th Deutscher Bibliothekartag in Erfurt ("Ein neuer Blick auf Bibliotheken"), track TK10 "Information erschließen und recherchieren", session "Inhalte erschließen - mit neuen Tools"
    Date
    22. 8.2009 12:54:24
    Theme
    Klassifikationssysteme im Online-Retrieval
  13. Ribeiro-Neto, B.; Laender, A.H.F.; Lima, L.R.S. de: ¬An experimental study in automatically categorizing medical documents (2001) 0.03
    Abstract
     In this article, we evaluate the retrieval performance of an algorithm that automatically categorizes medical documents. The categorization, which consists of assigning an International Code of Disease (ICD) to the medical document under examination, is based on well-known information retrieval techniques. The algorithm, which we proposed, operates in a fully automatic mode and requires no supervision or training data. Using a database of 20,569 documents, we verify that the algorithm attains levels of average precision in the 70-80% range for category coding and in the 60-70% range for subcategory coding. We also carefully analyze the cases in which the algorithm's categorization does not accord with that of the human specialists. The vast majority of them represent cases that can only be fully categorized with the assistance of a human subject (because, for instance, they require specific knowledge of a given pathology). For a slim fraction of all documents (0.77% for category coding and 1.4% for subcategory coding), the algorithm makes assignments that are clearly incorrect. However, this fraction corresponds to only one-fourth of the mistakes made by the human specialists.
    Source
     Journal of the American Society for Information Science and Technology. 52(2001) no.5, S.391-401
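     Because the algorithm "operates in a fully automatic mode and requires no supervision or training data", its general shape is retrieval-based categorization: rank candidate ICD codes by how similar their descriptions are to the document. A minimal sketch of that shape only; the code descriptions are invented and the paper's actual ranking formula is not reproduced:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ICD category descriptions standing in for the real code book.
icd = {
    "J18": "pneumonia unspecified organism lung infection",
    "I21": "acute myocardial infarction heart attack",
    "E11": "type 2 diabetes mellitus insulin glucose",
}

vectorizer = TfidfVectorizer()
code_matrix = vectorizer.fit_transform(icd.values())

def assign_icd(document):
    """Assign the ICD code whose description best matches the document."""
    sims = cosine_similarity(vectorizer.transform([document]), code_matrix)[0]
    return max(zip(icd, sims), key=lambda pair: pair[1])[0]

print(assign_icd("patient admitted with a lung infection and fever"))  # -> J18
```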
  14. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.03
    Content
     Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 insights on achieving effective information access
     Session One: Updates and a twelve-month perspective
       Danny Sullivan (Search Engine Watch, US/England): Portalization and other search trends
       Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
     Session Two: Today's search engines and beyond
       Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
       Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: the knowledge impact statement
       Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
       Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
       Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
       Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
     Session Three: Panel discussion: Human vs. automated categorization and editing
       Ev Brenner (New York, NY), chairman
       James Callan (University of Massachusetts, MA)
       Marc Krellenstein (Northern Light Technology, Cambridge, MA)
       Dan Miller (Ask Jeeves, Berkeley, CA)
     Session Four: Updates and a twelve-month perspective
       Steve Arnold (AIT, Harrods Creek, KY): Review: the leading edge in search and retrieval software
       Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
     Session Five: Search engines now and beyond
       Intelligent agents: John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
       Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
       Cross-language searching: Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
       Video search and retrieval: Armon Amir (IBM, Almaden, CA): CueVideo: modular system for automatic indexing and browsing of video/audio
       Speech recognition: Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
       Visualization: James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: emerging science or passing fashion?
       Text mining: David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  15. Chung, Y.M.; Lee, J.Y.: ¬A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.02
    Abstract
     Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as the X² statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X² statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
    Source
     Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.283-296
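     All six measures reduce to arithmetic over a 2×2 co-occurrence table for a term pair. A minimal sketch of five of them, assuming document counts a (both terms), b and c (exactly one term) and d (neither), and reading "mutual information" as pointwise mutual information; the likelihood ratio is omitted for brevity:
```python
import math

def association(a, b, c, d):
    """Association measures over a 2x2 co-occurrence table:
    a = docs with both terms, b / c = docs with exactly one, d = docs with neither."""
    n = a + b + c + d
    return {
        "cosine": a / math.sqrt((a + b) * (a + c)),
        "jaccard": a / (a + b + c),
        "mutual_information": math.log2(a * n / ((a + b) * (a + c))),
        "yules_y": (math.sqrt(a * d) - math.sqrt(b * c))
                   / (math.sqrt(a * d) + math.sqrt(b * c)),
        "chi_square": n * (a * d - b * c) ** 2
                      / ((a + b) * (c + d) * (a + c) * (b + d)),
    }

print(association(a=20, b=5, c=10, d=965))
```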
  16. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.02
    Source
    Information storage and retrieval. 6(1971), S.417-435
  17. Panyr, J.: Automatische Klassifikation und Information Retrieval : Anwendung und Entwicklung komplexer Verfahren in Information-Retrieval-Systemen und ihre Evaluierung (1986) 0.02
    Series
    Sprache und Information; Bd.12
  18. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.02
    Abstract
     Classification of office documents is one of the administrative functions carried out by almost every organization and institution that sends and receives correspondence. Processing this increasing amount of incoming and outgoing mail, in particular its classification, is time-consuming and expensive. More and more organizations are seeking to meet this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology that can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence.
  19. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02
    Abstract
     The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
    Theme
    Klassifikationssysteme im Online-Retrieval
  20. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.02

Languages

  • e 149
  • d 23
  • a 1
  • chi 1

Types

  • a 151
  • el 21
  • x 4
  • m 3
  • r 2
  • s 2
  • d 1