Search (155 results, page 1 of 8)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
     Score breakdown (Lucene ClassicSimilarity explanation; a short Python recomputation follows this entry):
     0.068928 = (0.082324 + 0.021068) × coord(2/3), where
     - clause "3a": 0.082324 = 0.246971 × coord(1/3), with 0.246971 = queryWeight 0.439437 (idf 8.478011 at docFreq=24, maxDocs=44218; queryNorm 0.0518325) × fieldWeight 0.562018 (tf 1.414214 at termFreq=2.0 × idf 8.478011 × fieldNorm 0.046875 in doc 562)
     - clause "22": 0.021068 = 0.042136 × coord(1/2), with 0.042136 = queryWeight 0.181509 (idf 3.501830 at docFreq=3622, maxDocs=44218; queryNorm 0.0518325) × fieldWeight 0.232141 (tf 1.414214 × idf 3.501830 × fieldNorm 0.046875 in doc 562)
    
    Content
     Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
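  The relevance figure shown next to each hit is the top line of a Lucene ClassicSimilarity (TF-IDF) explanation like the breakdown above. The following minimal Python sketch recomputes the 0.0689 score of this first entry from the factors in that breakdown; the function and variable names are ours, only the numbers come from the explanation.

      import math

      def clause_score(freq, idf, field_norm, query_norm):
          # Lucene ClassicSimilarity: queryWeight * fieldWeight for one matching term
          tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
          query_weight = idf * query_norm         # idf * queryNorm
          field_weight = tf * idf * field_norm    # tf * idf * fieldNorm
          return query_weight * field_weight

      query_norm, field_norm = 0.0518325, 0.046875

      # clause "3a": idf(docFreq=24, maxDocs=44218), 1 of 3 sub-clauses matched -> coord(1/3)
      s_3a = clause_score(2.0, 8.478011, field_norm, query_norm) * (1 / 3)
      # clause "22": idf(docFreq=3622, maxDocs=44218), 1 of 2 sub-clauses matched -> coord(1/2)
      s_22 = clause_score(2.0, 3.5018296, field_norm, query_norm) * (1 / 2)

      # overall document score: sum of the clause scores, scaled by coord(2/3)
      print(round((s_3a + s_22) * (2 / 3), 6))    # 0.068928, i.e. the 0.07 shown above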
  2. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.07
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.803-813
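  To make the idea of classifying a document into a category only when it also matches context inherited from the category's ancestors more concrete, here is a deliberately simplified Python sketch. It illustrates that general top-down gating idea only, with made-up categories and a crude overlap test; it is not Liu's actual CRHTC algorithm.

      from dataclasses import dataclass, field

      @dataclass
      class Category:
          name: str
          terms: set
          children: list = field(default_factory=list)

      def matches_context(doc_terms, context_terms, threshold=0.6):
          # crude stand-in for a context test: share of context terms present in the document
          if not context_terms:
              return True
          return len(doc_terms & context_terms) / len(context_terms) >= threshold

      def classify(doc_terms, node, inherited=frozenset(), labels=None):
          # route the document down the hierarchy, gating each step on the accumulated context
          labels = [] if labels is None else labels
          context = inherited | node.terms
          if not matches_context(doc_terms, context):
              return labels
          labels.append(node.name)
          for child in node.children:
              classify(doc_terms, child, context, labels)
          return labels

      root = Category("science", {"study", "results"}, [
          Category("medicine", {"disease", "patient"}),
          Category("computing", {"algorithm", "software"}),
      ])
      print(classify({"study", "results", "algorithm", "software"}, root))
      # ['science', 'computing'] - "medicine" is skipped because its inherited context is not matched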
  3. Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.05
    
    Source
    Information processing and management. 11(1975), S.201-206
  4. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.05
    
    Source
    Information processing and management. 37(2001) no.3, S.459-484
  5. Major, R.L.; Ragsdale, C.T.: An aggregation approach to the classification problem using multiple prediction experts (2000) 0.04
    
    Source
    Information processing and management. 36(2000) no.4, S.683-696
  6. Miyamoto, S.: Information clustering based on fuzzy multisets (2003) 0.04
    
    Abstract
     A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which means both term clustering and document clustering, is considered. Three methods are proposed: the hard c-means, the fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.
    Source
    Information processing and management. 39(2003) no.2, S.195-213
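  For readers who want to see what the middle of the three methods looks like in its classical form, here is a minimal plain-vector fuzzy c-means in Python. It is only a generic textbook version for intuition; Miyamoto's contribution is adapting such clustering to fuzzy multisets, which this sketch does not model, and the sample data are arbitrary.

      import numpy as np

      def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
          rng = np.random.default_rng(seed)
          U = rng.random((len(X), c))
          U /= U.sum(axis=1, keepdims=True)            # membership rows sum to 1
          for _ in range(iters):
              W = U ** m
              centers = (W.T @ X) / W.sum(axis=0)[:, None]
              d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
              U = 1.0 / (d ** (2 / (m - 1)))           # standard FCM membership update
              U /= U.sum(axis=1, keepdims=True)
          return centers, U

      X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
      centers, U = fuzzy_c_means(X)
      print(np.round(centers, 2))   # two cluster centers
      print(np.round(U, 2))         # soft memberships of each point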
  7. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.04
    
    Content
     Presentation accompanying the talk given at the 98th Deutscher Bibliothekartag in Erfurt ("Ein neuer Blick auf Bibliotheken"); session TK10: Information erschließen und recherchieren - Inhalte erschließen mit neuen Tools
    Date
    22. 8.2009 12:54:24
  8. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.03
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
    Source
    Information processing and management. 38(2002) no.1, S.79-89
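  The following Python sketch shows a toy version of Kohonen's self-organizing map, the algorithm named in the abstract: random stand-in vectors instead of vectorized LISA records, a small fixed grid, and linearly decaying learning rate and neighbourhood. It illustrates the general mechanism only, not the specific setup of the paper.

      import numpy as np

      def train_som(data, grid=(4, 4), iters=500, lr0=0.5, sigma0=2.0, seed=0):
          rng = np.random.default_rng(seed)
          h, w = grid
          weights = rng.random((h, w, data.shape[1]))
          coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
          for t in range(iters):
              lr = lr0 * (1 - t / iters)
              sigma = sigma0 * (1 - t / iters) + 1e-3
              x = data[rng.integers(len(data))]
              # best-matching unit: cell whose prototype is closest to the sample
              bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=2)), (h, w))
              # Gaussian neighbourhood around the BMU on the map grid
              dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
              nbh = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
              weights += lr * nbh * (x - weights)
          return weights

      docs = np.random.default_rng(1).random((50, 10))   # stand-in document vectors
      som = train_som(docs)
      print(som.shape)   # (4, 4, 10): one prototype vector per map cell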
  9. Dubin, D.: Dimensions and discriminability (1998) 0.03
    
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  10. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.03
    
    Abstract
     Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model, based on Bayesian networks, that is able to handle both structure and content. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended to deal with different types of content information (here text and images). The model was tested on three databases: the classical webKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad-hoc retrieval for XML documents, and a multimedia corpus of Web pages.
    Source
    Information processing and management. 40(2004) no.5, S.807-827
  11. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.03
    
    Abstract
     The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
  12. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.03
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2265-2277
  13. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.03
    
    Abstract
     Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may prove helpful for the automatic classification of documents, as well as for information retrieval and filtering applications.
    Source
    Information processing and management. 32(1996) no.6, S.747-767
  14. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.03
    
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. Associative classifiers have many favorable characteristics such as rapid training, good classification accuracy, and excellent interpretation. However, associative classifiers also have some obstacles to overcome when they are applied in the area of text classification. The target text collection generally has a very high dimension, thus the training process might take a very long time. We propose a feature selection based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of the associative classifier produces a huge amount of classification rules, which makes the prediction with a new document ineffective. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting when applied to a real world problem.
    Footnote
    Beitrag in: Special issue on AIRS2005: Information Retrieval Research in Asia
    Source
    Information processing and management. 43(2007) no.2, S.393-405
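  As a pointer to what "feature selection based on the mutual information between the word and class variables" can look like in code, here is a small generic Python sketch over toy documents. It is a standard MI estimate from labelled training documents, not necessarily the exact formulation used by the authors.

      import math
      from collections import Counter

      def mutual_information(docs, labels, word):
          # I(word presence; class), estimated from labelled documents (docs are sets of words)
          n = len(docs)
          joint = Counter((word in doc, lab) for doc, lab in zip(docs, labels))
          p_w = Counter(word in doc for doc in docs)
          p_c = Counter(labels)
          mi = 0.0
          for (w, c), cnt in joint.items():
              p_wc = cnt / n
              mi += p_wc * math.log(p_wc / ((p_w[w] / n) * (p_c[c] / n)))
          return mi

      def select_features(docs, labels, k=2):
          # keep the k words that carry the most information about the class variable
          vocab = set().union(*docs)
          return sorted(vocab, key=lambda w: mutual_information(docs, labels, w), reverse=True)[:k]

      docs = [{"rain", "wet"}, {"rain", "cold"}, {"sun", "warm"}, {"sun", "dry"}]
      labels = ["weather.bad", "weather.bad", "weather.good", "weather.good"]
      print(select_features(docs, labels))   # e.g. ['rain', 'sun']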
  15. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.03
    
    Abstract
    This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; (c) and the way they are used in combination with each other. Further observations concern the way the participant assesses quality of web-based resources, and his information behavior as a software engineer.
    Source
    Information processing and management. 44(2008) no.4, S.1410-1430
  16. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.02
    
    Pages
    S.1-22
  17. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.02
    
    Date
    22. 7.2006 16:24:52
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.3, S.431-442
  18. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02
    
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
    Date
    22. 9.2008 18:31:54
  19. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
    
    Abstract
     Given the huge amount of information on the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
  20. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
    
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
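  A rough Python illustration of the passage-detection setting described above: a document is split into windows anchored on category keywords and each window is labelled against predetermined categories. The keyword lists, window width, and overlap test are invented for the example; this is a stand-in for the general idea, not the authors' KDP method.

      CATEGORY_TERMS = {           # hypothetical category keyword lists
          "finance": {"payment", "account", "transfer"},
          "medical": {"diagnosis", "patient", "dosage"},
      }

      def passages_around_keywords(words, keywords, width=5):
          # yield (start, end) windows centred on every occurrence of a keyword
          for i, w in enumerate(words):
              if w in keywords:
                  yield max(0, i - width), min(len(words), i + width + 1)

      def detect(text):
          words = text.lower().split()
          all_keywords = set().union(*CATEGORY_TERMS.values())
          hits = []
          for start, end in passages_around_keywords(words, all_keywords):
              window = set(words[start:end])
              # label the window with every category whose terms it overlaps
              labels = [c for c, terms in CATEGORY_TERMS.items() if window & terms]
              hits.append((" ".join(words[start:end]), labels))
          return hits

      doc = "the quarterly report hides a payment to an offshore account inside this paragraph"
      for passage, labels in detect(doc):
          print(labels, "->", passage)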

Languages

  • e 139
  • d 13
  • a 1
  • chi 1

Types

  • a 139
  • el 14
  • m 3
  • x 3
  • s 2
  • d 1
  • r 1