Search (14 results, page 1 of 1)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

```
0.101439916 = sum of:
  0.08076982 = product of:
    0.24230945 = sum of:
      0.24230945 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.24230945 = score(doc=562,freq=2.0), product of:
          0.43114176 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.050854117 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.020670092 = product of:
    0.041340183 = sum of:
      0.041340183 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.041340183 = score(doc=562,freq=2.0), product of:
          0.17808245 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050854117 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
```

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
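The score breakdowns in this listing are Lucene `ClassicSimilarity` explain output. The sketch below recomputes two of them in Python, assuming the classic TF-IDF definitions that the printed values are consistent with: tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf · queryNorm (query boost assumed to be 1), fieldWeight = tf · idf · fieldNorm, and coord = matched clauses / total clauses. queryNorm and fieldNorm are taken as printed, since the full query and the encoded field lengths are not shown. The helper names (`idf`, `term_weight`, `coord`) are hypothetical.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Classic idf as it appears in the explain output: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # tf(freq) = sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # assumes a query boost of 1
    field_weight = tf * term_idf * field_norm  # field_norm taken as printed
    return query_weight * field_weight

def coord(overlap: int, max_overlap: int) -> float:
    # Fraction of the clauses in a boolean query that matched
    return overlap / max_overlap

MAX_DOCS = 44218
QUERY_NORM = 0.050854117

# Doc 562: two top-level clauses, each scaled by its own coord factor
doc_562 = (
    term_weight(2, 24, MAX_DOCS, QUERY_NORM, 0.046875) * coord(1, 3)
    + term_weight(2, 3622, MAX_DOCS, QUERY_NORM, 0.046875) * coord(1, 2)
)

# Doc 1998: one term (freq=4, so tf=2.0) nested inside two coord(1/2) factors
doc_1998 = (
    term_weight(4, 555, MAX_DOCS, QUERY_NORM, 0.046875) * coord(1, 2) * coord(1, 2)
)

print(f"doc 562: {doc_562:.6f}, doc 1998: {doc_1998:.6f}")
```

Because Lucene computes in single precision, these double-precision results agree with the printed totals only to about six significant digits.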

Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.03
```
0.034448884 = product of:
  0.06889777 = sum of:
    0.06889777 = product of:
      0.13779554 = sum of:
        0.13779554 = weight(_text_:90 in 1998) [ClassicSimilarity], result of:
          0.13779554 = score(doc=1998,freq=4.0), product of:
            0.2733978 = queryWeight, product of:
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.050854117 = queryNorm
            0.50401115 = fieldWeight in 1998, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.046875 = fieldNorm(doc=1998)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
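The abstract does not spell out the classifier's internals. As a rough illustration only, here is a generic multinomial naïve Bayes classifier over word tokens with add-one (Laplace) smoothing. The `NaiveBayes` class, the difficulty labels `easy`, `intermediate` and `difficult`, and the toy training documents are all hypothetical, and the authors' vocabulary features and training setup may differ.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Hypothetical multinomial naive Bayes over token lists."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c in self.class_counts:
            # log P(class) + sum of log P(token | class), smoothed
            lp = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    lp += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy, invented training data: one document per difficulty level
train_docs = [
    ["heart", "doctor", "pain"],
    ["blood", "pressure", "medicine"],
    ["myocardial", "infarction", "etiology"],
]
train_labels = ["easy", "intermediate", "difficult"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["myocardial", "etiology"]))
```

Working in log space avoids floating-point underflow when many token probabilities are multiplied.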

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02

```
0.020670092 = product of:
  0.041340183 = sum of:
    0.041340183 = product of:
      0.08268037 = sum of:
        0.08268037 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.08268037 = score(doc=1046,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 5. 5.2003 14:17:22

Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.02
```
0.020299202 = product of:
  0.040598404 = sum of:
    0.040598404 = product of:
      0.08119681 = sum of:
        0.08119681 = weight(_text_:90 in 4921) [ClassicSimilarity], result of:
          0.08119681 = score(doc=4921,freq=2.0), product of:
            0.2733978 = queryWeight, product of:
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.050854117 = queryNorm
            0.29699144 = fieldWeight in 4921, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4921)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful for document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.

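The abstract does not name the five measures. Two widely used link-based similarities are co-citation (the number of pages linking to both documents) and bibliographic coupling (the number of pages both documents link to). The sketch below sums them and uses the result in a simple k-nearest-neighbour vote. Whether these correspond to any of the measures the authors evaluated is an assumption, and the link graph, page names and function names are hypothetical.

```python
from collections import defaultdict

def in_links(out_links):
    """Invert an adjacency map {page: set of pages it links to}."""
    inc = defaultdict(set)
    for src, dsts in out_links.items():
        for d in dsts:
            inc[d].add(src)
    return inc

def cocitation(a, b, inc):
    # Pages that cite (link to) both a and b
    return len(inc.get(a, set()) & inc.get(b, set()))

def coupling(a, b, out_links):
    # Pages that both a and b link to
    return len(out_links.get(a, set()) & out_links.get(b, set()))

def knn_link_classify(page, labeled, out_links, k=3):
    """Hypothetical kNN: the k most link-similar labelled pages vote, weighted by similarity."""
    inc = in_links(out_links)
    scored = []
    for other, topic in labeled.items():
        if other == page:
            continue
        sim = cocitation(page, other, inc) + coupling(page, other, out_links)
        if sim > 0:
            scored.append((sim, other, topic))
    scored.sort(reverse=True)
    votes = defaultdict(float)
    for sim, _, topic in scored[:k]:
        votes[topic] += sim
    return max(votes, key=votes.get) if votes else None  # None: no link evidence

# Invented toy graph
out_links = {
    "hub1": {"p", "a1", "a2"},
    "hub2": {"p", "a1"},
    "q": {"b1"},
    "p": {"x"},
    "a1": {"x"},
    "b1": set(),
}
labeled = {"a1": "sports", "a2": "sports", "b1": "music"}
print(knn_link_classify("p", labeled, out_links))
```

Returning `None` when a page has no link overlap leaves room to fall back on a text-based classifier, in the spirit of the combination the abstract describes.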
Wang, J.: An extensive study on automated Dewey Decimal Classification (2009) 0.02
```
0.020299202 = product of:
  0.040598404 = sum of:
    0.040598404 = product of:
      0.08119681 = sum of:
        0.08119681 = weight(_text_:90 in 3172) [ClassicSimilarity], result of:
          0.08119681 = score(doc=3172,freq=2.0), product of:
            0.2733978 = queryWeight, product of:
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.050854117 = queryNorm
            0.29699144 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.376119 = idf(docFreq=555, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
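The reshaping algorithm itself is not described in the abstract. As a hypothetical illustration of what "balancing the category distribution and flattening the hierarchy" can mean, the sketch below merges sparse DDC classes into their parent notation, deepest first, until every remaining class has at least `min_count` records or is a single-digit main class. This is not Wang's algorithm: the prefix-based `parent` rule is a simplification of the real DDC hierarchy, and the counts are invented.

```python
from collections import Counter

def parent(notation: str) -> str:
    """Hypothetical prefix rule: '004.67' -> '004.6' -> '004' -> '00' -> '0'."""
    p = notation[:-1]
    return p[:-1] if p.endswith(".") else p  # drop a dangling decimal point

def roll_up(counts: dict, min_count: int) -> dict:
    """Merge classes with fewer than min_count records into their parent,
    processing the longest (deepest) notations first; main classes are kept."""
    counts = Counter(counts)
    if not counts:
        return {}
    for length in range(max(map(len, counts)), 1, -1):
        # Parents are always shorter, so they are handled in a later pass
        for notation in [n for n in counts if len(n) == length]:
            if counts[notation] < min_count:
                counts[parent(notation)] += counts.pop(notation)
    return dict(counts)

sample = {"004.67": 2, "004.65": 1, "004.6": 5, "005.1": 10, "006": 1}
print(roll_up(sample, min_count=4))  # {'004.6': 8, '005.1': 10, '0': 1}
```

The total number of records is preserved; only the set of target classes shrinks, which trades notational depth for enough training examples per class.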

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02

```
0.017225077 = product of:
  0.034450155 = sum of:
    0.034450155 = product of:
      0.06890031 = sum of:
        0.06890031 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.06890031 = score(doc=611,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 8.2009 12:54:24

Automatic classification research at OCLC (2002) 0.01

```
0.012057554 = product of:
  0.024115108 = sum of:
    0.024115108 = product of:
      0.048230216 = sum of:
        0.048230216 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.048230216 = score(doc=1563,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 5. 5.2003 9:22:09

Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01

```
0.012057554 = product of:
  0.024115108 = sum of:
    0.024115108 = product of:
      0.048230216 = sum of:
        0.048230216 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.048230216 = score(doc=5273,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 7.2006 16:24:52

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

```
0.012057554 = product of:
  0.024115108 = sum of:
    0.024115108 = product of:
      0.048230216 = sum of:
        0.048230216 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.048230216 = score(doc=2560,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 9.2008 18:31:54

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01

```
0.010335046 = product of:
  0.020670092 = sum of:
    0.020670092 = product of:
      0.041340183 = sum of:
        0.041340183 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.041340183 = score(doc=2760,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 19:11:54

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.01

```
0.010335046 = product of:
  0.020670092 = sum of:
    0.020670092 = product of:
      0.041340183 = sum of:
        0.041340183 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.041340183 = score(doc=3051,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 8.2009 19:51:28

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01

```
0.008612539 = product of:
  0.017225077 = sum of:
    0.017225077 = product of:
      0.034450155 = sum of:
        0.034450155 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.034450155 = score(doc=2765,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 19:14:43

Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.01

```
0.0068900306 = product of:
  0.013780061 = sum of:
    0.013780061 = product of:
      0.027560122 = sum of:
        0.027560122 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.027560122 = score(doc=2741,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 12. 9.2004 9:56:22

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01

```
0.0068900306 = product of:
  0.013780061 = sum of:
    0.013780061 = product of:
      0.027560122 = sum of:
        0.027560122 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.027560122 = score(doc=3284,freq=2.0), product of:
            0.17808245 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050854117 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 1.2010 14:41:24