Search (68 results, page 1 of 4)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
  2. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.04
    Type
    r
  3. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.03
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable, sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution-ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means. (A sketch of the generic LSI-plus-K-means pipeline follows this entry.)
    Date
    23. 3.2013 13:22:36
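    The following is a minimal sketch of the generic pipeline the abstract above builds on: truncated SVD (LSI) to obtain a low-rank approximation of the term-document matrix, then standard K-means on the reduced document vectors. It assumes scikit-learn and invented toy documents; LSISSM's signature matching and two-stage seeding are not reproduced here.

    # Generic LSI + K-means pipeline (scikit-learn); a sketch of the
    # baseline the abstract improves on, not of LSISSM itself.
    from sklearn.cluster import KMeans
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import Normalizer

    docs = [  # toy corpus, invented for illustration
        "latent semantic indexing maps terms and documents to concepts",
        "singular value decomposition yields a low-rank approximation",
        "k-means partitions documents into k clusters",
        "self-organizing maps are another clustering algorithm",
    ]

    # Sparse TF-IDF document-term matrix.
    X = TfidfVectorizer().fit_transform(docs)

    # Truncated SVD = LSI: keep only the top-ranking latent dimensions.
    lsi = TruncatedSVD(n_components=2, random_state=0)
    X_lsi = Normalizer(copy=False).fit_transform(lsi.fit_transform(X))

    # Standard K-means on the reduced document representations.
    print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsi))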
  4. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.03
  5. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.03
    Date
    22. 9.2008 18:31:54
  6. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.03
    Pages
    pp. 1-22
    Source
    Klassifikation und Ordnung. Tagungsband 12. Jahrestagung der Gesellschaft für Klassifikation, Darmstadt 17.-19.3.1988. Ed.: R. Wille
  7. Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: ¬An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.02
  8. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.02
    Date
    22. 3.2009 19:11:54
  9. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.02
    Date
    28.10.2013 19:22:57
  10. Sparck Jones, K.: Automatic classification (1976) 0.02
  11. Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.02
    Date
    12. 9.2004 9:56:22
  12. Borko, H.: Research in computer based classification systems (1985) 0.01
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis.

    The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
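    A rough sketch of the two steps described above, with principal-component extraction of the term correlation matrix standing in for Borko's factor analysis and an invented term-document incidence matrix (assumes NumPy):

    # Step 1: term-term similarity via a standard correlation formula;
    # Step 2: factor extraction to group intersubstitutable terms.
    # Principal components of the correlation matrix stand in for the
    # factor analysis Borko's team used; the data are invented.
    import numpy as np

    terms = ["retrieval", "indexing", "classification", "clustering"]
    # Rows = documents, columns = index terms (1 = term assigned).
    X = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
    ], dtype=float)

    R = np.corrcoef(X, rowvar=False)      # term-term correlations
    eigvals, eigvecs = np.linalg.eigh(R)  # factors of the correlation matrix
    order = np.argsort(eigvals)[::-1]
    for f in order[:2]:                   # two strongest factors
        category = [t for t, w in zip(terms, eigvecs[:, f]) if abs(w) > 0.5]
        print("category:", category)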
  13. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01
  14. Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.01
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin, et al.
  15. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    Date
    5. 5.2003 14:17:22
  16. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.01
    Abstract
    Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes the use of Web pages linked with the home page, in contrast to the sole use of home pages in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reform its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaging breakeven point by 30.02% compared with an ordinary classification that uses the home page only.
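    A minimal sketch of the core k-NN text-categorization step (assumes scikit-learn; training pages and labels are invented). The paper's connectivity analysis, markup-tag term weighting, and page-to-site aggregation are not reproduced:

    # k-NN text categorization over TF-IDF vectors with cosine distance;
    # a sketch of the core classifier only, not the full three-phase system.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    train_pages = [  # invented toy training data
        "stock quotes market finance news",
        "portfolio investment banking report",
        "football scores league results",
        "tennis open championship match",
    ]
    train_labels = ["finance", "finance", "sports", "sports"]

    clf = make_pipeline(
        TfidfVectorizer(),
        KNeighborsClassifier(n_neighbors=3, metric="cosine"),
    )
    clf.fit(train_pages, train_labels)
    print(clf.predict(["live match results and league table"]))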
  17. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.01
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin, et al.
  18. Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.01
    Abstract
    Document clustering is an important tool for organizing and browsing document collections. In real applications, some limited knowledge about the cluster membership of a small number of documents is often available, such as pairs of documents known to belong to the same cluster. This kind of prior knowledge can serve as constraints on the clustering process. We integrate the constraints into the trace formulation of the sum-of-squared-Euclidean-distances objective of K-means. The combined criterion function is then transformed into a trace maximization, which is further optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods.
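    A deliberately simple way to fold must-link constraints into K-means, for illustration only: collapse each must-link group to its centroid, cluster the merged points, and copy labels back to the members. This is not the paper's trace-maximization/eigen-decomposition method (assumes scikit-learn and NumPy; data and constraints are invented):

    # Enforce must-link pairs by merging constrained points before K-means.
    # Illustrative only; NOT the trace-maximization method of the paper.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [0.1, 1.0]])
    must_link = [(0, 1), (2, 3)]  # document pairs known to share a cluster

    group = list(range(len(X)))   # group id per point
    for a, b in must_link:        # merge the two groups of each pair
        ga, gb = group[a], group[b]
        group = [ga if g == gb else g for g in group]

    ids = sorted(set(group))
    merged = np.array([X[[i for i, g in enumerate(group) if g == gid]].mean(axis=0)
                       for gid in ids])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(merged)
    print([int(km.labels_[ids.index(group[i])]) for i in range(len(X))])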
  19. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    Date
    22. 8.2009 12:54:24
  20. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    Date
    1. 2.2016 18:25:22

Languages

  • e 56
  • d 11
  • a 1

Types

  • a 56
  • el 14
  • r 4
  • x 2