Search (10 results, page 1 of 1)

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
```
0.006019162 = product of:
  0.042134132 = sum of:
    0.03531506 = weight(_text_:representation in 690) [ClassicSimilarity], result of:
      0.03531506 = score(doc=690,freq=2.0), product of:
        0.11578492 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.025165197 = queryNorm
        0.3050057 = fieldWeight in 690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.046875 = fieldNorm(doc=690)
    0.006819073 = product of:
      0.02045722 = sum of:
        0.02045722 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.02045722 = score(doc=690,freq=2.0), product of:
            0.08812423 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.025165197 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.33333334 = coord(1/3)
  0.14285715 = coord(2/14)
```
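The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the usual classic TF-IDF composition (term score = queryWeight × fieldWeight, with tf = sqrt(freq)) and taking every constant from the listing for document 690. The helper name `term_score` is hypothetical and not part of any library.

```python
import math

QUERY_NORM = 0.025165197  # queryNorm, copied from the explain output


def term_score(freq: float, idf: float, field_norm: float) -> float:
    """Hypothetical helper: one term's score as queryWeight * fieldWeight."""
    query_weight = idf * QUERY_NORM                      # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight


# Document 690: both terms occur twice and share fieldNorm = 0.046875.
representation = term_score(freq=2.0, idf=4.600994, field_norm=0.046875)
term_22 = term_score(freq=2.0, idf=3.5018296, field_norm=0.046875)

# The "22" clause is nested in a sub-query where 1 of 3 clauses matched.
inner = term_22 * (1 / 3)

# The outer query matched 2 of its 14 clauses.
total = (representation + inner) * (2 / 14)

print(f"representation={representation:.8f} 22={term_22:.8f} total={total:.9f}")
```

The printed total should agree with the listed 0.006019162 up to the single-precision rounding used in the original output.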
Abstract

We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with a random seeding procedure, which sometimes causes low clustering efficiency and effectiveness. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.

Date: 23. 3.2013 13:22:36

AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.00
```
0.0021020873 = product of:
  0.02942922 = sum of:
    0.02942922 = weight(_text_:representation in 2836) [ClassicSimilarity], result of:
      0.02942922 = score(doc=2836,freq=2.0), product of:
        0.11578492 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.025165197 = queryNorm
        0.25417143 = fieldWeight in 2836, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
  0.071428575 = coord(1/14)
```
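The `idf` and `tf` leaves in these trees are not explained in the listing itself. The sketch below assumes the classic Lucene TF-IDF formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). That assumption is inferred from the numbers shown, which it reproduces. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Assumed tf formula: square root of the raw term frequency."""
    return math.sqrt(freq)


MAX_DOCS = 44218  # maxDocs, copied from the explain output

# docFreq values copied from the listing for the three query terms.
for term, doc_freq in [("representation", 1206), ("22", 3622), ("29", 3565)]:
    print(term, round(classic_idf(doc_freq, MAX_DOCS), 6))

print("tf(freq=2.0) =", round(classic_tf(2.0), 7))
```

Under this assumption a rarer term such as "representation" (docFreq=1206) receives a higher idf than the frequent tokens "22" and "29", which is why the two hits that match it rank first.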
Abstract

Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00

```
8.1179454E-4 = product of:
  0.011365123 = sum of:
    0.011365123 = product of:
      0.03409537 = sum of:
        0.03409537 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.03409537 = score(doc=2748,freq=2.0), product of:
            0.08812423 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.025165197 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 1. 2.2016 18:25:22

Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00

```
4.9149804E-4 = product of:
  0.006880972 = sum of:
    0.006880972 = product of:
      0.020642916 = sum of:
        0.020642916 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
          0.020642916 = score(doc=3464,freq=2.0), product of:
            0.08852329 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.025165197 = queryNorm
            0.23319192 = fieldWeight in 3464, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 1. 6.2010 9:29:57

Sommer, M.: Automatische Generierung von DDC-Notationen für Hochschulveröffentlichungen (2012) 0.00

```
4.9149804E-4 = product of:
  0.006880972 = sum of:
    0.006880972 = product of:
      0.020642916 = sum of:
        0.020642916 = weight(_text_:29 in 587) [ClassicSimilarity], result of:
          0.020642916 = score(doc=587,freq=2.0), product of:
            0.08852329 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.025165197 = queryNorm
            0.23319192 = fieldWeight in 587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=587)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 29. 1.2013 15:44:43

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00

```
4.8707667E-4 = product of:
  0.006819073 = sum of:
    0.006819073 = product of:
      0.02045722 = sum of:
        0.02045722 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.02045722 = score(doc=2158,freq=2.0), product of:
            0.08812423 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.025165197 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 4. 8.2015 19:22:04

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00

```
4.0958173E-4 = product of:
  0.005734144 = sum of:
    0.005734144 = product of:
      0.017202431 = sum of:
        0.017202431 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
          0.017202431 = score(doc=967,freq=2.0), product of:
            0.08852329 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.025165197 = queryNorm
            0.19432661 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 25. 6.2013 19:05:29

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.00

```
4.0958173E-4 = product of:
  0.005734144 = sum of:
    0.005734144 = product of:
      0.017202431 = sum of:
        0.017202431 = weight(_text_:29 in 2300) [ClassicSimilarity], result of:
          0.017202431 = score(doc=2300,freq=2.0), product of:
            0.08852329 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.025165197 = queryNorm
            0.19432661 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Source: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.00

```
4.0589727E-4 = product of:
  0.0056825615 = sum of:
    0.0056825615 = product of:
      0.017047685 = sum of:
        0.017047685 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.017047685 = score(doc=1107,freq=2.0), product of:
            0.08812423 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.025165197 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Date: 28.10.2013 19:22:57

Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00

```
3.276654E-4 = product of:
  0.004587315 = sum of:
    0.004587315 = product of:
      0.013761944 = sum of:
        0.013761944 = weight(_text_:29 in 2301) [ClassicSimilarity], result of:
          0.013761944 = score(doc=2301,freq=2.0), product of:
            0.08852329 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.025165197 = queryNorm
            0.15546128 = fieldWeight in 2301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
      0.33333334 = coord(1/3)
  0.071428575 = coord(1/14)
```

Source: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
