Document (#28461)

Author
Ding, C.H.Q.
Title
¬A probabilistic model for Latent Semantic Indexing
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.6, p.597-608
Year
2005
Abstract
Latent Semantic Indexing (LSI), when applied to semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. Document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis.
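For orientation, the following is a minimal sketch of plain LSI as it is commonly computed, namely a truncated singular value decomposition of a term-document matrix. It does not reproduce the dual probability model or the likelihood curve from the paper. The toy matrix, the function names `lsi` and `cosine`, and the choice k=2 are assumptions made for illustration only.

```python
import numpy as np

def lsi(term_doc, k):
    """Rank-k LSI: keep the k largest singular triplets of the term-document matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy counts (rows: terms, columns: documents).
# Documents 0 and 1 share one vocabulary, documents 2 and 3 another.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

U_k, s_k, Vt_k = lsi(X, k=2)

# Each row is one document expressed in the 2-dimensional latent space.
doc_coords = (np.diag(s_k) @ Vt_k).T

raw_same_topic = cosine(X[:, 0], X[:, 1])                   # similarity before LSI
sim_same_topic = cosine(doc_coords[0], doc_coords[1])       # similarity after LSI
sim_cross_topic = cosine(doc_coords[0], doc_coords[2])      # unrelated documents
```

In this toy case the two documents that share a vocabulary move closer together in the latent space than in the raw term space, while documents with disjoint vocabularies stay orthogonal. Choosing k is the step the abstract addresses with its likelihood peak; here k is simply fixed by hand.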
Theme
Retrieval studies
Object
Latent Semantic Indexing

Similar documents (author)

  1. Ding, Y.: Visualization of intellectual structure in information retrieval : author cocitation analysis (1998) 4.84
    4.8379126 = sum of:
      4.8379126 = weight(author_txt:ding in 3793) [ClassicSimilarity], result of:
        4.8379126 = score(doc=3793,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.12918793 = queryNorm
          4.837913 = fieldWeight in 3793, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.625 = fieldNorm(doc=3793)
    
  2. Ding, Y.: Scholarly communication and bibliometrics : Part 1: The scholarly communication model: literature review (1998) 4.84
    4.8379126 = sum of:
      4.8379126 = weight(author_txt:ding in 4996) [ClassicSimilarity], result of:
        4.8379126 = score(doc=4996,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.12918793 = queryNorm
          4.837913 = fieldWeight in 4996, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.625 = fieldNorm(doc=4996)
    
  3. Ding, Y.: ¬A review of ontologies with the Semantic Web in view (2001) 4.84
    4.8379126 = sum of:
      4.8379126 = weight(author_txt:ding in 153) [ClassicSimilarity], result of:
        4.8379126 = score(doc=153,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.12918793 = queryNorm
          4.837913 = fieldWeight in 153, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.625 = fieldNorm(doc=153)
    
  4. Ding, Y.: Applying weighted PageRank to author citation networks (2011) 4.84
    4.8379126 = sum of:
      4.8379126 = weight(author_txt:ding in 1189) [ClassicSimilarity], result of:
        4.8379126 = score(doc=1189,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.12918793 = queryNorm
          4.837913 = fieldWeight in 1189, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.625 = fieldNorm(doc=1189)
    
  5. Ding, Y.: Topic-based PageRank on author cocitation networks (2011) 4.84
    4.8379126 = sum of:
      4.8379126 = weight(author_txt:ding in 1349) [ClassicSimilarity], result of:
        4.8379126 = score(doc=1349,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.12918793 = queryNorm
          4.837913 = fieldWeight in 1349, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.740661 = idf(docFreq=49, maxDocs=42306)
            0.625 = fieldNorm(doc=1349)
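The five author matches above all carry the same score because each tree multiplies the same factors. As a rough check, the arithmetic can be sketched as follows. The idf and tf formulas are the ones that reproduce the listed values (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); queryNorm and fieldNorm are copied from the listing rather than derived, and the function names are illustrative.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency in the form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

# Values copied from the explain output for author_txt:ding.
QUERY_NORM = 0.12918793
FIELD_NORM = 0.625

idf = classic_idf(doc_freq=49, max_docs=42306)

query_weight = idf * QUERY_NORM                     # listed as 0.99999994
field_weight = classic_tf(1.0) * idf * FIELD_NORM   # listed as 4.837913
score = query_weight * field_weight                 # listed as 4.8379126

print(f"idf={idf:.6f} queryWeight={query_weight:.6f} "
      f"fieldWeight={field_weight:.6f} score={score:.6f}")
```

Small differences in the last digits are expected, since the listing was produced with single-precision floats and this sketch uses double precision.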
    

Similar documents (content)

  1. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.31
    0.30875114 = sum of:
      0.30875114 = product of:
        0.9648473 = sum of:
          0.050651338 = weight(abstract_txt:document in 2691) [ClassicSimilarity], result of:
            0.050651338 = score(doc=2691,freq=3.0), product of:
              0.0874715 = queryWeight, product of:
                1.1396981 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017935067 = queryNorm
              0.57906103 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.043085612 = weight(abstract_txt:indexing in 2691) [ClassicSimilarity], result of:
            0.043085612 = score(doc=2691,freq=2.0), product of:
              0.08989272 = queryWeight, product of:
                1.1553639 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.017935067 = queryNorm
              0.47930032 = fieldWeight in 2691, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.063799545 = weight(abstract_txt:statistical in 2691) [ClassicSimilarity], result of:
            0.063799545 = score(doc=2691,freq=1.0), product of:
              0.14713797 = queryWeight, product of:
                1.4781514 = boost
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.017935067 = queryNorm
              0.43360355 = fieldWeight in 2691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.07348848 = weight(abstract_txt:model in 2691) [ClassicSimilarity], result of:
            0.07348848 = score(doc=2691,freq=4.0), product of:
              0.116592236 = queryWeight, product of:
                1.6115248 = boost
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.017935067 = queryNorm
              0.6303034 = fieldWeight in 2691, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.09883276 = weight(abstract_txt:distribution in 2691) [ClassicSimilarity], result of:
            0.09883276 = score(doc=2691,freq=1.0), product of:
              0.22549872 = queryWeight, product of:
                2.241167 = boost
                5.610051 = idf(docFreq=420, maxDocs=42306)
                0.017935067 = queryNorm
              0.43828523 = fieldWeight in 2691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.610051 = idf(docFreq=420, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.12049727 = weight(abstract_txt:dimensions in 2691) [ClassicSimilarity], result of:
            0.12049727 = score(doc=2691,freq=1.0), product of:
              0.2573524 = queryWeight, product of:
                2.3942323 = boost
                5.993202 = idf(docFreq=286, maxDocs=42306)
                0.017935067 = queryNorm
              0.46821892 = fieldWeight in 2691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.993202 = idf(docFreq=286, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.337931 = weight(abstract_txt:latent in 2691) [ClassicSimilarity], result of:
            0.337931 = score(doc=2691,freq=3.0), product of:
              0.35485837 = queryWeight, product of:
                2.8114457 = boost
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.017935067 = queryNorm
              0.9522983 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
          0.17656136 = weight(abstract_txt:semantic in 2691) [ClassicSimilarity], result of:
            0.17656136 = score(doc=2691,freq=3.0), product of:
              0.29003036 = queryWeight, product of:
                3.5945036 = boost
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.017935067 = queryNorm
              0.6087685 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.078125 = fieldNorm(doc=2691)
        0.32 = coord(8/25)
    
  2. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.20
    0.19711502 = sum of:
      0.19711502 = product of:
        0.70398223 = sum of:
          0.10374333 = weight(abstract_txt:dual in 322) [ClassicSimilarity], result of:
            0.10374333 = score(doc=322,freq=2.0), product of:
              0.1487327 = queryWeight, product of:
                1.0508598 = boost
                7.8914843 = idf(docFreq=42, maxDocs=42306)
                0.017935067 = queryNorm
              0.69751525 = fieldWeight in 322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8914843 = idf(docFreq=42, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.03308532 = weight(abstract_txt:document in 322) [ClassicSimilarity], result of:
            0.03308532 = score(doc=322,freq=2.0), product of:
              0.0874715 = queryWeight, product of:
                1.1396981 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017935067 = queryNorm
              0.37824112 = fieldWeight in 322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.024372904 = weight(abstract_txt:indexing in 322) [ClassicSimilarity], result of:
            0.024372904 = score(doc=322,freq=1.0), product of:
              0.08989272 = queryWeight, product of:
                1.1553639 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.017935067 = queryNorm
              0.2711332 = fieldWeight in 322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.031108145 = weight(abstract_txt:collections in 322) [ClassicSimilarity], result of:
            0.031108145 = score(doc=322,freq=1.0), product of:
              0.10577161 = queryWeight, product of:
                1.2532598 = boost
                4.705708 = idf(docFreq=1039, maxDocs=42306)
                0.017935067 = queryNorm
              0.29410675 = fieldWeight in 322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.705708 = idf(docFreq=1039, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.041571364 = weight(abstract_txt:model in 322) [ClassicSimilarity], result of:
            0.041571364 = score(doc=322,freq=2.0), product of:
              0.116592236 = queryWeight, product of:
                1.6115248 = boost
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.017935067 = queryNorm
              0.35655344 = fieldWeight in 322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.2703448 = weight(abstract_txt:latent in 322) [ClassicSimilarity], result of:
            0.2703448 = score(doc=322,freq=3.0), product of:
              0.35485837 = queryWeight, product of:
                2.8114457 = boost
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.017935067 = queryNorm
              0.7618386 = fieldWeight in 322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
          0.19975638 = weight(abstract_txt:semantic in 322) [ClassicSimilarity], result of:
            0.19975638 = score(doc=322,freq=6.0), product of:
              0.29003036 = queryWeight, product of:
                3.5945036 = boost
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.017935067 = queryNorm
              0.688743 = fieldWeight in 322, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.0625 = fieldNorm(doc=322)
        0.28 = coord(7/25)
    
  3. Choi, Y.: ¬A complete assessment of tagging quality : a consolidated methodology (2015) 0.19
    0.19439237 = sum of:
      0.19439237 = product of:
        0.69425845 = sum of:
          0.029243566 = weight(abstract_txt:document in 3731) [ClassicSimilarity], result of:
            0.029243566 = score(doc=3731,freq=1.0), product of:
              0.0874715 = queryWeight, product of:
                1.1396981 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017935067 = queryNorm
              0.33432108 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.06812433 = weight(abstract_txt:indexing in 3731) [ClassicSimilarity], result of:
            0.06812433 = score(doc=3731,freq=5.0), product of:
              0.08989272 = queryWeight, product of:
                1.1553639 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.017935067 = queryNorm
              0.75784034 = fieldWeight in 3731, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.063799545 = weight(abstract_txt:statistical in 3731) [ClassicSimilarity], result of:
            0.063799545 = score(doc=3731,freq=1.0), product of:
              0.14713797 = queryWeight, product of:
                1.4781514 = boost
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.017935067 = queryNorm
              0.43360355 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.03674424 = weight(abstract_txt:model in 3731) [ClassicSimilarity], result of:
            0.03674424 = score(doc=3731,freq=1.0), product of:
              0.116592236 = queryWeight, product of:
                1.6115248 = boost
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.017935067 = queryNorm
              0.3151517 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0339417 = idf(docFreq=2035, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.09736672 = weight(abstract_txt:significance in 3731) [ClassicSimilarity], result of:
            0.09736672 = score(doc=3731,freq=1.0), product of:
              0.19503836 = queryWeight, product of:
                1.7018316 = boost
                6.389994 = idf(docFreq=192, maxDocs=42306)
                0.017935067 = queryNorm
              0.49921829 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.389994 = idf(docFreq=192, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.19510457 = weight(abstract_txt:latent in 3731) [ClassicSimilarity], result of:
            0.19510457 = score(doc=3731,freq=1.0), product of:
              0.35485837 = queryWeight, product of:
                2.8114457 = boost
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.017935067 = queryNorm
              0.5498097 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
          0.2038755 = weight(abstract_txt:semantic in 3731) [ClassicSimilarity], result of:
            0.2038755 = score(doc=3731,freq=4.0), product of:
              0.29003036 = queryWeight, product of:
                3.5945036 = boost
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.017935067 = queryNorm
              0.70294535 = fieldWeight in 3731, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.078125 = fieldNorm(doc=3731)
        0.28 = coord(7/25)
    
  4. He, X.; Cai, D.; Liu, H.; Ma, W.Y.: Locality preserving indexing for document representation (2004) 0.16
    0.1632614 = sum of:
      0.1632614 = product of:
        1.3605117 = sum of:
          0.17234245 = weight(abstract_txt:indexing in 80) [ClassicSimilarity], result of:
            0.17234245 = score(doc=80,freq=2.0), product of:
              0.08989272 = queryWeight, product of:
                1.1553639 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.017935067 = queryNorm
              1.9172013 = fieldWeight in 80, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.3125 = fieldNorm(doc=80)
          0.7804183 = weight(abstract_txt:latent in 80) [ClassicSimilarity], result of:
            0.7804183 = score(doc=80,freq=1.0), product of:
              0.35485837 = queryWeight, product of:
                2.8114457 = boost
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.017935067 = queryNorm
              2.1992388 = fieldWeight in 80, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.3125 = fieldNorm(doc=80)
          0.407751 = weight(abstract_txt:semantic in 80) [ClassicSimilarity], result of:
            0.407751 = score(doc=80,freq=1.0), product of:
              0.29003036 = queryWeight, product of:
                3.5945036 = boost
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.017935067 = queryNorm
              1.4058907 = fieldWeight in 80, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.3125 = fieldNorm(doc=80)
        0.12 = coord(3/25)
    
  5. Story, R.E.: ¬An explanation of the effectiveness of latent semantic indexing by means of a Bayesian regression model (1996) 0.16
    0.1606256 = sum of:
      0.1606256 = product of:
        0.6692734 = sum of:
          0.040940993 = weight(abstract_txt:document in 2012) [ClassicSimilarity], result of:
            0.040940993 = score(doc=2012,freq=1.0), product of:
              0.0874715 = queryWeight, product of:
                1.1396981 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017935067 = queryNorm
              0.4680495 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
          0.04265258 = weight(abstract_txt:indexing in 2012) [ClassicSimilarity], result of:
            0.04265258 = score(doc=2012,freq=1.0), product of:
              0.08989272 = queryWeight, product of:
                1.1553639 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.017935067 = queryNorm
              0.47448313 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
          0.08050124 = weight(abstract_txt:words in 2012) [ClassicSimilarity], result of:
            0.08050124 = score(doc=2012,freq=1.0), product of:
              0.13728699 = queryWeight, product of:
                1.4278127 = boost
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.017935067 = queryNorm
              0.58637196 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
          0.08931937 = weight(abstract_txt:statistical in 2012) [ClassicSimilarity], result of:
            0.08931937 = score(doc=2012,freq=1.0), product of:
              0.14713797 = queryWeight, product of:
                1.4781514 = boost
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.017935067 = queryNorm
              0.607045 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5501256 = idf(docFreq=446, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
          0.2731464 = weight(abstract_txt:latent in 2012) [ClassicSimilarity], result of:
            0.2731464 = score(doc=2012,freq=1.0), product of:
              0.35485837 = queryWeight, product of:
                2.8114457 = boost
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.017935067 = queryNorm
              0.76973355 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.037564 = idf(docFreq=100, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
          0.14271285 = weight(abstract_txt:semantic in 2012) [ClassicSimilarity], result of:
            0.14271285 = score(doc=2012,freq=1.0), product of:
              0.29003036 = queryWeight, product of:
                3.5945036 = boost
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.017935067 = queryNorm
              0.49206176 = fieldWeight in 2012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988503 = idf(docFreq=1278, maxDocs=42306)
                0.109375 = fieldNorm(doc=2012)
        0.24 = coord(6/25)