Document (#28458)

Author
Hoenkamp, E.
Title
Unitary operators on the document space
Source
Journal of the American Society for Information Science and Technology. 54(2003) no.4, pp.314-320
Year
2003
Abstract
When people search for documents, they ultimately want content, not words. Hence, search engines should relate documents by their underlying concepts rather than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in their textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article makes three contributions. First, it shows that the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, because it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by means of a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost but also opens a spectrum of possibilities for new research.
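Two claims in the abstract can be illustrated concretely: a unitary (here, real orthonormal) operator preserves the inner products on which query matching relies, and the Haar transform can be computed in linear time. The sketch below is an illustration under stated assumptions, not the article's own construction: it assumes dense term vectors whose length is a power of two and applies a 1-D orthonormal Haar transform along an arbitrary, fixed term order. The helper names (`haar_forward`, `haar_inverse`, `truncate`, `cosine`) and the example vectors are hypothetical.

```python
import math


def _check_power_of_two(n):
    # This simple Haar scheme assumes a vector length of n = 2**k.
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a positive power of two")


def haar_forward(v):
    """Orthonormal 1-D Haar transform in O(n) time.

    Each pass replaces the leading `length` entries by `length // 2` scaled
    pairwise sums (averages) followed by `length // 2` scaled pairwise
    differences (details). The next pass works on the averages only, so the
    total work is n/2 + n/4 + ... < n pair operations.
    Result layout: [coarsest average, coarsest detail, ..., finest details].
    """
    n = len(v)
    _check_power_of_two(n)
    out = [float(x) for x in v]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:length] = avg + det
        length = half
    return out


def haar_inverse(c):
    """Inverse of haar_forward (for an orthonormal transform, its transpose)."""
    n = len(c)
    _check_power_of_two(n)
    out = [float(x) for x in c]
    s = math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        # Undo one level: each (average, detail) pair becomes two entries.
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.extend(((a + d) / s, (a - d) / s))
        out[:2 * length] = merged
        length *= 2
    return out


def truncate(coeffs, k):
    """Keep the k coarsest coefficients (k a power of two) and zero the rest."""
    return coeffs[:k] + [0.0] * (len(coeffs) - k)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


# Hypothetical term-frequency vectors over a fixed vocabulary of 8 terms.
doc = [2, 0, 1, 3, 0, 0, 4, 1]
query = [1, 0, 0, 2, 0, 0, 1, 0]

doc_h, query_h = haar_forward(doc), haar_forward(query)
print(cosine(doc, query), cosine(doc_h, query_h))  # equal up to rounding
print(haar_inverse(truncate(doc_h, 4)))            # coarser-resolution view
```

Because the transform is orthonormal, the cosine between document and query is unchanged in the Haar domain. Reconstructing from only the four coarsest coefficients replaces each adjacent pair of term weights by its mean (approximately `[1, 1, 2, 2, 0, 0, 2.5, 2.5]` here), which is one simple reading of a multiresolution view. Unlike the singular vectors of LSI, which are computed from the collection, this result depends entirely on the assumed term order.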
Footnote
Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
Theme
Retrieval algorithms
Object
Latent Semantic Indexing

Similar documents (content)

  1. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.21
  2. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.14
  3. Martin, D.I.; Berry, M.W.: Latent Semantic Indexing (2009) 0.13
  4. Kiren, T.; Shoaib, M.: A novel ontology matching approach using key concepts (2016) 0.12
  5. Kiren, T.: A clustering based indexing technique of modularized ontologies for information retrieval (2017) 0.12