Document (#13137)

Author
Salton, G.
Allan, J.
Singhal, A.
Title
Automatic text decomposition and structuring
Source
Information processing and management. 32(1996) no.2, pp. 127-138
Year
1996
Abstract
Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text excerpts. The resulting linked hypertext maps can be decomposed into text segments and text themes, and these decompositions can be used to identify different text types and text structures, leading to improved text access and utilization. Gives examples of text decomposition for expository and nonexpository texts.
Theme
Automatic indexing

Similar documents (author)

  1. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 4.73
    4.7285657 = sum of:
      4.7285657 = sum of:
        1.0830188 = weight(author_txt:salton in 3950) [ClassicSimilarity], result of:
          1.0830188 = score(doc=3950,freq=1.0), product of:
            0.44644335 = queryWeight, product of:
              7.762822 = idf(docFreq=49, maxDocs=43254)
              0.057510443 = queryNorm
            2.4258819 = fieldWeight in 3950, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.762822 = idf(docFreq=49, maxDocs=43254)
              0.3125 = fieldNorm(doc=3950)
        1.6002004 = weight(author_txt:allan in 3950) [ClassicSimilarity], result of:
          1.6002004 = score(doc=3950,freq=1.0), product of:
            0.57915115 = queryWeight, product of:
              1.1389713 = boost
              8.841632 = idf(docFreq=16, maxDocs=43254)
              0.057510443 = queryNorm
            2.76301 = fieldWeight in 3950, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.841632 = idf(docFreq=16, maxDocs=43254)
              0.3125 = fieldNorm(doc=3950)
        2.0453465 = weight(author_txt:singhal in 3950) [ClassicSimilarity], result of:
          2.0453465 = score(doc=3950,freq=1.0), product of:
            0.6821087 = queryWeight, product of:
              1.2360716 = boost
              9.595404 = idf(docFreq=7, maxDocs=43254)
              0.057510443 = queryNorm
            2.9985638 = fieldWeight in 3950, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.595404 = idf(docFreq=7, maxDocs=43254)
              0.3125 = fieldNorm(doc=3950)
    
  2. Salton, G.; Allan, J.: Selective text utilization and text traversal (1995) 2.86
    2.8621006 = sum of:
      2.8621006 = product of:
        4.293151 = sum of:
          1.7328302 = weight(author_txt:salton in 874) [ClassicSimilarity], result of:
            1.7328302 = score(doc=874,freq=1.0), product of:
              0.44644335 = queryWeight, product of:
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.057510443 = queryNorm
              3.881411 = fieldWeight in 874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.5 = fieldNorm(doc=874)
          2.5603206 = weight(author_txt:allan in 874) [ClassicSimilarity], result of:
            2.5603206 = score(doc=874,freq=1.0), product of:
              0.57915115 = queryWeight, product of:
                1.1389713 = boost
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.057510443 = queryNorm
              4.420816 = fieldWeight in 874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.5 = fieldNorm(doc=874)
        0.6666667 = coord(2/3)
    
  3. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 2.15
    2.1465755 = sum of:
      2.1465755 = product of:
        3.219863 = sum of:
          1.2996227 = weight(author_txt:salton in 6507) [ClassicSimilarity], result of:
            1.2996227 = score(doc=6507,freq=1.0), product of:
              0.44644335 = queryWeight, product of:
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.057510443 = queryNorm
              2.9110584 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.375 = fieldNorm(doc=6507)
          1.9202404 = weight(author_txt:allan in 6507) [ClassicSimilarity], result of:
            1.9202404 = score(doc=6507,freq=1.0), product of:
              0.57915115 = queryWeight, product of:
                1.1389713 = boost
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.057510443 = queryNorm
              3.3156118 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.375 = fieldNorm(doc=6507)
        0.6666667 = coord(2/3)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 2.15
    2.1465755 = sum of:
      2.1465755 = product of:
        3.219863 = sum of:
          1.2996227 = weight(author_txt:salton in 700) [ClassicSimilarity], result of:
            1.2996227 = score(doc=700,freq=1.0), product of:
              0.44644335 = queryWeight, product of:
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.057510443 = queryNorm
              2.9110584 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.375 = fieldNorm(doc=700)
          1.9202404 = weight(author_txt:allan in 700) [ClassicSimilarity], result of:
            1.9202404 = score(doc=700,freq=1.0), product of:
              0.57915115 = queryWeight, product of:
                1.1389713 = boost
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.057510443 = queryNorm
              3.3156118 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.375 = fieldNorm(doc=700)
        0.6666667 = coord(2/3)
    
  5. Salton, G.; Allen, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 2.09
    2.085577 = sum of:
      2.085577 = product of:
        3.1283653 = sum of:
          1.0830188 = weight(author_txt:salton in 2237) [ClassicSimilarity], result of:
            1.0830188 = score(doc=2237,freq=1.0), product of:
              0.44644335 = queryWeight, product of:
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.057510443 = queryNorm
              2.4258819 = fieldWeight in 2237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.3125 = fieldNorm(doc=2237)
          2.0453465 = weight(author_txt:singhal in 2237) [ClassicSimilarity], result of:
            2.0453465 = score(doc=2237,freq=1.0), product of:
              0.6821087 = queryWeight, product of:
                1.2360716 = boost
                9.595404 = idf(docFreq=7, maxDocs=43254)
                0.057510443 = queryNorm
              2.9985638 = fieldWeight in 2237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.595404 = idf(docFreq=7, maxDocs=43254)
                0.3125 = fieldNorm(doc=2237)
        0.6666667 = coord(2/3)
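
The author-based explain trees above share one pattern: each matching term contributes queryWeight times fieldWeight, and the sum is scaled by coord when only some of the query terms match. The following Python sketch recomputes result 2 (doc 874) from the values listed in its tree. The helper names are hypothetical and not part of Lucene, a boost of 1.0 is assumed for "salton" because its tree shows no boost line, and small deviations from the listed figures are to be expected because the original values appear to be computed in 32-bit floats.

```python
import math

def term_score(idf, query_norm, field_norm, freq=1.0, boost=1.0):
    """Score of one matching term, following the layout of the explain tree.

    Hypothetical helper for illustration; not a Lucene API.
    """
    # queryWeight = boost * idf * queryNorm (boost assumed 1.0 when not listed)
    query_weight = boost * idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def document_score(term_scores, total_query_terms):
    """Sum of the matching term scores, scaled by coord = matched / total."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord

QUERY_NORM = 0.057510443  # queryNorm listed for the author query

# Values copied from the tree for doc 874 (fieldNorm = 0.5).
salton = term_score(idf=7.762822, query_norm=QUERY_NORM, field_norm=0.5)
allan = term_score(idf=8.841632, query_norm=QUERY_NORM, field_norm=0.5,
                   boost=1.1389713)

# Two of the three author terms match, hence coord(2/3).
score_874 = document_score([salton, allan], total_query_terms=3)

print(round(salton, 4), round(allan, 4), round(score_874, 4))
```

Under these assumptions the printed values should agree with the listed 1.7328302, 2.5603206 and 2.8621006 to about four decimal places. Result 1 has no coord line, which is consistent with all three author terms matching (coord 3/3).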
    

Similar documents (content)

  1. Salton, G.; Buckley, C.: Approaches to global text analysis (1990) 0.19
    0.19153248 = sum of:
      0.19153248 = product of:
        0.95766234 = sum of:
          0.05233564 = weight(abstract_txt:linked in 4901) [ClassicSimilarity], result of:
            0.05233564 = score(doc=4901,freq=1.0), product of:
              0.08545359 = queryWeight, product of:
                1.0900954 = boost
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.0139996335 = queryNorm
              0.6124452 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.109375 = fieldNorm(doc=4901)
          0.054012883 = weight(abstract_txt:texts in 4901) [ClassicSimilarity], result of:
            0.054012883 = score(doc=4901,freq=1.0), product of:
              0.08726971 = queryWeight, product of:
                1.1016182 = boost
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.0139996335 = queryNorm
              0.618919 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.109375 = fieldNorm(doc=4901)
          0.062420957 = weight(abstract_txt:leading in 4901) [ClassicSimilarity], result of:
            0.062420957 = score(doc=4901,freq=1.0), product of:
              0.09610638 = queryWeight, product of:
                1.1560469 = boost
                5.9382725 = idf(docFreq=309, maxDocs=43254)
                0.0139996335 = queryNorm
              0.6494986 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9382725 = idf(docFreq=309, maxDocs=43254)
                0.109375 = fieldNorm(doc=4901)
          0.30393225 = weight(abstract_txt:excerpts in 4901) [ClassicSimilarity], result of:
            0.30393225 = score(doc=4901,freq=2.0), product of:
              0.21913235 = queryWeight, product of:
                1.7456316 = boost
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.0139996335 = queryNorm
              1.3869803 = fieldWeight in 4901, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.109375 = fieldNorm(doc=4901)
          0.48496065 = weight(abstract_txt:text in 4901) [ClassicSimilarity], result of:
            0.48496065 = score(doc=4901,freq=6.0), product of:
              0.4469777 = queryWeight, product of:
                7.883921 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0139996335 = queryNorm
              1.0849773 = fieldWeight in 4901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.109375 = fieldNorm(doc=4901)
        0.2 = coord(5/25)
    
  2. Rittschof, K.A.; Kulhavy, R.W.; Stock, W.A.; Verdi, M.P.; Doran, J.M.: Thematic maps improve memory for facts and inferences : a test of the stimulus order hypothesis (1994) 0.15
    0.14768478 = sum of:
      0.14768478 = product of:
        0.9230299 = sum of:
          0.043462414 = weight(abstract_txt:maps in 3158) [ClassicSimilarity], result of:
            0.043462414 = score(doc=3158,freq=1.0), product of:
              0.09448435 = queryWeight, product of:
                1.1462498 = boost
                5.8879476 = idf(docFreq=325, maxDocs=43254)
                0.0139996335 = queryNorm
              0.4599959 = fieldWeight in 3158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8879476 = idf(docFreq=325, maxDocs=43254)
                0.078125 = fieldNorm(doc=3158)
          0.15226552 = weight(abstract_txt:theme in 3158) [ClassicSimilarity], result of:
            0.15226552 = score(doc=3158,freq=5.0), product of:
              0.12745641 = queryWeight, product of:
                1.331313 = boost
                6.838563 = idf(docFreq=125, maxDocs=43254)
                0.0139996335 = queryNorm
              1.1946478 = fieldWeight in 3158, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.838563 = idf(docFreq=125, maxDocs=43254)
                0.078125 = fieldNorm(doc=3158)
          0.4110831 = weight(abstract_txt:expository in 3158) [ClassicSimilarity], result of:
            0.4110831 = score(doc=3158,freq=1.0), product of:
              0.53241104 = queryWeight, product of:
                3.8480248 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0139996335 = queryNorm
              0.77211607 = fieldWeight in 3158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.078125 = fieldNorm(doc=3158)
          0.31621888 = weight(abstract_txt:text in 3158) [ClassicSimilarity], result of:
            0.31621888 = score(doc=3158,freq=5.0), product of:
              0.4469777 = queryWeight, product of:
                7.883921 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0139996335 = queryNorm
              0.7074601 = fieldWeight in 3158, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.078125 = fieldNorm(doc=3158)
        0.16 = coord(4/25)
    
  3. Liu, S.: Decomposing DDC synthesized numbers (1997) 0.11
    0.10616954 = sum of:
      0.10616954 = product of:
        0.6635596 = sum of:
          0.029875387 = weight(abstract_txt:automatic in 969) [ClassicSimilarity], result of:
            0.029875387 = score(doc=969,freq=1.0), product of:
              0.07359128 = queryWeight, product of:
                1.0116086 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0139996335 = queryNorm
              0.4059637 = fieldWeight in 969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.078125 = fieldNorm(doc=969)
          0.21709447 = weight(abstract_txt:decomposed in 969) [ClassicSimilarity], result of:
            0.21709447 = score(doc=969,freq=2.0), product of:
              0.21913235 = queryWeight, product of:
                1.7456316 = boost
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.0139996335 = queryNorm
              0.9907002 = fieldWeight in 969, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.078125 = fieldNorm(doc=969)
          0.20554155 = weight(abstract_txt:decompositions in 969) [ClassicSimilarity], result of:
            0.20554155 = score(doc=969,freq=1.0), product of:
              0.26620552 = queryWeight, product of:
                1.9240124 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0139996335 = queryNorm
              0.77211607 = fieldWeight in 969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.078125 = fieldNorm(doc=969)
          0.2110482 = weight(abstract_txt:decomposition in 969) [ClassicSimilarity], result of:
            0.2110482 = score(doc=969,freq=1.0), product of:
              0.34136194 = queryWeight, product of:
                3.0812142 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0139996335 = queryNorm
              0.61825347 = fieldWeight in 969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.078125 = fieldNorm(doc=969)
        0.16 = coord(4/25)
    
  4. Rafols, I.; Leydesdorff, L.: Content-based and algorithmic classifications of journals : perspectives on the dynamics of scientific communication and indexer effects (2009) 0.10
    0.102787174 = sum of:
      0.102787174 = product of:
        0.51393586 = sum of:
          0.023086919 = weight(abstract_txt:structures in 96) [ClassicSimilarity], result of:
            0.023086919 = score(doc=96,freq=1.0), product of:
              0.07191199 = queryWeight, product of:
                5.1367054 = idf(docFreq=690, maxDocs=43254)
                0.0139996335 = queryNorm
              0.3210441 = fieldWeight in 96, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1367054 = idf(docFreq=690, maxDocs=43254)
                0.0625 = fieldNorm(doc=96)
          0.03476993 = weight(abstract_txt:maps in 96) [ClassicSimilarity], result of:
            0.03476993 = score(doc=96,freq=1.0), product of:
              0.09448435 = queryWeight, product of:
                1.1462498 = boost
                5.8879476 = idf(docFreq=325, maxDocs=43254)
                0.0139996335 = queryNorm
              0.36799672 = fieldWeight in 96, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8879476 = idf(docFreq=325, maxDocs=43254)
                0.0625 = fieldNorm(doc=96)
          0.122807175 = weight(abstract_txt:decomposed in 96) [ClassicSimilarity], result of:
            0.122807175 = score(doc=96,freq=1.0), product of:
              0.21913235 = queryWeight, product of:
                1.7456316 = boost
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.0139996335 = queryNorm
              0.5604247 = fieldWeight in 96, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.966795 = idf(docFreq=14, maxDocs=43254)
                0.0625 = fieldNorm(doc=96)
          0.16443324 = weight(abstract_txt:decompositions in 96) [ClassicSimilarity], result of:
            0.16443324 = score(doc=96,freq=1.0), product of:
              0.26620552 = queryWeight, product of:
                1.9240124 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0139996335 = queryNorm
              0.6176928 = fieldWeight in 96, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0625 = fieldNorm(doc=96)
          0.16883858 = weight(abstract_txt:decomposition in 96) [ClassicSimilarity], result of:
            0.16883858 = score(doc=96,freq=1.0), product of:
              0.34136194 = queryWeight, product of:
                3.0812142 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0139996335 = queryNorm
              0.4946028 = fieldWeight in 96, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=96)
        0.2 = coord(5/25)
    
  5. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 0.08
    0.08072176 = sum of:
      0.08072176 = product of:
        0.504511 = sum of:
          0.035850465 = weight(abstract_txt:automatic in 6507) [ClassicSimilarity], result of:
            0.035850465 = score(doc=6507,freq=1.0), product of:
              0.07359128 = queryWeight, product of:
                1.0116086 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0139996335 = queryNorm
              0.48715645 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.09375 = fieldNorm(doc=6507)
          0.04485912 = weight(abstract_txt:linked in 6507) [ClassicSimilarity], result of:
            0.04485912 = score(doc=6507,freq=1.0), product of:
              0.08545359 = queryWeight, product of:
                1.0900954 = boost
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.0139996335 = queryNorm
              0.524953 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.09375 = fieldNorm(doc=6507)
          0.084399715 = weight(abstract_txt:structuring in 6507) [ClassicSimilarity], result of:
            0.084399715 = score(doc=6507,freq=1.0), product of:
              0.13023382 = queryWeight, product of:
                1.3457402 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0139996335 = queryNorm
              0.64806294 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.09375 = fieldNorm(doc=6507)
          0.33940172 = weight(abstract_txt:text in 6507) [ClassicSimilarity], result of:
            0.33940172 = score(doc=6507,freq=4.0), product of:
              0.4469777 = queryWeight, product of:
                7.883921 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0139996335 = queryNorm
              0.75932586 = fieldWeight in 6507, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.09375 = fieldNorm(doc=6507)
        0.16 = coord(4/25)
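
The content-based trees add two details that the author-based ones do not exercise: term frequencies above 1 and a 25-term query. The listed idf values are numerically consistent with 1 + ln(maxDocs / (docFreq + 1)), and the tf values with the square root of termFreq. Both formulas are inferred here from the numbers themselves, not from the system's configuration. The Python sketch below rebuilds the score of content result 1 (doc 4901) from docFreq, termFreq, boost and fieldNorm as listed. Function and variable names are hypothetical, and fieldNorm is taken as given because it is a stored per-document value.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.0139996335  # queryNorm listed for the content query

def idf(doc_freq, max_docs=MAX_DOCS):
    # Inferred from the listed values: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(doc_freq, freq, boost, field_norm):
    # tf = sqrt(termFreq), as in "1.4142135 = tf(freq=2.0)"
    tf = math.sqrt(freq)
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = tf * idf(doc_freq) * field_norm
    return query_weight * field_weight

# (term, docFreq, termFreq, boost) copied from the tree for doc 4901
matches = [
    ("linked", 434, 1.0, 1.0900954),
    ("texts", 409, 1.0, 1.1016182),
    ("leading", 309, 1.0, 1.1560469),
    ("excerpts", 14, 2.0, 1.7456316),
    ("text", 2048, 6.0, 7.883921),
]
FIELD_NORM = 0.109375  # fieldNorm(doc=4901)
QUERY_TERMS = 25       # denominator of coord(5/25)

total = sum(term_score(df, f, b, FIELD_NORM) for _, df, f, b in matches)
score_4901 = total * len(matches) / QUERY_TERMS

print(round(idf(14), 4), round(total, 4), round(score_4901, 4))
```

If the inferred formulas hold, the output should be close to the listed idf of 8.966795 for "excerpts", the sum 0.95766234, and the final score 0.19153248. The common term "text" contributes most of the sum despite its low idf, because of its large boost (7.883921) and its term frequency of 6.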