Document (#13137)

Author
Salton, G.
Allan, J.
Singhal, A.
Title
Automatic text decomposition and structuring
Source
Information processing and management. 32(1996) no.2, p.127-138
Year
1996
Abstract
Sophisticated text similarity measurements are used to determine relationships between natural language texts and text excerpts. The resulting linked hypertext maps can be decomposed into text segments and text themes, and these decompositions can be used to identify different text types and text structures, leading to improved text access and utilization. Gives examples of text decomposition for expository and non-expository texts.
Theme
Automatic indexing

Similar documents (author)

  1. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 4.73
    4.7299604 = sum of:
      4.7299604 = sum of:
        1.0915798 = weight(author_txt:salton in 1949) [ClassicSimilarity], result of:
          1.0915798 = score(doc=1949,freq=1.0), product of:
            0.44869825 = queryWeight, product of:
              7.7848644 = idf(docFreq=49, maxDocs=44218)
              0.05763726 = queryNorm
            2.4327703 = fieldWeight in 1949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.7848644 = idf(docFreq=49, maxDocs=44218)
              0.3125 = fieldNorm(doc=1949)
        1.5802094 = weight(author_txt:allan in 1949) [ClassicSimilarity], result of:
          1.5802094 = score(doc=1949,freq=1.0), product of:
            0.57419646 = queryWeight, product of:
              1.1312356 = boost
              8.806516 = idf(docFreq=17, maxDocs=44218)
              0.05763726 = queryNorm
            2.752036 = fieldWeight in 1949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.806516 = idf(docFreq=17, maxDocs=44218)
              0.3125 = fieldNorm(doc=1949)
        2.0581715 = weight(author_txt:singhal in 1949) [ClassicSimilarity], result of:
          2.0581715 = score(doc=1949,freq=1.0), product of:
            0.68481266 = queryWeight, product of:
              1.2354032 = boost
              9.617446 = idf(docFreq=7, maxDocs=44218)
              0.05763726 = queryNorm
            3.005452 = fieldWeight in 1949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.617446 = idf(docFreq=7, maxDocs=44218)
              0.3125 = fieldNorm(doc=1949)
    
  2. Salton, G.; Allan, J.: Selective text utilization and text traversal (1995) 2.85
    2.8499086 = sum of:
      2.8499086 = product of:
        4.274863 = sum of:
          1.7465276 = weight(author_txt:salton in 6805) [ClassicSimilarity], result of:
            1.7465276 = score(doc=6805,freq=1.0), product of:
              0.44869825 = queryWeight, product of:
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.05763726 = queryNorm
              3.8924322 = fieldWeight in 6805, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.5 = fieldNorm(doc=6805)
          2.528335 = weight(author_txt:allan in 6805) [ClassicSimilarity], result of:
            2.528335 = score(doc=6805,freq=1.0), product of:
              0.57419646 = queryWeight, product of:
                1.1312356 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05763726 = queryNorm
              4.403258 = fieldWeight in 6805, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.5 = fieldNorm(doc=6805)
        0.6666667 = coord(2/3)
    
  3. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 2.14
    2.1374314 = sum of:
      2.1374314 = product of:
        3.206147 = sum of:
          1.3098956 = weight(author_txt:salton in 6507) [ClassicSimilarity], result of:
            1.3098956 = score(doc=6507,freq=1.0), product of:
              0.44869825 = queryWeight, product of:
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.05763726 = queryNorm
              2.9193242 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.375 = fieldNorm(doc=6507)
          1.8962513 = weight(author_txt:allan in 6507) [ClassicSimilarity], result of:
            1.8962513 = score(doc=6507,freq=1.0), product of:
              0.57419646 = queryWeight, product of:
                1.1312356 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05763726 = queryNorm
              3.3024435 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.375 = fieldNorm(doc=6507)
        0.6666667 = coord(2/3)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 2.14
    2.1374314 = sum of:
      2.1374314 = product of:
        3.206147 = sum of:
          1.3098956 = weight(author_txt:salton in 5699) [ClassicSimilarity], result of:
            1.3098956 = score(doc=5699,freq=1.0), product of:
              0.44869825 = queryWeight, product of:
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.05763726 = queryNorm
              2.9193242 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.375 = fieldNorm(doc=5699)
          1.8962513 = weight(author_txt:allan in 5699) [ClassicSimilarity], result of:
            1.8962513 = score(doc=5699,freq=1.0), product of:
              0.57419646 = queryWeight, product of:
                1.1312356 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05763726 = queryNorm
              3.3024435 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.375 = fieldNorm(doc=5699)
        0.6666667 = coord(2/3)
    
  5. Salton, G.; Allen, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 2.10
    2.0998342 = sum of:
      2.0998342 = product of:
        3.1497512 = sum of:
          1.0915798 = weight(author_txt:salton in 1168) [ClassicSimilarity], result of:
            1.0915798 = score(doc=1168,freq=1.0), product of:
              0.44869825 = queryWeight, product of:
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.05763726 = queryNorm
              2.4327703 = fieldWeight in 1168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.3125 = fieldNorm(doc=1168)
          2.0581715 = weight(author_txt:singhal in 1168) [ClassicSimilarity], result of:
            2.0581715 = score(doc=1168,freq=1.0), product of:
              0.68481266 = queryWeight, product of:
                1.2354032 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.05763726 = queryNorm
              3.005452 = fieldWeight in 1168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.3125 = fieldNorm(doc=1168)
        0.6666667 = coord(2/3)
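
The score trees above can be reproduced by hand. The sketch below assumes the formulas documented for Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), coord = matched clauses / total clauses). The function names are hypothetical, and the input numbers are copied from the tree for doc 6805 (entry 2). Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each "weight(...)" subtree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

def coord(overlap, max_overlap):
    # Fraction of query clauses that matched the document.
    return overlap / max_overlap

MAX_DOCS = 44218
QUERY_NORM = 0.05763726

# Doc 6805: "salton" and "allan" match, "singhal" does not (2 of 3 clauses).
matched = [
    term_score(1.0, 49, MAX_DOCS, QUERY_NORM, 0.5),
    term_score(1.0, 17, MAX_DOCS, QUERY_NORM, 0.5, boost=1.1312356),
]
total = sum(matched) * coord(2, 3)
print(round(total, 4))
```

The printed total should land close to the 2.8499086 listed for entry 2. Entry 1 matches all three author terms, so no coord factor appears in its tree.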
    

Similar documents (content)

  1. Salton, G.; Buckley, C.: Approaches to global text analysis (1990) 0.19
    0.18967693 = sum of:
      0.18967693 = product of:
        0.94838464 = sum of:
          0.050953824 = weight(abstract_txt:linked in 4901) [ClassicSimilarity], result of:
            0.050953824 = score(doc=4901,freq=1.0), product of:
              0.08393094 = queryWeight, product of:
                1.0845941 = boost
                5.550558 = idf(docFreq=466, maxDocs=44218)
                0.01394178 = queryNorm
              0.60709226 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.550558 = idf(docFreq=466, maxDocs=44218)
                0.109375 = fieldNorm(doc=4901)
          0.053863294 = weight(abstract_txt:texts in 4901) [ClassicSimilarity], result of:
            0.053863294 = score(doc=4901,freq=1.0), product of:
              0.08709626 = queryWeight, product of:
                1.1048566 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.01394178 = queryNorm
              0.6184341 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.109375 = fieldNorm(doc=4901)
          0.06141178 = weight(abstract_txt:leading in 4901) [ClassicSimilarity], result of:
            0.06141178 = score(doc=4901,freq=1.0), product of:
              0.09505436 = queryWeight, product of:
                1.1542296 = boost
                5.906927 = idf(docFreq=326, maxDocs=44218)
                0.01394178 = queryNorm
              0.6460701 = fieldWeight in 4901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.906927 = idf(docFreq=326, maxDocs=44218)
                0.109375 = fieldNorm(doc=4901)
          0.29950503 = weight(abstract_txt:excerpts in 4901) [ClassicSimilarity], result of:
            0.29950503 = score(doc=4901,freq=2.0), product of:
              0.21696864 = queryWeight, product of:
                1.7438321 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.01394178 = queryNorm
              1.380407 = fieldWeight in 4901, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.109375 = fieldNorm(doc=4901)
          0.48265073 = weight(abstract_txt:text in 4901) [ClassicSimilarity], result of:
            0.48265073 = score(doc=4901,freq=6.0), product of:
              0.44549462 = queryWeight, product of:
                7.9018254 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01394178 = queryNorm
              1.0834042 = fieldWeight in 4901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=4901)
        0.2 = coord(5/25)
    
  2. Rittschof, K.A.; Kulhavy, R.W.; Stock, W.A.; Verdi, M.P.; Doran, J.M.: Thematic maps improve memory for facts and inferences : a test of the stimulus order hypothesis (1994) 0.15
    0.14774832 = sum of:
      0.14774832 = product of:
        0.923427 = sum of:
          0.043461733 = weight(abstract_txt:maps in 2089) [ClassicSimilarity], result of:
            0.043461733 = score(doc=2089,freq=1.0), product of:
              0.094470076 = queryWeight, product of:
                1.1506767 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.01394178 = queryNorm
              0.46005818 = fieldWeight in 2089, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.078125 = fieldNorm(doc=2089)
          0.1515872 = weight(abstract_txt:theme in 2089) [ClassicSimilarity], result of:
            0.1515872 = score(doc=2089,freq=5.0), product of:
              0.12705973 = queryWeight, product of:
                1.3344741 = boost
                6.829353 = idf(docFreq=129, maxDocs=44218)
                0.01394178 = queryNorm
              1.193039 = fieldWeight in 2089, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.829353 = idf(docFreq=129, maxDocs=44218)
                0.078125 = fieldNorm(doc=2089)
          0.41366526 = weight(abstract_txt:expository in 2089) [ClassicSimilarity], result of:
            0.41366526 = score(doc=2089,freq=1.0), product of:
              0.53456306 = queryWeight, product of:
                3.8709776 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01394178 = queryNorm
              0.7738381 = fieldWeight in 2089, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=2089)
          0.31471276 = weight(abstract_txt:text in 2089) [ClassicSimilarity], result of:
            0.31471276 = score(doc=2089,freq=5.0), product of:
              0.44549462 = queryWeight, product of:
                7.9018254 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01394178 = queryNorm
              0.7064345 = fieldWeight in 2089, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2089)
        0.16 = coord(4/25)
    
  3. Liu, S.: Decomposing DDC synthesized numbers (1997) 0.11
    0.10658773 = sum of:
      0.10658773 = product of:
        0.66617334 = sum of:
          0.029850073 = weight(abstract_txt:automatic in 5968) [ClassicSimilarity], result of:
            0.029850073 = score(doc=5968,freq=1.0), product of:
              0.07353936 = queryWeight, product of:
                1.0152339 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.01394178 = queryNorm
              0.40590608 = fieldWeight in 5968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.078125 = fieldNorm(doc=5968)
          0.21860714 = weight(abstract_txt:decomposed in 5968) [ClassicSimilarity], result of:
            0.21860714 = score(doc=5968,freq=2.0), product of:
              0.22011814 = queryWeight, product of:
                1.7564431 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01394178 = queryNorm
              0.9931356 = fieldWeight in 5968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=5968)
          0.20683263 = weight(abstract_txt:decompositions in 5968) [ClassicSimilarity], result of:
            0.20683263 = score(doc=5968,freq=1.0), product of:
              0.26728153 = queryWeight, product of:
                1.9354888 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01394178 = queryNorm
              0.7738381 = fieldWeight in 5968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=5968)
          0.21088348 = weight(abstract_txt:decomposition in 5968) [ClassicSimilarity], result of:
            0.21088348 = score(doc=5968,freq=1.0), product of:
              0.34113634 = queryWeight, product of:
                3.0923252 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.01394178 = queryNorm
              0.6181795 = fieldWeight in 5968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.078125 = fieldNorm(doc=5968)
        0.16 = coord(4/25)
    
  4. Rafols, I.; Leydesdorff, L.: Content-based and algorithmic classifications of journals : perspectives on the dynamics of scientific communication and indexer effects (2009) 0.10
    0.10308526 = sum of:
      0.10308526 = product of:
        0.5154263 = sum of:
          0.022821125 = weight(abstract_txt:structures in 3095) [ClassicSimilarity], result of:
            0.022821125 = score(doc=3095,freq=1.0), product of:
              0.07134896 = queryWeight, product of:
                5.117636 = idf(docFreq=719, maxDocs=44218)
                0.01394178 = queryNorm
              0.31985226 = fieldWeight in 3095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.117636 = idf(docFreq=719, maxDocs=44218)
                0.0625 = fieldNorm(doc=3095)
          0.034769386 = weight(abstract_txt:maps in 3095) [ClassicSimilarity], result of:
            0.034769386 = score(doc=3095,freq=1.0), product of:
              0.094470076 = queryWeight, product of:
                1.1506767 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.01394178 = queryNorm
              0.36804655 = fieldWeight in 3095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0625 = fieldNorm(doc=3095)
          0.12366288 = weight(abstract_txt:decomposed in 3095) [ClassicSimilarity], result of:
            0.12366288 = score(doc=3095,freq=1.0), product of:
              0.22011814 = queryWeight, product of:
                1.7564431 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01394178 = queryNorm
              0.5618023 = fieldWeight in 3095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=3095)
          0.1654661 = weight(abstract_txt:decompositions in 3095) [ClassicSimilarity], result of:
            0.1654661 = score(doc=3095,freq=1.0), product of:
              0.26728153 = queryWeight, product of:
                1.9354888 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01394178 = queryNorm
              0.6190705 = fieldWeight in 3095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=3095)
          0.16870679 = weight(abstract_txt:decomposition in 3095) [ClassicSimilarity], result of:
            0.16870679 = score(doc=3095,freq=1.0), product of:
              0.34113634 = queryWeight, product of:
                3.0923252 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.01394178 = queryNorm
              0.4945436 = fieldWeight in 3095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.0625 = fieldNorm(doc=3095)
        0.2 = coord(5/25)
    
  5. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 0.08
    0.08029291 = sum of:
      0.08029291 = product of:
        0.5018307 = sum of:
          0.03582009 = weight(abstract_txt:automatic in 6507) [ClassicSimilarity], result of:
            0.03582009 = score(doc=6507,freq=1.0), product of:
              0.07353936 = queryWeight, product of:
                1.0152339 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.01394178 = queryNorm
              0.48708728 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.09375 = fieldNorm(doc=6507)
          0.043674707 = weight(abstract_txt:linked in 6507) [ClassicSimilarity], result of:
            0.043674707 = score(doc=6507,freq=1.0), product of:
              0.08393094 = queryWeight, product of:
                1.0845941 = boost
                5.550558 = idf(docFreq=466, maxDocs=44218)
                0.01394178 = queryNorm
              0.5203648 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.550558 = idf(docFreq=466, maxDocs=44218)
                0.09375 = fieldNorm(doc=6507)
          0.084550716 = weight(abstract_txt:structuring in 6507) [ClassicSimilarity], result of:
            0.084550716 = score(doc=6507,freq=1.0), product of:
              0.13037078 = queryWeight, product of:
                1.3517498 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.01394178 = queryNorm
              0.6485404 = fieldWeight in 6507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.09375 = fieldNorm(doc=6507)
          0.33778515 = weight(abstract_txt:text in 6507) [ClassicSimilarity], result of:
            0.33778515 = score(doc=6507,freq=4.0), product of:
              0.44549462 = queryWeight, product of:
                7.9018254 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01394178 = queryNorm
              0.75822496 = fieldWeight in 6507, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=6507)
        0.16 = coord(4/25)
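
The internal consistency of trees like these can be checked mechanically. The following is a minimal sketch, not part of the catalog software: it assumes each line has the form "value = description", that nesting is expressed by indentation alone, and that a description ending in "sum of:" or "product of:" should equal the sum or product of its direct children. The names parse_explain and check are hypothetical. The sample is the "excerpts" subtree for doc 4901 from entry 1 above.

```python
import math

def parse_explain(text):
    """Parse indented 'value = description' lines into nested dict nodes."""
    root = {"indent": -1, "children": []}
    stack = [root]
    for line in text.splitlines():
        if " = " not in line:
            continue
        indent = len(line) - len(line.lstrip())
        value, desc = line.strip().split(" = ", 1)
        try:
            number = float(value)
        except ValueError:
            continue  # not a score line
        node = {"indent": indent, "value": number, "desc": desc, "children": []}
        # Pop back to the nearest ancestor with smaller indentation.
        while stack[-1]["indent"] >= indent:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]

def check(node, rel_tol=1e-4):
    """Return descriptions of nodes whose value disagrees with their children."""
    bad = []
    kids = [child["value"] for child in node["children"]]
    expected = None
    if node["desc"].endswith("sum of:"):
        expected = sum(kids)
    elif node["desc"].endswith("product of:"):
        expected = math.prod(kids)
    if expected is not None and not math.isclose(node["value"], expected, rel_tol=rel_tol):
        bad.append(node["desc"])
    for child in node["children"]:
        bad.extend(check(child, rel_tol))
    return bad

SAMPLE = """\
0.29950503 = score(doc=4901,freq=2.0), product of:
  0.21696864 = queryWeight, product of:
    1.7438321 = boost
    8.924298 = idf(docFreq=15, maxDocs=44218)
    0.01394178 = queryNorm
  1.380407 = fieldWeight in 4901, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    8.924298 = idf(docFreq=15, maxDocs=44218)
    0.109375 = fieldNorm(doc=4901)
"""

nodes = parse_explain(SAMPLE)
problems = [desc for node in nodes for desc in check(node)]
print(problems)
```

An empty list means every sum and product node agrees with its children within the tolerance. The tolerance is loose on purpose, since the listed values are rounded 32-bit floats.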