Document (#21505)

Author
Srinivasan, P.
Title
Thesaurus construction
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 161-218
Abstract
Thesauri are valuable structures for information retrieval systems. A thesaurus provides a precise and controlled vocabulary that serves to coordinate document indexing and document retrieval. In both indexing and retrieval, a thesaurus may be used to select the most appropriate terms. Additionally, the thesaurus can assist the searcher in reformulating search strategies if required. The chapter examines the important features of thesauri, which should allow the reader to differentiate between thesauri. Next, a brief overview of the manual thesaurus construction process is given. Two major approaches to automatic thesaurus construction have been selected for detailed examination. The first is thesaurus construction from collections of documents, and the second is thesaurus construction by merging existing thesauri. These two methods were selected because they rely on statistical techniques alone and are also significantly different from each other. Programs written in the C language accompany the discussion of these approaches.
Theme
Conception and application of the thesaurus principle

Similar documents (author)

  1. Srinivasan, P.: Expert interface to Library of Congress Subject Headings (1990/91) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2209) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2209,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2209, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2209)
    
  2. Srinivasan, P.: Query expansion and MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 8453) [ClassicSimilarity], result of:
        5.4077277 = score(doc=8453,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 8453, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=8453)
    
  3. Srinivasan, P.: Intelligent information retrieval using rough set approximations (1989) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2526) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2526,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2526, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2526)
    
  4. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2880) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2880,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2880, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2880)
    
  5. Srinivasan, P.: Optimal document-indexing vocabulary for MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 6634) [ClassicSimilarity], result of:
        5.4077277 = score(doc=6634,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 6634, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=6634)
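
The score trees above all follow the same arithmetic. The following is a minimal sketch of that calculation, assuming the default ClassicSimilarity definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The function names `classic_idf` and `term_score` are hypothetical helpers chosen for illustration; the input values are copied from the listing.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as ClassicSimilarity defines it by default.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, mirroring the two "product of" branches.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Values taken from the author_txt:srinivasan entries above.
score = term_score(
    freq=1.0,
    doc_freq=20,
    max_docs=44218,
    query_norm=0.115575336,
    field_norm=0.625,
)
print(round(score, 4))
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive an identical score. Small differences in the last digits relative to the listing are expected, since the listing reports single-precision values.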
    

Similar documents (content)

  1. Nielsen, M.L.: Future thesauri : what kind of conceptual knowledge do searchers need? (1998) 0.29
    0.2931101 = sum of:
      0.2931101 = product of:
        1.0468218 = sum of:
          0.0445664 = weight(abstract_txt:valuable in 145) [ClassicSimilarity], result of:
            0.0445664 = score(doc=145,freq=1.0), product of:
              0.09581139 = queryWeight, product of:
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.01609225 = queryNorm
              0.4651472 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.06658636 = weight(abstract_txt:searcher in 145) [ClassicSimilarity], result of:
            0.06658636 = score(doc=145,freq=1.0), product of:
              0.12521863 = queryWeight, product of:
                1.1432097 = boost
                6.806538 = idf(docFreq=132, maxDocs=44218)
                0.01609225 = queryNorm
              0.5317608 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.806538 = idf(docFreq=132, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.034752063 = weight(abstract_txt:indexing in 145) [ClassicSimilarity], result of:
            0.034752063 = score(doc=145,freq=1.0), product of:
              0.10226864 = queryWeight, product of:
                1.4610924 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.01609225 = queryNorm
              0.3398115 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.02658547 = weight(abstract_txt:retrieval in 145) [ClassicSimilarity], result of:
            0.02658547 = score(doc=145,freq=1.0), product of:
              0.097922415 = queryWeight, product of:
                1.7510281 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01609225 = queryNorm
              0.27149525 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.30264348 = weight(abstract_txt:thesauri in 145) [ClassicSimilarity], result of:
            0.30264348 = score(doc=145,freq=5.0), product of:
              0.31895518 = queryWeight, product of:
                3.649104 = boost
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.01609225 = queryNorm
              0.948859 = fieldWeight in 145, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.24232343 = weight(abstract_txt:construction in 145) [ClassicSimilarity], result of:
            0.24232343 = score(doc=145,freq=2.0), product of:
              0.40208918 = queryWeight, product of:
                4.580761 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.01609225 = queryNorm
              0.6026609 = fieldWeight in 145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
          0.32936463 = weight(abstract_txt:thesaurus in 145) [ClassicSimilarity], result of:
            0.32936463 = score(doc=145,freq=2.0), product of:
              0.5770544 = queryWeight, product of:
                6.941364 = boost
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.01609225 = queryNorm
              0.5707688 = fieldWeight in 145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.078125 = fieldNorm(doc=145)
        0.28 = coord(7/25)
    
  2. Spiteri, L.F.: The use of facet analysis in information retrieval thesauri : an examination of selected guidelines for thesaurus construction (1997) 0.26
    0.25889933 = sum of:
      0.25889933 = product of:
        1.2944967 = sum of:
          0.05921548 = weight(abstract_txt:examination in 372) [ClassicSimilarity], result of:
            0.05921548 = score(doc=372,freq=1.0), product of:
              0.10254507 = queryWeight, product of:
                1.0345436 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.01609225 = queryNorm
              0.5774581 = fieldWeight in 372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.09375 = fieldNorm(doc=372)
          0.031902567 = weight(abstract_txt:retrieval in 372) [ClassicSimilarity], result of:
            0.031902567 = score(doc=372,freq=1.0), product of:
              0.097922415 = queryWeight, product of:
                1.7510281 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01609225 = queryNorm
              0.3257943 = fieldWeight in 372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=372)
          0.3631722 = weight(abstract_txt:thesauri in 372) [ClassicSimilarity], result of:
            0.3631722 = score(doc=372,freq=5.0), product of:
              0.31895518 = queryWeight, product of:
                3.649104 = boost
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.01609225 = queryNorm
              1.1386309 = fieldWeight in 372, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.09375 = fieldNorm(doc=372)
          0.35614127 = weight(abstract_txt:construction in 372) [ClassicSimilarity], result of:
            0.35614127 = score(doc=372,freq=3.0), product of:
              0.40208918 = queryWeight, product of:
                4.580761 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.01609225 = queryNorm
              0.88572705 = fieldWeight in 372, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.09375 = fieldNorm(doc=372)
          0.48406518 = weight(abstract_txt:thesaurus in 372) [ClassicSimilarity], result of:
            0.48406518 = score(doc=372,freq=3.0), product of:
              0.5770544 = queryWeight, product of:
                6.941364 = boost
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.01609225 = queryNorm
              0.8388554 = fieldWeight in 372, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.09375 = fieldNorm(doc=372)
        0.2 = coord(5/25)
    
  3. Sanatjoo, A.: Development of thesaurus structure through a work-task oriented methodology 0.24
    0.2440995 = sum of:
      0.2440995 = product of:
        1.0170813 = sum of:
          0.035653118 = weight(abstract_txt:valuable in 3536) [ClassicSimilarity], result of:
            0.035653118 = score(doc=3536,freq=1.0), product of:
              0.09581139 = queryWeight, product of:
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.01609225 = queryNorm
              0.37211776 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
          0.053269085 = weight(abstract_txt:searcher in 3536) [ClassicSimilarity], result of:
            0.053269085 = score(doc=3536,freq=1.0), product of:
              0.12521863 = queryWeight, product of:
                1.1432097 = boost
                6.806538 = idf(docFreq=132, maxDocs=44218)
                0.01609225 = queryNorm
              0.42540863 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.806538 = idf(docFreq=132, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
          0.03683791 = weight(abstract_txt:retrieval in 3536) [ClassicSimilarity], result of:
            0.03683791 = score(doc=3536,freq=3.0), product of:
              0.097922415 = queryWeight, product of:
                1.7510281 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01609225 = queryNorm
              0.37619486 = fieldWeight in 3536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
          0.10827703 = weight(abstract_txt:thesauri in 3536) [ClassicSimilarity], result of:
            0.10827703 = score(doc=3536,freq=1.0), product of:
              0.31895518 = queryWeight, product of:
                3.649104 = boost
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.01609225 = queryNorm
              0.3394741 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
          0.19385874 = weight(abstract_txt:construction in 3536) [ClassicSimilarity], result of:
            0.19385874 = score(doc=3536,freq=2.0), product of:
              0.40208918 = queryWeight, product of:
                4.580761 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.01609225 = queryNorm
              0.4821287 = fieldWeight in 3536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
          0.5891854 = weight(abstract_txt:thesaurus in 3536) [ClassicSimilarity], result of:
            0.5891854 = score(doc=3536,freq=10.0), product of:
              0.5770544 = queryWeight, product of:
                6.941364 = boost
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.01609225 = queryNorm
              1.0210223 = fieldWeight in 3536, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.0625 = fieldNorm(doc=3536)
        0.24 = coord(6/25)
    
  4. McCulloch, E.: Thesauri: practical guidance for construction (2005) 0.24
    0.2394239 = sum of:
      0.2394239 = product of:
        0.9975996 = sum of:
          0.03902978 = weight(abstract_txt:assist in 4724) [ClassicSimilarity], result of:
            0.03902978 = score(doc=4724,freq=1.0), product of:
              0.10176916 = queryWeight, product of:
                1.0306222 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.01609225 = queryNorm
              0.38351285 = fieldWeight in 4724, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
          0.021268377 = weight(abstract_txt:retrieval in 4724) [ClassicSimilarity], result of:
            0.021268377 = score(doc=4724,freq=1.0), product of:
              0.097922415 = queryWeight, product of:
                1.7510281 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01609225 = queryNorm
              0.21719621 = fieldWeight in 4724, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
          0.04839539 = weight(abstract_txt:selected in 4724) [ClassicSimilarity], result of:
            0.04839539 = score(doc=4724,freq=1.0), product of:
              0.14798969 = queryWeight, product of:
                1.7576085 = boost
                5.232299 = idf(docFreq=641, maxDocs=44218)
                0.01609225 = queryNorm
              0.32701868 = fieldWeight in 4724, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.232299 = idf(docFreq=641, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
          0.2421148 = weight(abstract_txt:thesauri in 4724) [ClassicSimilarity], result of:
            0.2421148 = score(doc=4724,freq=5.0), product of:
              0.31895518 = queryWeight, product of:
                3.649104 = boost
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.01609225 = queryNorm
              0.7590872 = fieldWeight in 4724, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
          0.27415767 = weight(abstract_txt:construction in 4724) [ClassicSimilarity], result of:
            0.27415767 = score(doc=4724,freq=4.0), product of:
              0.40208918 = queryWeight, product of:
                4.580761 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.01609225 = queryNorm
              0.68183297 = fieldWeight in 4724, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
          0.37263355 = weight(abstract_txt:thesaurus in 4724) [ClassicSimilarity], result of:
            0.37263355 = score(doc=4724,freq=4.0), product of:
              0.5770544 = queryWeight, product of:
                6.941364 = boost
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.01609225 = queryNorm
              0.6457512 = fieldWeight in 4724, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.0625 = fieldNorm(doc=4724)
        0.24 = coord(6/25)
    
  5. Hou, H.; Chen, S.: The integration of Chinese classification and thesaurus : its progress and technical features (1996) 0.22
    0.21926601 = sum of:
      0.21926601 = product of:
        1.3704126 = sum of:
          0.09829368 = weight(abstract_txt:indexing in 2319) [ClassicSimilarity], result of:
            0.09829368 = score(doc=2319,freq=2.0), product of:
              0.10226864 = queryWeight, product of:
                1.4610924 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.01609225 = queryNorm
              0.9611321 = fieldWeight in 2319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.15625 = fieldNorm(doc=2319)
          0.27069256 = weight(abstract_txt:thesauri in 2319) [ClassicSimilarity], result of:
            0.27069256 = score(doc=2319,freq=1.0), product of:
              0.31895518 = queryWeight, product of:
                3.649104 = boost
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.01609225 = queryNorm
              0.84868526 = fieldWeight in 2319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.431586 = idf(docFreq=525, maxDocs=44218)
                0.15625 = fieldNorm(doc=2319)
          0.34269708 = weight(abstract_txt:construction in 2319) [ClassicSimilarity], result of:
            0.34269708 = score(doc=2319,freq=1.0), product of:
              0.40208918 = queryWeight, product of:
                4.580761 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.01609225 = queryNorm
              0.8522912 = fieldWeight in 2319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.15625 = fieldNorm(doc=2319)
          0.65872926 = weight(abstract_txt:thesaurus in 2319) [ClassicSimilarity], result of:
            0.65872926 = score(doc=2319,freq=2.0), product of:
              0.5770544 = queryWeight, product of:
                6.941364 = boost
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.01609225 = queryNorm
              1.1415375 = fieldWeight in 2319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1660094 = idf(docFreq=685, maxDocs=44218)
                0.15625 = fieldNorm(doc=2319)
        0.16 = coord(4/25)
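
For the content-based matches, the per-term weights are summed and then scaled by the coordination factor coord(matched / total query terms). A minimal sketch of that final step follows, using the four term weights listed for the last entry (doc=2319). The variable names are illustrative assumptions; the count of 25 query terms is read from the `coord(4/25)` line.

```python
# Per-term weights copied from the explanation for doc=2319.
term_weights = {
    "indexing": 0.09829368,
    "thesauri": 0.27069256,
    "construction": 0.34269708,
    "thesaurus": 0.65872926,
}

query_term_count = 25  # denominator shown in coord(4/25)

# coord rewards documents that match a larger share of the query terms.
coord = len(term_weights) / query_term_count
weight_sum = sum(term_weights.values())
final_score = coord * weight_sum

print(f"sum={weight_sum:.7f} coord={coord:.2f} score={final_score:.7f}")
```

This is why the last entry has the largest raw sum (about 1.37) yet ranks lowest: it matches only 4 of the 25 query terms, so its sum is scaled by 0.16.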