Document (#13222)

Author
Huffman, G.D.
Vital, D.A.
Bivins, R.G.
Title
Generating indices with lexical association methods : term uniqueness
Source
Information processing and management. 26(1990) no.4, pp.549-558
Year
1990
Abstract
A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary: the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 75% of the unique words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination value were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid). An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
Theme
Retrieval studies
Indexing studies

Similar documents (content)

  1. Bilal, D.: Ranking, relevance judgment, and precision of information retrieval on children's queries : evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids (2012) 0.16
    0.15874413 = sum of:
      0.15874413 = product of:
        0.5669433 = sum of:
          0.059377354 = weight(abstract_txt:word in 393) [ClassicSimilarity], result of:
            0.059377354 = score(doc=393,freq=3.0), product of:
              0.11532966 = queryWeight, product of:
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021218264 = queryNorm
              0.51484895 = fieldWeight in 393, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.10184802 = weight(abstract_txt:precision in 393) [ClassicSimilarity], result of:
            0.10184802 = score(doc=393,freq=8.0), product of:
              0.11917155 = queryWeight, product of:
                1.0165197 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.021218264 = queryNorm
              0.8546337 = fieldWeight in 393, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.06308612 = weight(abstract_txt:produced in 393) [ClassicSimilarity], result of:
            0.06308612 = score(doc=393,freq=3.0), product of:
              0.12008341 = queryWeight, product of:
                1.0204012 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.021218264 = queryNorm
              0.52535254 = fieldWeight in 393, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.052987043 = weight(abstract_txt:ranking in 393) [ClassicSimilarity], result of:
            0.052987043 = score(doc=393,freq=2.0), product of:
              0.12236878 = queryWeight, product of:
                1.0300654 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.021218264 = queryNorm
              0.4330111 = fieldWeight in 393, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.07940206 = weight(abstract_txt:retrieved in 393) [ClassicSimilarity], result of:
            0.07940206 = score(doc=393,freq=4.0), product of:
              0.12718484 = queryWeight, product of:
                1.0501399 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.021218264 = queryNorm
              0.6243044 = fieldWeight in 393, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.03165985 = weight(abstract_txt:were in 393) [ClassicSimilarity], result of:
            0.03165985 = score(doc=393,freq=1.0), product of:
              0.15774193 = queryWeight, product of:
                2.0256467 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.021218264 = queryNorm
              0.20070662 = fieldWeight in 393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
          0.17858288 = weight(abstract_txt:relevancy in 393) [ClassicSimilarity], result of:
            0.17858288 = score(doc=393,freq=1.0), product of:
              0.39672643 = queryWeight, product of:
                2.2715416 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.021218264 = queryNorm
              0.4501411 = fieldWeight in 393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0546875 = fieldNorm(doc=393)
        0.28 = coord(7/25)
    
  2. Lucas, W.; Topi, H.: Form and function : the impact of query term and operator usage on Web search results (2002) 0.13
    0.1338025 = sum of:
      0.1338025 = product of:
        0.6690125 = sum of:
          0.018291427 = weight(abstract_txt:user in 198) [ClassicSimilarity], result of:
            0.018291427 = score(doc=198,freq=1.0), product of:
              0.07945143 = queryWeight, product of:
                1.0165435 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.021218264 = queryNorm
              0.23022151 = fieldWeight in 198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=198)
          0.0453726 = weight(abstract_txt:retrieved in 198) [ClassicSimilarity], result of:
            0.0453726 = score(doc=198,freq=1.0), product of:
              0.12718484 = queryWeight, product of:
                1.0501399 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.021218264 = queryNorm
              0.35674536 = fieldWeight in 198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=198)
          0.05117004 = weight(abstract_txt:were in 198) [ClassicSimilarity], result of:
            0.05117004 = score(doc=198,freq=2.0), product of:
              0.15774193 = queryWeight, product of:
                2.0256467 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.021218264 = queryNorm
              0.32439086 = fieldWeight in 198, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=198)
          0.45636964 = weight(abstract_txt:relevancy in 198) [ClassicSimilarity], result of:
            0.45636964 = score(doc=198,freq=5.0), product of:
              0.39672643 = queryWeight, product of:
                2.2715416 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.021218264 = queryNorm
              1.1503384 = fieldWeight in 198, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=198)
          0.09780877 = weight(abstract_txt:terms in 198) [ClassicSimilarity], result of:
            0.09780877 = score(doc=198,freq=3.0), product of:
              0.22342941 = queryWeight, product of:
                2.6039562 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021218264 = queryNorm
              0.4377614 = fieldWeight in 198, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=198)
        0.2 = coord(5/25)
    
  3. Dumais, S.T.: Latent semantic analysis (2003) 0.13
    0.13050224 = sum of:
      0.13050224 = product of:
        0.4078195 = sum of:
          0.033929918 = weight(abstract_txt:word in 2462) [ClassicSimilarity], result of:
            0.033929918 = score(doc=2462,freq=3.0), product of:
              0.11532966 = queryWeight, product of:
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021218264 = queryNorm
              0.2941994 = fieldWeight in 2462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.02052989 = weight(abstract_txt:generated in 2462) [ClassicSimilarity], result of:
            0.02052989 = score(doc=2462,freq=1.0), product of:
              0.118991874 = queryWeight, product of:
                1.015753 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.021218264 = queryNorm
              0.17253187 = fieldWeight in 2462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.0213075 = weight(abstract_txt:unique in 2462) [ClassicSimilarity], result of:
            0.0213075 = score(doc=2462,freq=1.0), product of:
              0.12197792 = queryWeight, product of:
                1.028419 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.021218264 = queryNorm
              0.17468326 = fieldWeight in 2462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.032083273 = weight(abstract_txt:retrieved in 2462) [ClassicSimilarity], result of:
            0.032083273 = score(doc=2462,freq=2.0), product of:
              0.12718484 = queryWeight, product of:
                1.0501399 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.021218264 = queryNorm
              0.25225705 = fieldWeight in 2462, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.05630271 = weight(abstract_txt:vocabulary in 2462) [ClassicSimilarity], result of:
            0.05630271 = score(doc=2462,freq=4.0), product of:
              0.16812134 = queryWeight, product of:
                1.478722 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.021218264 = queryNorm
              0.33489332 = fieldWeight in 2462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.03254764 = weight(abstract_txt:association in 2462) [ClassicSimilarity], result of:
            0.03254764 = score(doc=2462,freq=1.0), product of:
              0.18519801 = queryWeight, product of:
                1.5520055 = boost
                5.6238427 = idf(docFreq=433, maxDocs=44218)
                0.021218264 = queryNorm
              0.17574508 = fieldWeight in 2462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6238427 = idf(docFreq=433, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.11747382 = weight(abstract_txt:lexical in 2462) [ClassicSimilarity], result of:
            0.11747382 = score(doc=2462,freq=3.0), product of:
              0.3325497 = queryWeight, product of:
                2.4014456 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.021218264 = queryNorm
              0.35325193 = fieldWeight in 2462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
          0.09364477 = weight(abstract_txt:terms in 2462) [ClassicSimilarity], result of:
            0.09364477 = score(doc=2462,freq=11.0), product of:
              0.22342941 = queryWeight, product of:
                2.6039562 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021218264 = queryNorm
              0.41912463 = fieldWeight in 2462, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.03125 = fieldNorm(doc=2462)
        0.32 = coord(8/25)
    
  4. Amolochitis, E.; Christou, I.T.; Tan, Z.-H.; Prasad, R.: A heuristic hierarchical scheme for academic search and retrieval (2013) 0.13
    0.12759829 = sum of:
      0.12759829 = product of:
        0.45570815 = sum of:
          0.041152816 = weight(abstract_txt:precision in 2711) [ClassicSimilarity], result of:
            0.041152816 = score(doc=2711,freq=1.0), product of:
              0.11917155 = queryWeight, product of:
                1.0165197 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.021218264 = queryNorm
              0.34532416 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.036582854 = weight(abstract_txt:user in 2711) [ClassicSimilarity], result of:
            0.036582854 = score(doc=2711,freq=4.0), product of:
              0.07945143 = queryWeight, product of:
                1.0165435 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.021218264 = queryNorm
              0.46044302 = fieldWeight in 2711, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.041626047 = weight(abstract_txt:produced in 2711) [ClassicSimilarity], result of:
            0.041626047 = score(doc=2711,freq=1.0), product of:
              0.12008341 = queryWeight, product of:
                1.0204012 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.021218264 = queryNorm
              0.3466428 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.07416641 = weight(abstract_txt:ranking in 2711) [ClassicSimilarity], result of:
            0.07416641 = score(doc=2711,freq=3.0), product of:
              0.12236878 = queryWeight, product of:
                1.0300654 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.021218264 = queryNorm
              0.6060893 = fieldWeight in 2711, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.0453726 = weight(abstract_txt:retrieved in 2711) [ClassicSimilarity], result of:
            0.0453726 = score(doc=2711,freq=1.0), product of:
              0.12718484 = queryWeight, product of:
                1.0501399 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.021218264 = queryNorm
              0.35674536 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.09053684 = weight(abstract_txt:subset in 2711) [ClassicSimilarity], result of:
            0.09053684 = score(doc=2711,freq=1.0), product of:
              0.20158418 = queryWeight, product of:
                1.3220799 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.021218264 = queryNorm
              0.44912672 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
          0.12627059 = weight(abstract_txt:terms in 2711) [ClassicSimilarity], result of:
            0.12627059 = score(doc=2711,freq=5.0), product of:
              0.22342941 = queryWeight, product of:
                2.6039562 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021218264 = queryNorm
              0.5651476 = fieldWeight in 2711, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2711)
        0.28 = coord(7/25)
    
  5. Coladangelo, L.P.: Organizing controversy : toward cultural hospitality in controlled vocabularies through semantic annotation (2021) 0.12
    0.12446407 = sum of:
      0.12446407 = product of:
        0.44451454 = sum of:
          0.016004998 = weight(abstract_txt:user in 578) [ClassicSimilarity], result of:
            0.016004998 = score(doc=578,freq=1.0), product of:
              0.07945143 = queryWeight, product of:
                1.0165435 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.021218264 = queryNorm
              0.20144382 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.079219736 = weight(abstract_txt:subset in 578) [ClassicSimilarity], result of:
            0.079219736 = score(doc=578,freq=1.0), product of:
              0.20158418 = queryWeight, product of:
                1.3220799 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.021218264 = queryNorm
              0.39298588 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.06967105 = weight(abstract_txt:vocabulary in 578) [ClassicSimilarity], result of:
            0.06967105 = score(doc=578,freq=2.0), product of:
              0.16812134 = queryWeight, product of:
                1.478722 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.021218264 = queryNorm
              0.41440934 = fieldWeight in 578, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.030445347 = weight(abstract_txt:methods in 578) [ClassicSimilarity], result of:
            0.030445347 = score(doc=578,freq=1.0), product of:
              0.13425325 = queryWeight, product of:
                1.5258325 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.021218264 = queryNorm
              0.2267755 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.03165985 = weight(abstract_txt:were in 578) [ClassicSimilarity], result of:
            0.03165985 = score(doc=578,freq=1.0), product of:
              0.15774193 = queryWeight, product of:
                2.0256467 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.021218264 = queryNorm
              0.20070662 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.1186912 = weight(abstract_txt:lexical in 578) [ClassicSimilarity], result of:
            0.1186912 = score(doc=578,freq=1.0), product of:
              0.3325497 = queryWeight, product of:
                2.4014456 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.021218264 = queryNorm
              0.35691267 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
          0.09882236 = weight(abstract_txt:terms in 578) [ClassicSimilarity], result of:
            0.09882236 = score(doc=578,freq=4.0), product of:
              0.22342941 = queryWeight, product of:
                2.6039562 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021218264 = queryNorm
              0.4422979 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=578)
        0.28 = coord(7/25)
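
The score breakdowns above follow the classic TF-IDF pattern: each term score is queryWeight times fieldWeight, the term scores are summed, and the sum is scaled by coord. The following is a minimal sketch that recomputes the "word" term of entry 1 (doc 393) and its final score from the values listed there. It is not Lucene code; the function names `idf` and `term_score` are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed because it reproduces the listed idf values.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces 5.4353957 for docFreq=523, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = sqrt(termFreq) * idf * fieldNorm
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight

# Term "word" in doc 393, using the values from the breakdown (no boost listed).
score_word = term_score(freq=3.0, doc_freq=523, max_docs=44218,
                        query_norm=0.021218264, field_norm=0.0546875)

# Final score of entry 1: sum of the seven term scores times coord(7/25).
total = 0.5669433 * (7 / 25)

print(score_word, total)
```

Small differences in the last digits are expected, since the listed values appear to be single-precision floats while this sketch uses Python's double precision.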