Document (#13222)

Author
Huffman, G.D.
Vital, D.A.
Bivins, R.G.
Title
Generating indices with lexical association methods : term uniqueness
Source
Information processing and management. 26(1990) no.4, pp.549-558
Year
1990
Abstract
A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary: the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 7,5% of the uniqueness words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination value were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid). An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
Theme
Retrieval studies
Indexing studies

Similar documents (content)

  1. Bilal, D.: Ranking, relevance judgment, and precision of information retrieval on children's queries : evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids (2012) 0.16
    0.1578801 = sum of:
      0.1578801 = product of:
        0.5638575 = sum of:
          0.059808407 = weight(abstract_txt:word in 2394) [ClassicSimilarity], result of:
            0.059808407 = score(doc=2394,freq=3.0), product of:
              0.1156366 = queryWeight, product of:
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.021177616 = queryNorm
              0.51721 = fieldWeight in 2394, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.10050983 = weight(abstract_txt:precision in 2394) [ClassicSimilarity], result of:
            0.10050983 = score(doc=2394,freq=8.0), product of:
              0.117869996 = queryWeight, product of:
                1.0096108 = boost
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.021177616 = queryNorm
              0.8527177 = fieldWeight in 2394, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.0631908 = weight(abstract_txt:produced in 2394) [ClassicSimilarity], result of:
            0.0631908 = score(doc=2394,freq=3.0), product of:
              0.1199563 = queryWeight, product of:
                1.0185066 = boost
                5.561374 = idf(docFreq=441, maxDocs=42306)
                0.021177616 = queryNorm
              0.52678186 = fieldWeight in 2394, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.561374 = idf(docFreq=441, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.05269439 = weight(abstract_txt:ranking in 2394) [ClassicSimilarity], result of:
            0.05269439 = score(doc=2394,freq=2.0), product of:
              0.12165421 = queryWeight, product of:
                1.0256895 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.021177616 = queryNorm
              0.43314892 = fieldWeight in 2394, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.078752376 = weight(abstract_txt:retrieved in 2394) [ClassicSimilarity], result of:
            0.078752376 = score(doc=2394,freq=4.0), product of:
              0.12621666 = queryWeight, product of:
                1.0447459 = boost
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.021177616 = queryNorm
              0.62394595 = fieldWeight in 2394, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.0322788 = weight(abstract_txt:were in 2394) [ClassicSimilarity], result of:
            0.0322788 = score(doc=2394,freq=1.0), product of:
              0.15944573 = queryWeight, product of:
                2.0338523 = boost
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.021177616 = queryNorm
              0.2024438 = fieldWeight in 2394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
          0.17662288 = weight(abstract_txt:relevancy in 2394) [ClassicSimilarity], result of:
            0.17662288 = score(doc=2394,freq=1.0), product of:
              0.3929669 = queryWeight, product of:
                2.2577505 = boost
                8.218697 = idf(docFreq=30, maxDocs=42306)
                0.021177616 = queryNorm
              0.44945997 = fieldWeight in 2394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.218697 = idf(docFreq=30, maxDocs=42306)
                0.0546875 = fieldNorm(doc=2394)
        0.28 = coord(7/25)
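
The per-term lines in the explanation above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and not part of Lucene's API, and the input values are copied from the listing for document 2394.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as laid out in the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm       # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight ..., product of"
    return query_weight * field_weight


# Values taken from the listing: abstract_txt:word in document 2394 (no boost line)
score_word = term_score(freq=3.0, doc_freq=488, max_docs=42306,
                        query_norm=0.021177616, field_norm=0.0546875)

# abstract_txt:precision in document 2394 (boost = 1.0096108)
score_precision = term_score(freq=8.0, doc_freq=463, max_docs=42306,
                             query_norm=0.021177616, field_norm=0.0546875,
                             boost=1.0096108)

print(score_word, score_precision)
```

The two results should agree with the listed 0.059808407 and 0.10050983 up to small rounding differences, since the listing was presumably computed in single precision while Python uses double precision.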
    
  2. Lucas, W.; Topi, H.: Form and function : the impact of query term and operator usage on Web search results (2002) 0.13
    0.13299324 = sum of:
      0.13299324 = product of:
        0.66496617 = sum of:
          0.01810351 = weight(abstract_txt:user in 1199) [ClassicSimilarity], result of:
            0.01810351 = score(doc=1199,freq=1.0), product of:
              0.07873573 = queryWeight, product of:
                1.0106107 = boost
                3.67884 = idf(docFreq=2903, maxDocs=42306)
                0.021177616 = queryNorm
              0.2299275 = fieldWeight in 1199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.67884 = idf(docFreq=2903, maxDocs=42306)
                0.0625 = fieldNorm(doc=1199)
          0.04500136 = weight(abstract_txt:retrieved in 1199) [ClassicSimilarity], result of:
            0.04500136 = score(doc=1199,freq=1.0), product of:
              0.12621666 = queryWeight, product of:
                1.0447459 = boost
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.021177616 = queryNorm
              0.35654056 = fieldWeight in 1199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.0625 = fieldNorm(doc=1199)
          0.05217042 = weight(abstract_txt:were in 1199) [ClassicSimilarity], result of:
            0.05217042 = score(doc=1199,freq=2.0), product of:
              0.15944573 = queryWeight, product of:
                2.0338523 = boost
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.021177616 = queryNorm
              0.3271986 = fieldWeight in 1199, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.0625 = fieldNorm(doc=1199)
          0.45136094 = weight(abstract_txt:relevancy in 1199) [ClassicSimilarity], result of:
            0.45136094 = score(doc=1199,freq=5.0), product of:
              0.3929669 = queryWeight, product of:
                2.2577505 = boost
                8.218697 = idf(docFreq=30, maxDocs=42306)
                0.021177616 = queryNorm
              1.1485978 = fieldWeight in 1199, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.218697 = idf(docFreq=30, maxDocs=42306)
                0.0625 = fieldNorm(doc=1199)
          0.09832997 = weight(abstract_txt:terms in 1199) [ClassicSimilarity], result of:
            0.09832997 = score(doc=1199,freq=3.0), product of:
              0.22373767 = queryWeight, product of:
                2.6022913 = boost
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.021177616 = queryNorm
              0.43948776 = fieldWeight in 1199, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.0625 = fieldNorm(doc=1199)
        0.2 = coord(5/25)
    
  3. Dumais, S.T.: Latent semantic analysis (2003) 0.13
    0.13082191 = sum of:
      0.13082191 = product of:
        0.40881848 = sum of:
          0.034176234 = weight(abstract_txt:word in 282) [ClassicSimilarity], result of:
            0.034176234 = score(doc=282,freq=3.0), product of:
              0.1156366 = queryWeight, product of:
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.021177616 = queryNorm
              0.2955486 = fieldWeight in 282, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.020796828 = weight(abstract_txt:generated in 282) [ClassicSimilarity], result of:
            0.020796828 = score(doc=282,freq=1.0), product of:
              0.119761616 = queryWeight, product of:
                1.0176798 = boost
                5.5568595 = idf(docFreq=443, maxDocs=42306)
                0.021177616 = queryNorm
              0.17365186 = fieldWeight in 282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5568595 = idf(docFreq=443, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.021704191 = weight(abstract_txt:unique in 282) [ClassicSimilarity], result of:
            0.021704191 = score(doc=282,freq=1.0), product of:
              0.12322022 = queryWeight, product of:
                1.0322701 = boost
                5.636527 = idf(docFreq=409, maxDocs=42306)
                0.021177616 = queryNorm
              0.17614147 = fieldWeight in 282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.636527 = idf(docFreq=409, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.031820767 = weight(abstract_txt:retrieved in 282) [ClassicSimilarity], result of:
            0.031820767 = score(doc=282,freq=2.0), product of:
              0.12621666 = queryWeight, product of:
                1.0447459 = boost
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.021177616 = queryNorm
              0.25211224 = fieldWeight in 282, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.055795696 = weight(abstract_txt:vocabulary in 282) [ClassicSimilarity], result of:
            0.055795696 = score(doc=282,freq=4.0), product of:
              0.16674922 = queryWeight, product of:
                1.4707196 = boost
                5.353735 = idf(docFreq=543, maxDocs=42306)
                0.021177616 = queryNorm
              0.33460844 = fieldWeight in 282, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353735 = idf(docFreq=543, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.03289893 = weight(abstract_txt:association in 282) [ClassicSimilarity], result of:
            0.03289893 = score(doc=282,freq=1.0), product of:
              0.18612492 = queryWeight, product of:
                1.5538183 = boost
                5.656232 = idf(docFreq=401, maxDocs=42306)
                0.021177616 = queryNorm
              0.17675725 = fieldWeight in 282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.656232 = idf(docFreq=401, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.117482044 = weight(abstract_txt:lexical in 282) [ClassicSimilarity], result of:
            0.117482044 = score(doc=282,freq=3.0), product of:
              0.3318462 = queryWeight, product of:
                2.3957183 = boost
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.021177616 = queryNorm
              0.35402557 = fieldWeight in 282, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
          0.09414378 = weight(abstract_txt:terms in 282) [ClassicSimilarity], result of:
            0.09414378 = score(doc=282,freq=11.0), product of:
              0.22373767 = queryWeight, product of:
                2.6022913 = boost
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.021177616 = queryNorm
              0.4207775 = fieldWeight in 282, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.03125 = fieldNorm(doc=282)
        0.32 = coord(8/25)
    
  4. Amolochitis, E.; Christou, I.T.; Tan, Z.-H.; Prasad, R.: A heuristic hierarchical scheme for academic search and retrieval (2013) 0.13
    0.12754542 = sum of:
      0.12754542 = product of:
        0.45551932 = sum of:
          0.040612105 = weight(abstract_txt:precision in 4712) [ClassicSimilarity], result of:
            0.040612105 = score(doc=4712,freq=1.0), product of:
              0.117869996 = queryWeight, product of:
                1.0096108 = boost
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.021177616 = queryNorm
              0.34454998 = fieldWeight in 4712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.03620702 = weight(abstract_txt:user in 4712) [ClassicSimilarity], result of:
            0.03620702 = score(doc=4712,freq=4.0), product of:
              0.07873573 = queryWeight, product of:
                1.0106107 = boost
                3.67884 = idf(docFreq=2903, maxDocs=42306)
                0.021177616 = queryNorm
              0.459855 = fieldWeight in 4712, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.67884 = idf(docFreq=2903, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.041695118 = weight(abstract_txt:produced in 4712) [ClassicSimilarity], result of:
            0.041695118 = score(doc=4712,freq=1.0), product of:
              0.1199563 = queryWeight, product of:
                1.0185066 = boost
                5.561374 = idf(docFreq=441, maxDocs=42306)
                0.021177616 = queryNorm
              0.3475859 = fieldWeight in 4712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.561374 = idf(docFreq=441, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.073756784 = weight(abstract_txt:ranking in 4712) [ClassicSimilarity], result of:
            0.073756784 = score(doc=4712,freq=3.0), product of:
              0.12165421 = queryWeight, product of:
                1.0256895 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.021177616 = queryNorm
              0.6062822 = fieldWeight in 4712, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.04500136 = weight(abstract_txt:retrieved in 4712) [ClassicSimilarity], result of:
            0.04500136 = score(doc=4712,freq=1.0), product of:
              0.12621666 = queryWeight, product of:
                1.0447459 = boost
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.021177616 = queryNorm
              0.35654056 = fieldWeight in 4712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.0913035 = weight(abstract_txt:subset in 4712) [ClassicSimilarity], result of:
            0.0913035 = score(doc=4712,freq=1.0), product of:
              0.2022823 = queryWeight, product of:
                1.3226084 = boost
                7.2218676 = idf(docFreq=83, maxDocs=42306)
                0.021177616 = queryNorm
              0.45136672 = fieldWeight in 4712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2218676 = idf(docFreq=83, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
          0.12694344 = weight(abstract_txt:terms in 4712) [ClassicSimilarity], result of:
            0.12694344 = score(doc=4712,freq=5.0), product of:
              0.22373767 = queryWeight, product of:
                2.6022913 = boost
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.021177616 = queryNorm
              0.56737626 = fieldWeight in 4712, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.059814 = idf(docFreq=1983, maxDocs=42306)
                0.0625 = fieldNorm(doc=4712)
        0.28 = coord(7/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.12
    0.12191827 = sum of:
      0.12191827 = product of:
        0.4354224 = sum of:
          0.09865829 = weight(abstract_txt:word in 4268) [ClassicSimilarity], result of:
            0.09865829 = score(doc=4268,freq=4.0), product of:
              0.1156366 = queryWeight, product of:
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.021177616 = queryNorm
              0.8531753 = fieldWeight in 4268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.05076513 = weight(abstract_txt:precision in 4268) [ClassicSimilarity], result of:
            0.05076513 = score(doc=4268,freq=1.0), product of:
              0.117869996 = queryWeight, product of:
                1.0096108 = boost
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.021177616 = queryNorm
              0.4306875 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5127997 = idf(docFreq=463, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.053229373 = weight(abstract_txt:ranking in 4268) [ClassicSimilarity], result of:
            0.053229373 = score(doc=4268,freq=1.0), product of:
              0.12165421 = queryWeight, product of:
                1.0256895 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.021177616 = queryNorm
              0.4375465 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.054260477 = weight(abstract_txt:unique in 4268) [ClassicSimilarity], result of:
            0.054260477 = score(doc=4268,freq=1.0), product of:
              0.12322022 = queryWeight, product of:
                1.0322701 = boost
                5.636527 = idf(docFreq=409, maxDocs=42306)
                0.021177616 = queryNorm
              0.4403537 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.636527 = idf(docFreq=409, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.069744624 = weight(abstract_txt:vocabulary in 4268) [ClassicSimilarity], result of:
            0.069744624 = score(doc=4268,freq=1.0), product of:
              0.16674922 = queryWeight, product of:
                1.4707196 = boost
                5.353735 = idf(docFreq=543, maxDocs=42306)
                0.021177616 = queryNorm
              0.41826054 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353735 = idf(docFreq=543, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.062651925 = weight(abstract_txt:methods in 4268) [ClassicSimilarity], result of:
            0.062651925 = score(doc=4268,freq=2.0), product of:
              0.13561754 = queryWeight, product of:
                1.5315292 = boost
                4.181321 = idf(docFreq=1756, maxDocs=42306)
                0.021177616 = queryNorm
              0.46197507 = fieldWeight in 4268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.181321 = idf(docFreq=1756, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.04611257 = weight(abstract_txt:were in 4268) [ClassicSimilarity], result of:
            0.04611257 = score(doc=4268,freq=1.0), product of:
              0.15944573 = queryWeight, product of:
                2.0338523 = boost
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.021177616 = queryNorm
              0.28920543 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7018294 = idf(docFreq=2837, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
        0.28 = coord(7/25)
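
Each document's final similarity value is the sum of its matching term scores multiplied by the coordination factor, i.e. the fraction of query terms that matched. A minimal sketch of that last step, using the seven term scores listed for document 4268; the helper name `combine` is hypothetical, and the total of 25 query clauses is read off the `coord(7/25)` line:

```python
def combine(term_scores, total_clauses):
    # coord = matched clauses / total query clauses, e.g. coord(7/25) = 0.28
    coord = len(term_scores) / total_clauses
    return sum(term_scores) * coord


# Term scores copied from the explanation for document 4268
doc_4268_terms = [
    0.09865829,   # word
    0.05076513,   # precision
    0.053229373,  # ranking
    0.054260477,  # unique
    0.069744624,  # vocabulary
    0.062651925,  # methods
    0.04611257,   # were
]

total_4268 = combine(doc_4268_terms, total_clauses=25)
print(total_4268)
```

The result should match the listed 0.12191827 up to rounding, which is the value shown as 0.12 next to the title.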