Document (#21769)

Author
Rorvig, M.
Title
Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets
Source
Journal of the American Society for Information Science. 50(1999) no.8, pp.639-651
Year
1999
Abstract
Multiple similarity measures are derived from the full text of the documents in 5 TREC topic-document sets from the LDC TREC Collection Disk 1. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions, and the resulting 75 permutations are plotted. The results suggest that the cosine-vector and overlap similarity measures recover optimal data relationships among the documents of the 5 sets, and that MLE assumptions are required to model the data adequately.
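The two measures the abstract singles out can be illustrated on toy data. This is a minimal sketch under stated assumptions, not the study's procedure: the abstract does not say how documents were tokenized or weighted, so the code assumes naive whitespace tokenization, raw term frequencies for the cosine measure, and the set-based overlap coefficient |A ∩ B| / min(|A|, |B|). The helper names `term_counts`, `cosine`, and `overlap` are hypothetical.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase and whitespace-tokenize text, then count each term (deliberately naive)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


def overlap(a: Counter, b: Counter) -> float:
    """Overlap coefficient on distinct terms: |A & B| / min(|A|, |B|)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


if __name__ == "__main__":
    d1 = term_counts("trec topic document similarity")
    d2 = term_counts("document similarity scaling")
    print(round(cosine(d1, d2), 4))   # prints 0.5774, i.e. 2 / (2 * sqrt(3))
    print(round(overlap(d1, d2), 4))  # prints 0.6667, i.e. 2 / min(4, 3)
```

The overlap coefficient ignores term frequencies and the size of the larger document, so a short document fully contained in a long one scores 1.0, while the cosine measure still penalizes the length mismatch.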

Similar documents (author)

  1. Rorvig, M.E.: ¬A method for automatically abstracting visual documents (1993) 5.63
    5.633517 = sum of:
      5.633517 = weight(author_txt:rorvig in 2723) [ClassicSimilarity], result of:
        5.633517 = fieldWeight in 2723, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.013627 = idf(docFreq=13, maxDocs=42306)
          0.625 = fieldNorm(doc=2723)
    
  2. Rorvig, M.E.: Image information retrieval (1987) 5.63
    5.633517 = sum of:
      5.633517 = weight(author_txt:rorvig in 5640) [ClassicSimilarity], result of:
        5.633517 = fieldWeight in 5640, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.013627 = idf(docFreq=13, maxDocs=42306)
          0.625 = fieldNorm(doc=5640)
    
  3. Rorvig, M.E.: ¬The bibliographic control of microcomputer software (1988) 5.63
    5.633517 = sum of:
      5.633517 = weight(author_txt:rorvig in 1344) [ClassicSimilarity], result of:
        5.633517 = fieldWeight in 1344, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.013627 = idf(docFreq=13, maxDocs=42306)
          0.625 = fieldNorm(doc=1344)
    
  4. Rorvig, M.E.: Psychometric measurement and information retrieval (1989) 5.63
    5.633517 = sum of:
      5.633517 = weight(author_txt:rorvig in 334) [ClassicSimilarity], result of:
        5.633517 = fieldWeight in 334, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.013627 = idf(docFreq=13, maxDocs=42306)
          0.625 = fieldNorm(doc=334)
    
  5. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 5.63
    5.633517 = sum of:
      5.633517 = weight(author_txt:rorvig in 4270) [ClassicSimilarity], result of:
        5.633517 = fieldWeight in 4270, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.013627 = idf(docFreq=13, maxDocs=42306)
          0.625 = fieldNorm(doc=4270)
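Each author-match score above is reported as a ClassicSimilarity fieldWeight, the product of tf, idf, and fieldNorm. The sketch below recomputes it, assuming the standard ClassicSimilarity definitions tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). fieldNorm is copied from the output rather than recomputed, because Lucene stores field-length norms in a lossy compressed form. The function names are illustrative only.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # Values from the author_txt:rorvig explanations:
    # freq=1, docFreq=13, maxDocs=42306, fieldNorm=0.625
    print(round(classic_idf(13, 42306), 6))               # prints 9.013627
    print(round(field_weight(1.0, 13, 42306, 0.625), 6))  # prints 5.633517
```

All five author matches share the same term statistics and fieldNorm, which is why they tie at 5.63.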
    

Similar documents (content)

  1. Rorvig, M.: ¬A visual exploration of the orderliness of TREC relevance judgements (1999) 0.34
    0.3353743 = sum of:
      0.3353743 = product of:
        1.1977654 = sum of:
          0.1756767 = weight(abstract_txt:scaling in 4769) [ClassicSimilarity], result of:
            0.1756767 = score(doc=4769,freq=4.0), product of:
              0.15214212 = queryWeight, product of:
                1.1449938 = boost
                7.390004 = idf(docFreq=70, maxDocs=42306)
                0.017980495 = queryNorm
              1.1546881 = fieldWeight in 4769, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.390004 = idf(docFreq=70, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.07433861 = weight(abstract_txt:documents in 4769) [ClassicSimilarity], result of:
            0.07433861 = score(doc=4769,freq=6.0), product of:
              0.094383456 = queryWeight, product of:
                1.275385 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.017980495 = queryNorm
              0.7876233 = fieldWeight in 4769, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.1725241 = weight(abstract_txt:scaled in 4769) [ClassicSimilarity], result of:
            0.1725241 = score(doc=4769,freq=1.0), product of:
              0.23861249 = queryWeight, product of:
                1.4339201 = boost
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.017980495 = queryNorm
              0.72303045 = fieldWeight in 4769, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.12945412 = weight(abstract_txt:topic in 4769) [ClassicSimilarity], result of:
            0.12945412 = score(doc=4769,freq=5.0), product of:
              0.14517458 = queryWeight, product of:
                1.581753 = boost
                5.104465 = idf(docFreq=697, maxDocs=42306)
                0.017980495 = queryNorm
              0.8917134 = fieldWeight in 4769, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.104465 = idf(docFreq=697, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.12894674 = weight(abstract_txt:sets in 4769) [ClassicSimilarity], result of:
            0.12894674 = score(doc=4769,freq=2.0), product of:
              0.22495587 = queryWeight, product of:
                2.4115024 = boost
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.017980495 = queryNorm
              0.57320905 = fieldWeight in 4769, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.2747424 = weight(abstract_txt:trec in 4769) [ClassicSimilarity], result of:
            0.2747424 = score(doc=4769,freq=2.0), product of:
              0.37248394 = queryWeight, product of:
                3.1030786 = boost
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.017980495 = queryNorm
              0.73759526 = fieldWeight in 4769, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
          0.24208277 = weight(abstract_txt:similarity in 4769) [ClassicSimilarity], result of:
            0.24208277 = score(doc=4769,freq=2.0), product of:
              0.3768015 = queryWeight, product of:
                3.6038332 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.017980495 = queryNorm
              0.6424677 = fieldWeight in 4769, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=4769)
        0.28 = coord(7/25)
    
  2. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 0.26
    0.25770468 = sum of:
      0.25770468 = product of:
        0.9203738 = sum of:
          0.058516026 = weight(abstract_txt:exploration in 4270) [ClassicSimilarity], result of:
            0.058516026 = score(doc=4270,freq=1.0), product of:
              0.11604949 = queryWeight, product of:
                6.4541874 = idf(docFreq=180, maxDocs=42306)
                0.017980495 = queryNorm
              0.50423336 = fieldWeight in 4270, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4541874 = idf(docFreq=180, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.06069722 = weight(abstract_txt:documents in 4270) [ClassicSimilarity], result of:
            0.06069722 = score(doc=4270,freq=4.0), product of:
              0.094383456 = queryWeight, product of:
                1.275385 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.017980495 = queryNorm
              0.64309174 = fieldWeight in 4270, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.083555855 = weight(abstract_txt:document in 4270) [ClassicSimilarity], result of:
            0.083555855 = score(doc=4270,freq=6.0), product of:
              0.10203226 = queryWeight, product of:
                1.3260568 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017980495 = queryNorm
              0.818916 = fieldWeight in 4270, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.24398589 = weight(abstract_txt:scaled in 4270) [ClassicSimilarity], result of:
            0.24398589 = score(doc=4270,freq=2.0), product of:
              0.23861249 = queryWeight, product of:
                1.4339201 = boost
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.017980495 = queryNorm
              1.0225194 = fieldWeight in 4270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.10027472 = weight(abstract_txt:topic in 4270) [ClassicSimilarity], result of:
            0.10027472 = score(doc=4270,freq=3.0), product of:
              0.14517458 = queryWeight, product of:
                1.581753 = boost
                5.104465 = idf(docFreq=697, maxDocs=42306)
                0.017980495 = queryNorm
              0.6907182 = fieldWeight in 4270, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.104465 = idf(docFreq=697, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.09860167 = weight(abstract_txt:appear in 4270) [ClassicSimilarity], result of:
            0.09860167 = score(doc=4270,freq=1.0), product of:
              0.20704252 = queryWeight, product of:
                1.8889617 = boost
                6.095856 = idf(docFreq=258, maxDocs=42306)
                0.017980495 = queryNorm
              0.47623876 = fieldWeight in 4270, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.095856 = idf(docFreq=258, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
          0.2747424 = weight(abstract_txt:trec in 4270) [ClassicSimilarity], result of:
            0.2747424 = score(doc=4270,freq=2.0), product of:
              0.37248394 = queryWeight, product of:
                3.1030786 = boost
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.017980495 = queryNorm
              0.73759526 = fieldWeight in 4270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.078125 = fieldNorm(doc=4270)
        0.28 = coord(7/25)
    
  3. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.19
    0.19270568 = sum of:
      0.19270568 = product of:
        0.96352834 = sum of:
          0.08495957 = weight(abstract_txt:vector in 994) [ClassicSimilarity], result of:
            0.08495957 = score(doc=994,freq=2.0), product of:
              0.11810226 = queryWeight, product of:
                1.0088056 = boost
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.017980495 = queryNorm
              0.7193729 = fieldWeight in 994, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.12617725 = weight(abstract_txt:overlap in 994) [ClassicSimilarity], result of:
            0.12617725 = score(doc=994,freq=3.0), product of:
              0.13429926 = queryWeight, product of:
                1.0757595 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.017980495 = queryNorm
              0.93952304 = fieldWeight in 994, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.17621422 = weight(abstract_txt:cosine in 994) [ClassicSimilarity], result of:
            0.17621422 = score(doc=994,freq=3.0), product of:
              0.16779546 = queryWeight, product of:
                1.2024541 = boost
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.017980495 = queryNorm
              1.0501727 = fieldWeight in 994, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.15687759 = weight(abstract_txt:measures in 994) [ClassicSimilarity], result of:
            0.15687759 = score(doc=994,freq=5.0), product of:
              0.16501394 = queryWeight, product of:
                1.6863732 = boost
                5.4420843 = idf(docFreq=497, maxDocs=42306)
                0.017980495 = queryNorm
              0.950693 = fieldWeight in 994, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.4420843 = idf(docFreq=497, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.41929972 = weight(abstract_txt:similarity in 994) [ClassicSimilarity], result of:
            0.41929972 = score(doc=994,freq=6.0), product of:
              0.3768015 = queryWeight, product of:
                3.6038332 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.017980495 = queryNorm
              1.1127868 = fieldWeight in 994, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
        0.2 = coord(5/25)
    
  4. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.18
    0.18128823 = sum of:
      0.18128823 = product of:
        0.64745796 = sum of:
          0.07284848 = weight(abstract_txt:overlap in 2373) [ClassicSimilarity], result of:
            0.07284848 = score(doc=2373,freq=1.0), product of:
              0.13429926 = queryWeight, product of:
                1.0757595 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.017980495 = queryNorm
              0.5424339 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.042919416 = weight(abstract_txt:documents in 2373) [ClassicSimilarity], result of:
            0.042919416 = score(doc=2373,freq=2.0), product of:
              0.094383456 = queryWeight, product of:
                1.275385 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.017980495 = queryNorm
              0.45473453 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.03098699 = weight(abstract_txt:each in 2373) [ClassicSimilarity], result of:
            0.03098699 = score(doc=2373,freq=1.0), product of:
              0.095702425 = queryWeight, product of:
                1.2842656 = boost
                4.1444454 = idf(docFreq=1822, maxDocs=42306)
                0.017980495 = queryNorm
              0.3237848 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1444454 = idf(docFreq=1822, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.068223074 = weight(abstract_txt:document in 2373) [ClassicSimilarity], result of:
            0.068223074 = score(doc=2373,freq=4.0), product of:
              0.10203226 = queryWeight, product of:
                1.3260568 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017980495 = queryNorm
              0.66864216 = fieldWeight in 2373, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.0992181 = weight(abstract_txt:measures in 2373) [ClassicSimilarity], result of:
            0.0992181 = score(doc=2373,freq=2.0), product of:
              0.16501394 = queryWeight, product of:
                1.6863732 = boost
                5.4420843 = idf(docFreq=497, maxDocs=42306)
                0.017980495 = queryNorm
              0.60127103 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4420843 = idf(docFreq=497, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.09117911 = weight(abstract_txt:sets in 2373) [ClassicSimilarity], result of:
            0.09117911 = score(doc=2373,freq=1.0), product of:
              0.22495587 = queryWeight, product of:
                2.4115024 = boost
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.017980495 = queryNorm
              0.40532 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
          0.24208277 = weight(abstract_txt:similarity in 2373) [ClassicSimilarity], result of:
            0.24208277 = score(doc=2373,freq=2.0), product of:
              0.3768015 = queryWeight, product of:
                3.6038332 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.017980495 = queryNorm
              0.6424677 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=2373)
        0.28 = coord(7/25)
    
  5. Chen, T.T.: ¬The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.18
    0.17754227 = sum of:
      0.17754227 = product of:
        0.5548196 = sum of:
          0.04806039 = weight(abstract_txt:vector in 4776) [ClassicSimilarity], result of:
            0.04806039 = score(doc=4776,freq=1.0), product of:
              0.11810226 = queryWeight, product of:
                1.0088056 = boost
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.017980495 = queryNorm
              0.4069388 = fieldWeight in 4776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.013478744 = weight(abstract_txt:data in 4776) [ClassicSimilarity], result of:
            0.013478744 = score(doc=4776,freq=1.0), product of:
              0.06375432 = queryWeight, product of:
                1.0482098 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.017980495 = queryNorm
              0.21141694 = fieldWeight in 4776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.08138986 = weight(abstract_txt:cosine in 4776) [ClassicSimilarity], result of:
            0.08138986 = score(doc=4776,freq=1.0), product of:
              0.16779546 = queryWeight, product of:
                1.2024541 = boost
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.017980495 = queryNorm
              0.485054 = fieldWeight in 4776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.05947089 = weight(abstract_txt:documents in 4776) [ClassicSimilarity], result of:
            0.05947089 = score(doc=4776,freq=6.0), product of:
              0.094383456 = queryWeight, product of:
                1.275385 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.017980495 = queryNorm
              0.63009864 = fieldWeight in 4776, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.024789592 = weight(abstract_txt:each in 4776) [ClassicSimilarity], result of:
            0.024789592 = score(doc=4776,freq=1.0), product of:
              0.095702425 = queryWeight, product of:
                1.2842656 = boost
                4.1444454 = idf(docFreq=1822, maxDocs=42306)
                0.017980495 = queryNorm
              0.25902784 = fieldWeight in 4776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1444454 = idf(docFreq=1822, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.061020568 = weight(abstract_txt:document in 4776) [ClassicSimilarity], result of:
            0.061020568 = score(doc=4776,freq=5.0), product of:
              0.10203226 = queryWeight, product of:
                1.3260568 = boost
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.017980495 = queryNorm
              0.5980517 = fieldWeight in 4776, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2793097 = idf(docFreq=1592, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.07294329 = weight(abstract_txt:sets in 4776) [ClassicSimilarity], result of:
            0.07294329 = score(doc=4776,freq=1.0), product of:
              0.22495587 = queryWeight, product of:
                2.4115024 = boost
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.017980495 = queryNorm
              0.324256 = fieldWeight in 4776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188096 = idf(docFreq=641, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
          0.19366622 = weight(abstract_txt:similarity in 4776) [ClassicSimilarity], result of:
            0.19366622 = score(doc=4776,freq=2.0), product of:
              0.3768015 = queryWeight, product of:
                3.6038332 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.017980495 = queryNorm
              0.51397413 = fieldWeight in 4776, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.0625 = fieldNorm(doc=4776)
        0.32 = coord(8/25)
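The content-similarity scores combine more factors. Each matched clause contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, so idf effectively enters squared. The per-document sum is then multiplied by coord, the fraction of query clauses matched (7/25 = 0.28 for the top hit). The sketch below recomputes the 0.3353743 score of doc 4769 from the boosts, term frequencies, and document frequencies printed in its explanation, assuming the standard ClassicSimilarity definitions tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm and fieldNorm are copied from the output, since queryNorm depends on the weights of all 25 query clauses, most of which are not shown. The names are hypothetical.

```python
import math

MAX_DOCS = 42306
QUERY_NORM = 0.017980495  # copied from the output; depends on all 25 query clauses
FIELD_NORM = 0.078125     # copied from the output for doc 4769 (compressed length norm)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int,
               field_norm: float = FIELD_NORM, query_norm: float = QUERY_NORM) -> float:
    """One matched clause: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def doc_score(matches, max_overlap: int) -> float:
    """coord (matched / total clauses) times the sum of the matched clause scores."""
    coord = len(matches) / max_overlap
    return coord * sum(term_score(boost, freq, df) for boost, freq, df in matches)


# (boost, termFreq, docFreq) for the seven clauses matched in doc 4769
DOC_4769 = [
    (1.1449938, 4, 70),    # scaling
    (1.275385, 6, 1875),   # documents
    (1.4339201, 1, 10),    # scaled
    (1.581753, 5, 697),    # topic
    (2.4115024, 2, 641),   # sets
    (3.1030786, 2, 144),   # trec
    (3.6038332, 2, 342),   # similarity
]

if __name__ == "__main__":
    print(round(doc_score(DOC_4769, max_overlap=25), 4))  # prints 0.3354
```

The coord factor is what penalizes documents that match only a few of the 25 query terms: the Egghe (2010) entry matches 5 clauses and is scaled by 0.2, while the Chen (2016) entry matches 8 and is scaled by 0.32.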