Document (#21768)

Author
Rorvig, M.
Title
Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets
Source
Journal of the American Society for Information Science. 50(1999) no.8, p.639-651
Year
1999
Abstract
Multiple similarity measures for 5 TREC topic-document sets from the LDC TREC Collection Disk 1 are derived from the full text of the documents. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions. The resulting 75 permutations are plotted. The results suggest that cosine-vector and overlap measures of similarity recover optimal data relationships among the documents of the 5 sets. MLE assumptions appear to be required to model the data adequately.

Similar documents (author)

  1. Rorvig, M.E.: ¬A method for automatically abstracting visual documents (1993) 5.66
    5.661144 = sum of:
      5.661144 = weight(author_txt:rorvig in 2723) [ClassicSimilarity], result of:
        5.661144 = fieldWeight in 2723, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.05783 = idf(docFreq=13, maxDocs=44218)
          0.625 = fieldNorm(doc=2723)
    
  2. Rorvig, M.E.: Image information retrieval (1987) 5.66
    5.661144 = sum of:
      5.661144 = weight(author_txt:rorvig in 5640) [ClassicSimilarity], result of:
        5.661144 = fieldWeight in 5640, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.05783 = idf(docFreq=13, maxDocs=44218)
          0.625 = fieldNorm(doc=5640)
    
  3. Rorvig, M.E.: ¬The bibliographic control of microcomputer software (1988) 5.66
    5.661144 = sum of:
      5.661144 = weight(author_txt:rorvig in 1275) [ClassicSimilarity], result of:
        5.661144 = fieldWeight in 1275, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.05783 = idf(docFreq=13, maxDocs=44218)
          0.625 = fieldNorm(doc=1275)
    
  4. Rorvig, M.E.: Psychometric measurement and information retrieval (1989) 5.66
    5.661144 = sum of:
      5.661144 = weight(author_txt:rorvig in 333) [ClassicSimilarity], result of:
        5.661144 = fieldWeight in 333, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.05783 = idf(docFreq=13, maxDocs=44218)
          0.625 = fieldNorm(doc=333)
    
  5. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 5.66
    5.661144 = sum of:
      5.661144 = weight(author_txt:rorvig in 3269) [ClassicSimilarity], result of:
        5.661144 = fieldWeight in 3269, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.05783 = idf(docFreq=13, maxDocs=44218)
          0.625 = fieldNorm(doc=3269)
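
The author-match scores above are each the product of three factors: tf, idf, and fieldNorm. The following is a minimal sketch of that arithmetic, assuming the usual ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any library, and fieldNorm is taken as given from the listing rather than recomputed.

```python
import math

def classic_tf(freq):
    # Assumed ClassicSimilarity tf: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:rorvig entries above.
author_idf = classic_idf(doc_freq=13, max_docs=44218)
author_score = field_weight(freq=1.0, doc_freq=13, max_docs=44218, field_norm=0.625)
print(f"idf={author_idf:.5f} fieldWeight={author_score:.5f}")
```

Because all five author hits share the same termFreq, docFreq, and fieldNorm, they receive the same score, which is why every entry in the list shows 5.66.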
    

Similar documents (content)

  1. Rorvig, M.: ¬A visual exploration of the orderliness of TREC relevance judgements (1999) 0.34
    0.3365314 = sum of:
      0.3365314 = product of:
        1.2018979 = sum of:
          0.1752197 = weight(abstract_txt:scaling in 3768) [ClassicSimilarity], result of:
            0.1752197 = score(doc=3768,freq=4.0), product of:
              0.15196441 = queryWeight, product of:
                1.1586413 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.017773455 = queryNorm
              1.1530311 = fieldWeight in 3768, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.07476512 = weight(abstract_txt:documents in 3768) [ClassicSimilarity], result of:
            0.07476512 = score(doc=3768,freq=6.0), product of:
              0.09479793 = queryWeight, product of:
                1.2941735 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017773455 = queryNorm
              0.7886788 = fieldWeight in 3768, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.17530632 = weight(abstract_txt:scaled in 3768) [ClassicSimilarity], result of:
            0.17530632 = score(doc=3768,freq=1.0), product of:
              0.24130796 = queryWeight, product of:
                1.4600371 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.017773455 = queryNorm
              0.72648376 = fieldWeight in 3768, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.1264843 = weight(abstract_txt:topic in 3768) [ClassicSimilarity], result of:
            0.1264843 = score(doc=3768,freq=5.0), product of:
              0.14302689 = queryWeight, product of:
                1.5896515 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.017773455 = queryNorm
              0.88433933 = fieldWeight in 3768, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.12894608 = weight(abstract_txt:sets in 3768) [ClassicSimilarity], result of:
            0.12894608 = score(doc=3768,freq=2.0), product of:
              0.22508287 = queryWeight, product of:
                2.4423614 = boost
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.017773455 = queryNorm
              0.57288265 = fieldWeight in 3768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.27815533 = weight(abstract_txt:trec in 3768) [ClassicSimilarity], result of:
            0.27815533 = score(doc=3768,freq=2.0), product of:
              0.37577564 = queryWeight, product of:
                3.1557531 = boost
                6.699675 = idf(docFreq=147, maxDocs=44218)
                0.017773455 = queryNorm
              0.7402165 = fieldWeight in 3768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.699675 = idf(docFreq=147, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
          0.24302104 = weight(abstract_txt:similarity in 3768) [ClassicSimilarity], result of:
            0.24302104 = score(doc=3768,freq=2.0), product of:
              0.37798902 = queryWeight, product of:
                3.6546657 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.017773455 = queryNorm
              0.64293146 = fieldWeight in 3768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=3768)
        0.28 = coord(7/25)
    
  2. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 0.26
    0.25849944 = sum of:
      0.25849944 = product of:
        0.9232123 = sum of:
          0.056325607 = weight(abstract_txt:exploration in 3269) [ClassicSimilarity], result of:
            0.056325607 = score(doc=3269,freq=1.0), product of:
              0.11319933 = queryWeight, product of:
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.017773455 = queryNorm
              0.49757898 = fieldWeight in 3269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.061045464 = weight(abstract_txt:documents in 3269) [ClassicSimilarity], result of:
            0.061045464 = score(doc=3269,freq=4.0), product of:
              0.09479793 = queryWeight, product of:
                1.2941735 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017773455 = queryNorm
              0.64395356 = fieldWeight in 3269, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.08448105 = weight(abstract_txt:document in 3269) [ClassicSimilarity], result of:
            0.08448105 = score(doc=3269,freq=6.0), product of:
              0.10284244 = queryWeight, product of:
                1.347967 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017773455 = queryNorm
              0.82146096 = fieldWeight in 3269, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.24792054 = weight(abstract_txt:scaled in 3269) [ClassicSimilarity], result of:
            0.24792054 = score(doc=3269,freq=2.0), product of:
              0.24130796 = queryWeight, product of:
                1.4600371 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.017773455 = queryNorm
              1.0274031 = fieldWeight in 3269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.09797432 = weight(abstract_txt:topic in 3269) [ClassicSimilarity], result of:
            0.09797432 = score(doc=3269,freq=3.0), product of:
              0.14302689 = queryWeight, product of:
                1.5896515 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.017773455 = queryNorm
              0.6850063 = fieldWeight in 3269, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.097309984 = weight(abstract_txt:appear in 3269) [ClassicSimilarity], result of:
            0.097309984 = score(doc=3269,freq=1.0), product of:
              0.20534693 = queryWeight, product of:
                1.9047464 = boost
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.017773455 = queryNorm
              0.4738809 = fieldWeight in 3269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
          0.27815533 = weight(abstract_txt:trec in 3269) [ClassicSimilarity], result of:
            0.27815533 = score(doc=3269,freq=2.0), product of:
              0.37577564 = queryWeight, product of:
                3.1557531 = boost
                6.699675 = idf(docFreq=147, maxDocs=44218)
                0.017773455 = queryNorm
              0.7402165 = fieldWeight in 3269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.699675 = idf(docFreq=147, maxDocs=44218)
                0.078125 = fieldNorm(doc=3269)
        0.28 = coord(7/25)
    
  3. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.19
    0.19323571 = sum of:
      0.19323571 = product of:
        0.96617854 = sum of:
          0.08548602 = weight(abstract_txt:vector in 3993) [ClassicSimilarity], result of:
            0.08548602 = score(doc=3993,freq=2.0), product of:
              0.118656985 = queryWeight, product of:
                1.0238227 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.017773455 = queryNorm
              0.7204466 = fieldWeight in 3993, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.1264002 = weight(abstract_txt:overlap in 3993) [ClassicSimilarity], result of:
            0.1264002 = score(doc=3993,freq=3.0), product of:
              0.1345338 = queryWeight, product of:
                1.0901688 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.017773455 = queryNorm
              0.9395424 = fieldWeight in 3993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.17680119 = weight(abstract_txt:cosine in 3993) [ClassicSimilarity], result of:
            0.17680119 = score(doc=3993,freq=3.0), product of:
              0.16826339 = queryWeight, product of:
                1.2191942 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.017773455 = queryNorm
              1.0507407 = fieldWeight in 3993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.15656635 = weight(abstract_txt:measures in 3993) [ClassicSimilarity], result of:
            0.15656635 = score(doc=3993,freq=5.0), product of:
              0.1648892 = queryWeight, product of:
                1.7068257 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.017773455 = queryNorm
              0.9495246 = fieldWeight in 3993, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.4209248 = weight(abstract_txt:similarity in 3993) [ClassicSimilarity], result of:
            0.4209248 = score(doc=3993,freq=6.0), product of:
              0.37798902 = queryWeight, product of:
                3.6546657 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.017773455 = queryNorm
              1.11359 = fieldWeight in 3993, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
        0.2 = coord(5/25)
    
  4. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.18
    0.18166628 = sum of:
      0.18166628 = product of:
        0.6488082 = sum of:
          0.07297719 = weight(abstract_txt:overlap in 372) [ClassicSimilarity], result of:
            0.07297719 = score(doc=372,freq=1.0), product of:
              0.1345338 = queryWeight, product of:
                1.0901688 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.017773455 = queryNorm
              0.54244506 = fieldWeight in 372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.030465877 = weight(abstract_txt:each in 372) [ClassicSimilarity], result of:
            0.030465877 = score(doc=372,freq=1.0), product of:
              0.094680175 = queryWeight, product of:
                1.2933694 = boost
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.017773455 = queryNorm
              0.32177672 = fieldWeight in 372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.04316566 = weight(abstract_txt:documents in 372) [ClassicSimilarity], result of:
            0.04316566 = score(doc=372,freq=2.0), product of:
              0.09479793 = queryWeight, product of:
                1.2941735 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017773455 = queryNorm
              0.4553439 = fieldWeight in 372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.06897849 = weight(abstract_txt:document in 372) [ClassicSimilarity], result of:
            0.06897849 = score(doc=372,freq=4.0), product of:
              0.10284244 = queryWeight, product of:
                1.347967 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017773455 = queryNorm
              0.67072004 = fieldWeight in 372, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.09902125 = weight(abstract_txt:measures in 372) [ClassicSimilarity], result of:
            0.09902125 = score(doc=372,freq=2.0), product of:
              0.1648892 = queryWeight, product of:
                1.7068257 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.017773455 = queryNorm
              0.60053205 = fieldWeight in 372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.09117865 = weight(abstract_txt:sets in 372) [ClassicSimilarity], result of:
            0.09117865 = score(doc=372,freq=1.0), product of:
              0.22508287 = queryWeight, product of:
                2.4423614 = boost
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.017773455 = queryNorm
              0.40508923 = fieldWeight in 372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
          0.24302104 = weight(abstract_txt:similarity in 372) [ClassicSimilarity], result of:
            0.24302104 = score(doc=372,freq=2.0), product of:
              0.37798902 = queryWeight, product of:
                3.6546657 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.017773455 = queryNorm
              0.64293146 = fieldWeight in 372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=372)
        0.28 = coord(7/25)
    
  5. Chen, T.T.: ¬The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.18
    0.17798863 = sum of:
      0.17798863 = product of:
        0.5562145 = sum of:
          0.04835819 = weight(abstract_txt:vector in 2775) [ClassicSimilarity], result of:
            0.04835819 = score(doc=2775,freq=1.0), product of:
              0.118656985 = queryWeight, product of:
                1.0238227 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.017773455 = queryNorm
              0.4075461 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.012954595 = weight(abstract_txt:data in 2775) [ClassicSimilarity], result of:
            0.012954595 = score(doc=2775,freq=1.0), product of:
              0.062125873 = queryWeight, product of:
                1.0476816 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017773455 = queryNorm
              0.20852174 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.08166097 = weight(abstract_txt:cosine in 2775) [ClassicSimilarity], result of:
            0.08166097 = score(doc=2775,freq=1.0), product of:
              0.16826339 = queryWeight, product of:
                1.2191942 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.017773455 = queryNorm
              0.48531634 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.0243727 = weight(abstract_txt:each in 2775) [ClassicSimilarity], result of:
            0.0243727 = score(doc=2775,freq=1.0), product of:
              0.094680175 = queryWeight, product of:
                1.2933694 = boost
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.017773455 = queryNorm
              0.25742137 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.0598121 = weight(abstract_txt:documents in 2775) [ClassicSimilarity], result of:
            0.0598121 = score(doc=2775,freq=6.0), product of:
              0.09479793 = queryWeight, product of:
                1.2941735 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017773455 = queryNorm
              0.63094306 = fieldWeight in 2775, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.061696235 = weight(abstract_txt:document in 2775) [ClassicSimilarity], result of:
            0.061696235 = score(doc=2775,freq=5.0), product of:
              0.10284244 = queryWeight, product of:
                1.347967 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017773455 = queryNorm
              0.59991026 = fieldWeight in 2775, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.07294292 = weight(abstract_txt:sets in 2775) [ClassicSimilarity], result of:
            0.07294292 = score(doc=2775,freq=1.0), product of:
              0.22508287 = queryWeight, product of:
                2.4423614 = boost
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.017773455 = queryNorm
              0.32407138 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.19441682 = weight(abstract_txt:similarity in 2775) [ClassicSimilarity], result of:
            0.19441682 = score(doc=2775,freq=2.0), product of:
              0.37798902 = queryWeight, product of:
                3.6546657 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.017773455 = queryNorm
              0.51434517 = fieldWeight in 2775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
        0.32 = coord(8/25)
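
The content-match trees above combine per-term scores (queryWeight times fieldWeight), sum them, and multiply by a coord factor. As a rough check, the sketch below recomputes the score of the first content hit (doc 3768) from the boost, docFreq, and termFreq values in its tree. It assumes the same ClassicSimilarity tf and idf definitions as the listing implies. The helper names are hypothetical, and queryNorm, fieldNorm, and coord are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.017773455   # copied from the explain output
FIELD_NORM = 0.078125      # fieldNorm(doc=3768), copied from the explain output
COORD = 7 / 25             # 7 of the 25 query terms matched

# (term, boost, docFreq, termFreq) as listed for doc 3768.
TERMS = [
    ("scaling", 1.1586413, 74, 4.0),
    ("documents", 1.2941735, 1949, 6.0),
    ("scaled", 1.4600371, 10, 1.0),
    ("topic", 1.5896515, 760, 5.0),
    ("sets", 2.4423614, 672, 2.0),
    ("trec", 3.1557531, 147, 2.0),
    ("similarity", 3.6546657, 356, 2.0),
]

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM             # queryWeight
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM   # fieldWeight
    return query_weight * field_weight

raw_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in TERMS)
total = raw_sum * COORD
print(f"sum={raw_sum:.4f} total={total:.4f}")
```

Under these assumptions the result should land close to the 0.3365314 reported for that hit. Small differences in the last digits are expected, since the listing appears to come from single-precision arithmetic.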