Document (#31321)

Author
Egghe, L.
Title
Properties of the n-overlap vector and n-overlap similarity theory
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.9, pp. 1165-1177
Year
2006
Abstract
In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
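
The n-overlap vector described in the abstract can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the paper's formal definition: it assumes coordinate i is the fraction of distinct objects (counted over the union of all sets) that belong to exactly i of the n sets. The function names `n_overlap_vector` and `jaccard` and the sample libraries are hypothetical, and the generalized Jaccard index itself is not reproduced here because the record does not state its formula.

```python
from collections import Counter


def n_overlap_vector(sets):
    """Return [x_1, ..., x_n] for n sets.

    Assumption (not taken from the paper): x_i is the fraction of distinct
    objects in the union that belong to exactly i of the n sets.
    """
    n = len(sets)
    membership = Counter()
    for s in sets:
        for obj in set(s):          # count each object once per set
            membership[obj] += 1
    total = len(membership)         # size of the union
    if total == 0:
        raise ValueError("at least one set must be non-empty")
    counts = Counter(membership.values())
    return [counts.get(i, 0) / total for i in range(1, n + 1)]


def jaccard(a, b):
    """Classical Jaccard index |A ∩ B| / |A ∪ B| for two non-empty sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical example: two libraries holding some of the same books.
library_a = {"b1", "b2", "b3", "b4"}
library_b = {"b3", "b4", "b5"}

vector = n_overlap_vector([library_a, library_b])
print(vector)
print(jaccard(library_a, library_b))
```

Under this reading, for n = 2 the second coordinate of the vector coincides with the classical Jaccard index, since both count the shared objects relative to the union.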

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 6883, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=6883)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 7119) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 7119, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=7119)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 1979) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 1979, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=1979)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 4463) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 4463, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=4463)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 5137) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 5137, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=5137)
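
The author-match scores above all follow the same pattern, fieldWeight = tf × idf × fieldNorm. A minimal sketch of that arithmetic is given below. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed because they reproduce the figures printed in this record; other Lucene versions may compute idf slightly differently. The function names are illustrative, not Lucene API calls.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, in the form that matches this record."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:egghe entries above.
author_score = field_weight(freq=1.0, doc_freq=60, max_docs=42306, field_norm=0.625)
print(round(author_score, 4))
```

The result should agree with the listed 4.7136316 up to rounding, since Lucene works in single precision and this sketch uses Python floats.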
    

Similar documents (content)

  1. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.43
    0.42528063 = sum of:
      0.42528063 = product of:
        1.5188594 = sum of:
          0.059469275 = weight(abstract_txt:concentration in 994) [ClassicSimilarity], result of:
            0.059469275 = score(doc=994,freq=1.0), product of:
              0.09433525 = queryWeight, product of:
                1.0194757 = boost
                8.069165 = idf(docFreq=35, maxDocs=42306)
                0.011467495 = queryNorm
              0.6304035 = fieldWeight in 994, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.069165 = idf(docFreq=35, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.109031335 = weight(abstract_txt:similarity in 994) [ClassicSimilarity], result of:
            0.109031335 = score(doc=994,freq=6.0), product of:
              0.09798044 = queryWeight, product of:
                1.4693476 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.011467495 = queryNorm
              1.1127868 = fieldWeight in 994, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.032884613 = weight(abstract_txt:theory in 994) [ClassicSimilarity], result of:
            0.032884613 = score(doc=994,freq=1.0), product of:
              0.091660276 = queryWeight, product of:
                1.7405683 = boost
                4.592208 = idf(docFreq=1164, maxDocs=42306)
                0.011467495 = queryNorm
              0.35876626 = fieldWeight in 994, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.592208 = idf(docFreq=1164, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.2745683 = weight(abstract_txt:jaccard in 994) [ClassicSimilarity], result of:
            0.2745683 = score(doc=994,freq=3.0), product of:
              0.22849783 = queryWeight, product of:
                2.2438607 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.011467495 = queryNorm
              1.2016232 = fieldWeight in 994, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.16578133 = weight(abstract_txt:lorenz in 994) [ClassicSimilarity], result of:
            0.16578133 = score(doc=994,freq=1.0), product of:
              0.23542145 = queryWeight, product of:
                2.2776022 = boost
                9.013627 = idf(docFreq=13, maxDocs=42306)
                0.011467495 = queryNorm
              0.7041896 = fieldWeight in 994, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.013627 = idf(docFreq=13, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.22092207 = weight(abstract_txt:vector in 994) [ClassicSimilarity], result of:
            0.22092207 = score(doc=994,freq=2.0), product of:
              0.30710366 = queryWeight, product of:
                4.113082 = boost
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.011467495 = queryNorm
              0.7193729 = fieldWeight in 994, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
          0.65620244 = weight(abstract_txt:overlap in 994) [ClassicSimilarity], result of:
            0.65620244 = score(doc=994,freq=3.0), product of:
              0.6984421 = queryWeight, product of:
                8.77213 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.011467495 = queryNorm
              0.93952304 = fieldWeight in 994, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.078125 = fieldNorm(doc=994)
        0.28 = coord(7/25)
    
  2. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.13
    0.1286032 = sum of:
      0.1286032 = product of:
        0.80377007 = sum of:
          0.04451185 = weight(abstract_txt:similarity in 528) [ClassicSimilarity], result of:
            0.04451185 = score(doc=528,freq=1.0), product of:
              0.09798044 = queryWeight, product of:
                1.4693476 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.011467495 = queryNorm
              0.45429325 = fieldWeight in 528, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=528)
          0.22418407 = weight(abstract_txt:jaccard in 528) [ClassicSimilarity], result of:
            0.22418407 = score(doc=528,freq=2.0), product of:
              0.22849783 = queryWeight, product of:
                2.2438607 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.011467495 = queryNorm
              0.9811212 = fieldWeight in 528, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.078125 = fieldNorm(doc=528)
          0.15621549 = weight(abstract_txt:vector in 528) [ClassicSimilarity], result of:
            0.15621549 = score(doc=528,freq=1.0), product of:
              0.30710366 = queryWeight, product of:
                4.113082 = boost
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.011467495 = queryNorm
              0.5086735 = fieldWeight in 528, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.078125 = fieldNorm(doc=528)
          0.3788587 = weight(abstract_txt:overlap in 528) [ClassicSimilarity], result of:
            0.3788587 = score(doc=528,freq=1.0), product of:
              0.6984421 = queryWeight, product of:
                8.77213 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.011467495 = queryNorm
              0.5424339 = fieldWeight in 528, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.078125 = fieldNorm(doc=528)
        0.16 = coord(4/25)
    
  3. Hood, W.W.; Wilson, C.S.: The relationship of records in multiple databases to their usage or citedness (2005) 0.13
    0.12792684 = sum of:
      0.12792684 = product of:
        1.0660571 = sum of:
          0.10133514 = weight(abstract_txt:indexed in 4681) [ClassicSimilarity], result of:
            0.10133514 = score(doc=4681,freq=2.0), product of:
              0.10753924 = queryWeight, product of:
                1.5393534 = boost
                6.0920024 = idf(docFreq=259, maxDocs=42306)
                0.011467495 = queryNorm
              0.94230855 = fieldWeight in 4681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0920024 = idf(docFreq=259, maxDocs=42306)
                0.109375 = fieldNorm(doc=4681)
          0.046038456 = weight(abstract_txt:theory in 4681) [ClassicSimilarity], result of:
            0.046038456 = score(doc=4681,freq=1.0), product of:
              0.091660276 = queryWeight, product of:
                1.7405683 = boost
                4.592208 = idf(docFreq=1164, maxDocs=42306)
                0.011467495 = queryNorm
              0.5022727 = fieldWeight in 4681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.592208 = idf(docFreq=1164, maxDocs=42306)
                0.109375 = fieldNorm(doc=4681)
          0.91868347 = weight(abstract_txt:overlap in 4681) [ClassicSimilarity], result of:
            0.91868347 = score(doc=4681,freq=3.0), product of:
              0.6984421 = queryWeight, product of:
                8.77213 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.011467495 = queryNorm
              1.3153323 = fieldWeight in 4681, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.109375 = fieldNorm(doc=4681)
        0.12 = coord(3/25)
    
  4. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.12
    0.11953572 = sum of:
      0.11953572 = product of:
        0.74709827 = sum of:
          0.02822885 = weight(abstract_txt:author in 1133) [ClassicSimilarity], result of:
            0.02822885 = score(doc=1133,freq=1.0), product of:
              0.07232432 = queryWeight, product of:
                1.2624002 = boost
                4.995958 = idf(docFreq=777, maxDocs=42306)
                0.011467495 = queryNorm
              0.3903092 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.995958 = idf(docFreq=777, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.0770968 = weight(abstract_txt:similarity in 1133) [ClassicSimilarity], result of:
            0.0770968 = score(doc=1133,freq=3.0), product of:
              0.09798044 = queryWeight, product of:
                1.4693476 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.011467495 = queryNorm
              0.78685904 = fieldWeight in 1133, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.10598556 = weight(abstract_txt:section in 1133) [ClassicSimilarity], result of:
            0.10598556 = score(doc=1133,freq=2.0), product of:
              0.15873541 = queryWeight, product of:
                2.2905374 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.011467495 = queryNorm
              0.66768694 = fieldWeight in 1133, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.53578705 = weight(abstract_txt:overlap in 1133) [ClassicSimilarity], result of:
            0.53578705 = score(doc=1133,freq=2.0), product of:
              0.6984421 = queryWeight, product of:
                8.77213 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.011467495 = queryNorm
              0.7671174 = fieldWeight in 1133, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
        0.16 = coord(4/25)
    
  5. Rorvig, M.: Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets (1999) 0.10
    0.10046793 = sum of:
      0.10046793 = product of:
        0.83723277 = sum of:
          0.08812897 = weight(abstract_txt:similarity in 4768) [ClassicSimilarity], result of:
            0.08812897 = score(doc=4768,freq=2.0), product of:
              0.09798044 = queryWeight, product of:
                1.4693476 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.011467495 = queryNorm
              0.8994547 = fieldWeight in 4768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.109375 = fieldNorm(doc=4768)
          0.21870169 = weight(abstract_txt:vector in 4768) [ClassicSimilarity], result of:
            0.21870169 = score(doc=4768,freq=1.0), product of:
              0.30710366 = queryWeight, product of:
                4.113082 = boost
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.011467495 = queryNorm
              0.7121429 = fieldWeight in 4768, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5110207 = idf(docFreq=170, maxDocs=42306)
                0.109375 = fieldNorm(doc=4768)
          0.5304021 = weight(abstract_txt:overlap in 4768) [ClassicSimilarity], result of:
            0.5304021 = score(doc=4768,freq=1.0), product of:
              0.6984421 = queryWeight, product of:
                8.77213 = boost
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.011467495 = queryNorm
              0.75940746 = fieldWeight in 4768, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943154 = idf(docFreq=110, maxDocs=42306)
                0.109375 = fieldNorm(doc=4768)
        0.12 = coord(3/25)
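
The content-based scores combine several matching terms: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by coord (matching terms / query terms). The sketch below recomputes the score of entry 5 (doc 4768) from the boosts, document frequencies, and norms listed above. It assumes the same tf and idf formulas that reproduce the other figures in this record; the names `term_score`, `matches`, and the constants are illustrative and not part of any Lucene API.

```python
import math

MAX_DOCS = 42306          # maxDocs from the explain output
QUERY_NORM = 0.011467495  # queryNorm from the explain output
QUERY_TERMS = 25          # denominator of coord(3/25)


def idf(doc_freq):
    """idf in the form that matches the values printed in this record."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, freq, fieldNorm) copied from the doc 4768 entry.
matches = [
    (1.4693476, 342, 2.0, 0.109375),  # abstract_txt:similarity
    (4.113082, 170, 1.0, 0.109375),   # abstract_txt:vector
    (8.77213, 110, 1.0, 0.109375),    # abstract_txt:overlap
]

raw_sum = sum(term_score(*m) for m in matches)
coord = len(matches) / QUERY_TERMS
total = raw_sum * coord
print(round(raw_sum, 4), round(total, 4))
```

The two printed values should be close to the listed 0.83723277 and 0.10046793; small differences in the last digits are expected because the explain output is computed in single precision.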