# Document (#31321)

Author
Egghe, L.
Title
Properties of the n-overlap vector and n-overlap similarity theory
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.9, pp. 1165-1177
Year
2006
Abstract
In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
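
The n-overlap vector described in the abstract can be illustrated with a minimal sketch. It assumes that coordinate i is the fraction of distinct objects belonging to exactly i of the n sets; the abstract does not spell this out, so that reading is an assumption, as are the function name `n_overlap_vector` and the sample data. The generalized Jaccard index of the article is not implemented here because its definition is not given in this record.

```python
from collections import Counter


def n_overlap_vector(families):
    """Return [x_1, ..., x_n], where x_i is the fraction of distinct
    objects that belong to exactly i of the n given sets.

    Assumption: "belong to i sets" is read as "exactly i sets".
    """
    n = len(families)
    # Count, for every object, how many of the sets contain it.
    membership = Counter()
    for family in families:
        for obj in set(family):
            membership[obj] += 1
    total = len(membership)
    if total == 0:
        return [0.0] * n
    # Count how many objects occur in exactly 1, 2, ..., n sets.
    counts = Counter(membership.values())
    return [counts.get(i, 0) / total for i in range(1, n + 1)]


# Hypothetical holdings of three libraries.
libraries = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
vector = n_overlap_vector(libraries)
print(vector)  # [0.6, 0.2, 0.2]

# For n = 2 the second coordinate is |A ∩ B| / |A ∪ B|,
# i.e. the classical Jaccard index of the two sets.
two_sets = n_overlap_vector([{"a", "b", "c"}, {"b", "c", "d"}])
print(two_sets[1])  # 0.5
```

Under this reading the coordinates sum to 1, since every object in the union falls into exactly one of the n classes.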

## Similar documents (author)

1. Egghe, L.: Little science, big science and beyond (1994) 4.73
```
4.727482 = sum of:
4.727482 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
4.727482 = fieldWeight in 6883, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
7.563971 = idf(docFreq=60, maxDocs=43254)
0.625 = fieldNorm(doc=6883)
```
2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.73
```
4.727482 = sum of:
4.727482 = weight(author_txt:egghe in 119) [ClassicSimilarity], result of:
4.727482 = fieldWeight in 119, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
7.563971 = idf(docFreq=60, maxDocs=43254)
0.625 = fieldNorm(doc=119)
```
3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.73
```
4.727482 = sum of:
4.727482 = weight(author_txt:egghe in 2979) [ClassicSimilarity], result of:
4.727482 = fieldWeight in 2979, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
7.563971 = idf(docFreq=60, maxDocs=43254)
0.625 = fieldNorm(doc=2979)
```
4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.73
```
4.727482 = sum of:
4.727482 = weight(author_txt:egghe in 5463) [ClassicSimilarity], result of:
4.727482 = fieldWeight in 5463, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
7.563971 = idf(docFreq=60, maxDocs=43254)
0.625 = fieldNorm(doc=5463)
```
5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.73
```
4.727482 = sum of:
4.727482 = weight(author_txt:egghe in 6137) [ClassicSimilarity], result of:
4.727482 = fieldWeight in 6137, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
7.563971 = idf(docFreq=60, maxDocs=43254)
0.625 = fieldNorm(doc=6137)
```
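
The author scores above all follow the same single-term pattern, which can be reproduced with a short sketch. It assumes the classic TF-IDF definitions `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`; these formulas are not stated in this record, but they reproduce the values shown. The function names are illustrative, and `fieldNorm` is taken directly from the explain output rather than recomputed.

```python
import math


def classic_tf(freq):
    # Assumed term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed inverse document frequency; matches idf(docFreq=60, maxDocs=43254).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain output.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


idf = classic_idf(doc_freq=60, max_docs=43254)
author_score = field_weight(freq=1.0, doc_freq=60, max_docs=43254, field_norm=0.625)
print(round(idf, 4), round(author_score, 4))
```

Because every listed document matches the single term `egghe` once and has the same `fieldNorm`, all five receive the same score of about 4.73.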

## Similar documents (content)

1. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.43
```
0.425826 = sum of:
0.425826 = product of:
1.5208071 = sum of:
0.05859518 = weight(abstract_txt:concentration in 458) [ClassicSimilarity], result of:
0.05859518 = score(doc=458,freq=1.0), product of:
0.09331767 = queryWeight, product of:
1.0126095 = boost
8.037259 = idf(docFreq=37, maxDocs=43254)
0.011466052 = queryNorm
0.62791085 = fieldWeight in 458, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
8.037259 = idf(docFreq=37, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.10882308 = weight(abstract_txt:similarity in 458) [ClassicSimilarity], result of:
0.10882308 = score(doc=458,freq=6.0), product of:
0.09776038 = queryWeight, product of:
1.4657385 = boost
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.011466052 = queryNorm
1.1131614 = fieldWeight in 458, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.032260563 = weight(abstract_txt:theory in 458) [ClassicSimilarity], result of:
0.032260563 = score(doc=458,freq=1.0), product of:
0.09040887 = queryWeight, product of:
1.7263395 = boost
4.5674195 = idf(docFreq=1220, maxDocs=43254)
0.011466052 = queryNorm
0.35682964 = fieldWeight in 458, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
4.5674195 = idf(docFreq=1220, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.27582195 = weight(abstract_txt:jaccard in 458) [ClassicSimilarity], result of:
0.27582195 = score(doc=458,freq=3.0), product of:
0.22896974 = queryWeight, product of:
2.2431798 = boost
8.902256 = idf(docFreq=15, maxDocs=43254)
0.011466052 = queryNorm
1.2046219 = fieldWeight in 458, product of:
1.7320508 = tf(freq=3.0), with freq of:
3.0 = termFreq=3.0
8.902256 = idf(docFreq=15, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.16651987 = weight(abstract_txt:lorenz in 458) [ClassicSimilarity], result of:
0.16651987 = score(doc=458,freq=1.0), product of:
0.23589024 = queryWeight, product of:
2.276827 = boost
9.035788 = idf(docFreq=13, maxDocs=43254)
0.011466052 = queryNorm
0.70592093 = fieldWeight in 458, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
9.035788 = idf(docFreq=13, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.22076191 = weight(abstract_txt:vector in 458) [ClassicSimilarity], result of:
0.22076191 = score(doc=458,freq=2.0), product of:
0.30665642 = queryWeight, product of:
4.1046023 = boost
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.011466052 = queryNorm
0.71989983 = fieldWeight in 458, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.6580246 = weight(abstract_txt:overlap in 458) [ClassicSimilarity], result of:
0.6580246 = score(doc=458,freq=3.0), product of:
0.69905335 = queryWeight, product of:
8.764259 = boost
6.956346 = idf(docFreq=111, maxDocs=43254)
0.011466052 = queryNorm
0.94130814 = fieldWeight in 458, product of:
1.7320508 = tf(freq=3.0), with freq of:
3.0 = termFreq=3.0
6.956346 = idf(docFreq=111, maxDocs=43254)
0.078125 = fieldNorm(doc=458)
0.28 = coord(7/25)
```
2. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.13
```
0.1289036 = sum of:
0.1289036 = product of:
0.8056475 = sum of:
0.044426836 = weight(abstract_txt:similarity in 4709) [ClassicSimilarity], result of:
0.044426836 = score(doc=4709,freq=1.0), product of:
0.09776038 = queryWeight, product of:
1.4657385 = boost
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.011466052 = queryNorm
0.45444623 = fieldWeight in 4709, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.078125 = fieldNorm(doc=4709)
0.22520767 = weight(abstract_txt:jaccard in 4709) [ClassicSimilarity], result of:
0.22520767 = score(doc=4709,freq=2.0), product of:
0.22896974 = queryWeight, product of:
2.2431798 = boost
8.902256 = idf(docFreq=15, maxDocs=43254)
0.011466052 = queryNorm
0.9835696 = fieldWeight in 4709, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
8.902256 = idf(docFreq=15, maxDocs=43254)
0.078125 = fieldNorm(doc=4709)
0.15610225 = weight(abstract_txt:vector in 4709) [ClassicSimilarity], result of:
0.15610225 = score(doc=4709,freq=1.0), product of:
0.30665642 = queryWeight, product of:
4.1046023 = boost
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.011466052 = queryNorm
0.5090461 = fieldWeight in 4709, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.078125 = fieldNorm(doc=4709)
0.3799107 = weight(abstract_txt:overlap in 4709) [ClassicSimilarity], result of:
0.3799107 = score(doc=4709,freq=1.0), product of:
0.69905335 = queryWeight, product of:
8.764259 = boost
6.956346 = idf(docFreq=111, maxDocs=43254)
0.011466052 = queryNorm
0.54346454 = fieldWeight in 4709, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
6.956346 = idf(docFreq=111, maxDocs=43254)
0.078125 = fieldNorm(doc=4709)
0.16 = coord(4/25)
```
3. Hood, W.W.; Wilson, C.S.: The relationship of records in multiple databases to their usage or citedness (2005) 0.13
```
0.12820238 = sum of:
0.12820238 = product of:
1.0683532 = sum of:
0.10195389 = weight(abstract_txt:indexed in 5681) [ClassicSimilarity], result of:
0.10195389 = score(doc=5681,freq=2.0), product of:
0.10787144 = queryWeight, product of:
1.5396723 = boost
6.1103244 = idf(docFreq=260, maxDocs=43254)
0.011466052 = queryNorm
0.9451425 = fieldWeight in 5681, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.1103244 = idf(docFreq=260, maxDocs=43254)
0.109375 = fieldNorm(doc=5681)
0.045164794 = weight(abstract_txt:theory in 5681) [ClassicSimilarity], result of:
0.045164794 = score(doc=5681,freq=1.0), product of:
0.09040887 = queryWeight, product of:
1.7263395 = boost
4.5674195 = idf(docFreq=1220, maxDocs=43254)
0.011466052 = queryNorm
0.49956152 = fieldWeight in 5681, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
4.5674195 = idf(docFreq=1220, maxDocs=43254)
0.109375 = fieldNorm(doc=5681)
0.9212344 = weight(abstract_txt:overlap in 5681) [ClassicSimilarity], result of:
0.9212344 = score(doc=5681,freq=3.0), product of:
0.69905335 = queryWeight, product of:
8.764259 = boost
6.956346 = idf(docFreq=111, maxDocs=43254)
0.011466052 = queryNorm
1.3178314 = fieldWeight in 5681, product of:
1.7320508 = tf(freq=3.0), with freq of:
3.0 = termFreq=3.0
6.956346 = idf(docFreq=111, maxDocs=43254)
0.109375 = fieldNorm(doc=5681)
0.12 = coord(3/25)
```
4. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.12
```
0.119628824 = sum of:
0.119628824 = product of:
0.7476802 = sum of:
0.027944913 = weight(abstract_txt:author in 215) [ClassicSimilarity], result of:
0.027944913 = score(doc=215,freq=1.0), product of:
0.0717686 = queryWeight, product of:
1.2558631 = boost
4.9840026 = idf(docFreq=804, maxDocs=43254)
0.011466052 = queryNorm
0.3893752 = fieldWeight in 215, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
4.9840026 = idf(docFreq=804, maxDocs=43254)
0.078125 = fieldNorm(doc=215)
0.07694954 = weight(abstract_txt:similarity in 215) [ClassicSimilarity], result of:
0.07694954 = score(doc=215,freq=3.0), product of:
0.09776038 = queryWeight, product of:
1.4657385 = boost
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.011466052 = queryNorm
0.787124 = fieldWeight in 215, product of:
1.7320508 = tf(freq=3.0), with freq of:
3.0 = termFreq=3.0
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.078125 = fieldNorm(doc=215)
0.105510816 = weight(abstract_txt:section in 215) [ClassicSimilarity], result of:
0.105510816 = score(doc=215,freq=2.0), product of:
0.15810698 = queryWeight, product of:
2.282949 = boost
6.0400553 = idf(docFreq=279, maxDocs=43254)
0.011466052 = queryNorm
0.66733813 = fieldWeight in 215, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.0400553 = idf(docFreq=279, maxDocs=43254)
0.078125 = fieldNorm(doc=215)
0.5372749 = weight(abstract_txt:overlap in 215) [ClassicSimilarity], result of:
0.5372749 = score(doc=215,freq=2.0), product of:
0.69905335 = queryWeight, product of:
8.764259 = boost
6.956346 = idf(docFreq=111, maxDocs=43254)
0.011466052 = queryNorm
0.76857495 = fieldWeight in 215, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.956346 = idf(docFreq=111, maxDocs=43254)
0.078125 = fieldNorm(doc=215)
0.16 = coord(4/25)
```
5. Rorvig, M.: Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets (1999) 0.10
```
0.10060545 = sum of:
0.10060545 = product of:
0.8383788 = sum of:
0.087960646 = weight(abstract_txt:similarity in 5768) [ClassicSimilarity], result of:
0.087960646 = score(doc=5768,freq=2.0), product of:
0.09776038 = queryWeight, product of:
1.4657385 = boost
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.011466052 = queryNorm
0.8997576 = fieldWeight in 5768, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
5.8169117 = idf(docFreq=349, maxDocs=43254)
0.109375 = fieldNorm(doc=5768)
0.21854314 = weight(abstract_txt:vector in 5768) [ClassicSimilarity], result of:
0.21854314 = score(doc=5768,freq=1.0), product of:
0.30665642 = queryWeight, product of:
4.1046023 = boost
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.011466052 = queryNorm
0.7126645 = fieldWeight in 5768, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
6.5157895 = idf(docFreq=173, maxDocs=43254)
0.109375 = fieldNorm(doc=5768)
0.531875 = weight(abstract_txt:overlap in 5768) [ClassicSimilarity], result of:
0.531875 = score(doc=5768,freq=1.0), product of:
0.69905335 = queryWeight, product of:
8.764259 = boost
6.956346 = idf(docFreq=111, maxDocs=43254)
0.011466052 = queryNorm
0.76085037 = fieldWeight in 5768, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
6.956346 = idf(docFreq=111, maxDocs=43254)
0.109375 = fieldNorm(doc=5768)
0.12 = coord(3/25)
```
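
The content scores combine several term weights and a coordination factor. The following sketch recomputes the 0.43 score of the first content match (doc 458) from the numbers in its explain output. It assumes each term contributes `queryWeight * fieldWeight`, with `queryWeight = boost * idf * queryNorm` and `fieldWeight = sqrt(freq) * idf * fieldNorm`, and that the sum is scaled by `coord = matched terms / query terms`. The function and variable names are illustrative.

```python
import math

QUERY_NORM = 0.011466052  # queryNorm from the explain output
FIELD_NORM = 0.078125     # fieldNorm(doc=458)


def term_score(boost, idf, freq, query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, boost, idf, termFreq) copied from the explain output for doc 458.
matched_terms = [
    ("concentration", 1.0126095, 8.037259, 1.0),
    ("similarity", 1.4657385, 5.8169117, 6.0),
    ("theory", 1.7263395, 4.5674195, 1.0),
    ("jaccard", 2.2431798, 8.902256, 3.0),
    ("lorenz", 2.276827, 9.035788, 1.0),
    ("vector", 4.1046023, 6.5157895, 2.0),
    ("overlap", 8.764259, 6.956346, 3.0),
]

term_sum = sum(term_score(boost, idf, freq) for _, boost, idf, freq in matched_terms)
coord = len(matched_terms) / 25  # coord(7/25): 7 of 25 query terms matched
total_score = coord * term_sum
print(round(term_sum, 4), round(total_score, 4))
```

The coordination factor explains why a document matching few query terms, such as the last entry with `coord(3/25)`, scores low even when its individual term weights are high.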