Document (#33072)

Author
Miyamoto, S.
Title
Information clustering based on fuzzy multisets
Source
Information processing and management. 39(2003) no.2, pp.195-213
Year
2003
Abstract
A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which means both term clustering and document clustering, is considered. Three methods are proposed: hard c-means, fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.
Theme
Automatisches Klassifizieren
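
The abstract does not spell out the fuzzy multiset operations, so the following is only a minimal sketch under a common convention: each element carries a list of membership grades kept in decreasing order, union and intersection are taken position by position with max and min, and shorter lists are padded with zeros. The function names, the sample data, and the L1-style distance are assumptions made for illustration; the distance is not claimed to be either of the two distances defined in the paper.

```python
from itertools import zip_longest

def normalize(fms):
    """Sort each element's membership grades in decreasing order, dropping zeros."""
    return {x: sorted((g for g in grades if g > 0), reverse=True)
            for x, grades in fms.items()}

def _combine(a, b, op):
    # Apply op position by position on the sorted grade sequences,
    # padding the shorter sequence with 0.0 (assumed convention).
    a, b = normalize(a), normalize(b)
    out = {}
    for x in set(a) | set(b):
        pairs = zip_longest(a.get(x, []), b.get(x, []), fillvalue=0.0)
        grades = [g for g in (op(p, q) for p, q in pairs) if g > 0]
        if grades:
            out[x] = grades
    return out

def union(a, b):
    return _combine(a, b, max)

def intersection(a, b):
    return _combine(a, b, min)

def l1_distance(a, b):
    """Illustrative distance: sum of absolute grade differences (an assumption)."""
    a, b = normalize(a), normalize(b)
    total = 0.0
    for x in set(a) | set(b):
        for p, q in zip_longest(a.get(x, []), b.get(x, []), fillvalue=0.0):
            total += abs(p - q)
    return total

# Hypothetical search results: a term may occur several times,
# each occurrence with its own degree of relevance.
page_a = {"fuzzy": [0.9, 0.4], "cluster": [0.7]}
page_b = {"fuzzy": [0.6], "retrieval": [0.8, 0.5]}

print(union(page_a, page_b))
print(intersection(page_a, page_b))
print(l1_distance(page_a, page_b))
```

Keeping the grades sorted makes the position-wise max and min well defined regardless of the order in which occurrences were retrieved.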

Similar documents (content)

  1. Bose, I.; Chen, X.: A method for extension of generative topographic mapping for fuzzy clustering (2009) 0.27
    0.26701778 = sum of:
      0.26701778 = product of:
        1.1125741 = sum of:
          0.04882158 = weight(abstract_txt:proposed in 2711) [ClassicSimilarity], result of:
            0.04882158 = score(doc=2711,freq=2.0), product of:
              0.095867306 = queryWeight, product of:
                1.6678425 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.012470366 = queryNorm
              0.509262 = fieldWeight in 2711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
          0.09903555 = weight(abstract_txt:cluster in 2711) [ClassicSimilarity], result of:
            0.09903555 = score(doc=2711,freq=1.0), product of:
              0.19355308 = queryWeight, product of:
                2.3698444 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.012470366 = queryNorm
              0.5116713 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
          0.066686 = weight(abstract_txt:means in 2711) [ClassicSimilarity], result of:
            0.066686 = score(doc=2711,freq=1.0), product of:
              0.17021304 = queryWeight, product of:
                2.7218354 = boost
                5.0147786 = idf(docFreq=797, maxDocs=44218)
                0.012470366 = queryNorm
              0.39177957 = fieldWeight in 2711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0147786 = idf(docFreq=797, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
          0.13906865 = weight(abstract_txt:algorithms in 2711) [ClassicSimilarity], result of:
            0.13906865 = score(doc=2711,freq=2.0), product of:
              0.22051895 = queryWeight, product of:
                3.0980496 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.012470366 = queryNorm
              0.63064265 = fieldWeight in 2711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
          0.39970183 = weight(abstract_txt:fuzzy in 2711) [ClassicSimilarity], result of:
            0.39970183 = score(doc=2711,freq=2.0), product of:
              0.5285265 = queryWeight, product of:
                6.1918893 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012470366 = queryNorm
              0.75625694 = fieldWeight in 2711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
          0.35926044 = weight(abstract_txt:clustering in 2711) [ClassicSimilarity], result of:
            0.35926044 = score(doc=2711,freq=2.0), product of:
              0.5230895 = queryWeight, product of:
                6.747897 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.012470366 = queryNorm
              0.6868049 = fieldWeight in 2711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=2711)
        0.24 = coord(6/25)
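
The explanation tree above can be checked by hand. The sketch below recomputes the score for doc 2711 from the boosts, document frequencies, and term frequencies listed in the tree. It assumes the formulas that the listed numbers are consistent with: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and variable names are illustrative and not part of any library API.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.012470366  # queryNorm from the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # Inverse document frequency as it appears in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    # One "weight(abstract_txt:term in doc)" node of the tree.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, total_query_terms):
    # Sum the matching term scores, then apply coord(matched/total).
    raw = sum(term_score(boost, df, freq, field_norm)
              for boost, df, freq in terms.values())
    coord = len(terms) / total_query_terms
    return raw * coord

# (boost, docFreq, termFreq) for each matching term in doc 2711
terms_2711 = {
    "proposed":   (1.6678425, 1196, 2.0),
    "cluster":    (2.3698444, 171, 1.0),
    "means":      (2.7218354, 797, 1.0),
    "algorithms": (3.0980496, 398, 2.0),
    "fuzzy":      (6.1918893, 127, 2.0),
    "clustering": (6.747897, 239, 2.0),
}

score_2711 = document_score(terms_2711, field_norm=0.078125,
                            total_query_terms=25)
print(score_2711)  # should agree with the listed 0.26701778 up to rounding
```

Small differences in the last digits are expected, because the listing was produced with single-precision arithmetic while Python uses double precision.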
    
  2. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.26
    0.26499853 = sum of:
      0.26499853 = product of:
        0.9464233 = sum of:
          0.011835721 = weight(abstract_txt:retrieval in 448) [ClassicSimilarity], result of:
            0.011835721 = score(doc=448,freq=1.0), product of:
              0.054493222 = queryWeight, product of:
                1.2574509 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.012470366 = queryNorm
              0.21719621 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.027617654 = weight(abstract_txt:proposed in 448) [ClassicSimilarity], result of:
            0.027617654 = score(doc=448,freq=1.0), product of:
              0.095867306 = queryWeight, product of:
                1.6678425 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.012470366 = queryNorm
              0.2880821 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.16533189 = weight(abstract_txt:agglomerative in 448) [ClassicSimilarity], result of:
            0.16533189 = score(doc=448,freq=2.0), product of:
              0.19911183 = queryWeight, product of:
                1.6996258 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012470366 = queryNorm
              0.8303468 = fieldWeight in 448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.079228446 = weight(abstract_txt:cluster in 448) [ClassicSimilarity], result of:
            0.079228446 = score(doc=448,freq=1.0), product of:
              0.19355308 = queryWeight, product of:
                2.3698444 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.012470366 = queryNorm
              0.40933704 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.053348795 = weight(abstract_txt:means in 448) [ClassicSimilarity], result of:
            0.053348795 = score(doc=448,freq=1.0), product of:
              0.17021304 = queryWeight, product of:
                2.7218354 = boost
                5.0147786 = idf(docFreq=797, maxDocs=44218)
                0.012470366 = queryNorm
              0.31342366 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0147786 = idf(docFreq=797, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.111254916 = weight(abstract_txt:algorithms in 448) [ClassicSimilarity], result of:
            0.111254916 = score(doc=448,freq=2.0), product of:
              0.22051895 = queryWeight, product of:
                3.0980496 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.012470366 = queryNorm
              0.5045141 = fieldWeight in 448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.49780592 = weight(abstract_txt:clustering in 448) [ClassicSimilarity], result of:
            0.49780592 = score(doc=448,freq=6.0), product of:
              0.5230895 = queryWeight, product of:
                6.747897 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.012470366 = queryNorm
              0.95166487 = fieldWeight in 448, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
        0.28 = coord(7/25)
    
  3. Pop, H.P.: A fuzzy classification of the chemical elements (1996) 0.18
    0.18002334 = sum of:
      0.18002334 = product of:
        1.1251459 = sum of:
          0.07126385 = weight(abstract_txt:properties in 5116) [ClassicSimilarity], result of:
            0.07126385 = score(doc=5116,freq=2.0), product of:
              0.07823723 = queryWeight, product of:
                1.0653971 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.012470366 = queryNorm
              0.91086876 = fieldWeight in 5116, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.109375 = fieldNorm(doc=5116)
          0.13864978 = weight(abstract_txt:cluster in 5116) [ClassicSimilarity], result of:
            0.13864978 = score(doc=5116,freq=1.0), product of:
              0.19355308 = queryWeight, product of:
                2.3698444 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.012470366 = queryNorm
              0.7163398 = fieldWeight in 5116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=5116)
          0.55958253 = weight(abstract_txt:fuzzy in 5116) [ClassicSimilarity], result of:
            0.55958253 = score(doc=5116,freq=2.0), product of:
              0.5285265 = queryWeight, product of:
                6.1918893 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012470366 = queryNorm
              1.0587597 = fieldWeight in 5116, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.109375 = fieldNorm(doc=5116)
          0.3556497 = weight(abstract_txt:clustering in 5116) [ClassicSimilarity], result of:
            0.3556497 = score(doc=5116,freq=1.0), product of:
              0.5230895 = queryWeight, product of:
                6.747897 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.012470366 = queryNorm
              0.6799022 = fieldWeight in 5116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.109375 = fieldNorm(doc=5116)
        0.16 = coord(4/25)
    
  4. Miyamoto, S.: Application of rough sets to information retrieval (1998) 0.18
    0.17617485 = sum of:
      0.17617485 = product of:
        0.8808742 = sum of:
          0.030750107 = weight(abstract_txt:retrieval in 559) [ClassicSimilarity], result of:
            0.030750107 = score(doc=559,freq=3.0), product of:
              0.054493222 = queryWeight, product of:
                1.2574509 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.012470366 = queryNorm
              0.5642923 = fieldWeight in 559, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=559)
          0.02679524 = weight(abstract_txt:model in 559) [ClassicSimilarity], result of:
            0.02679524 = score(doc=559,freq=1.0), product of:
              0.07170073 = queryWeight, product of:
                1.4423863 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.012470366 = queryNorm
              0.37370944 = fieldWeight in 559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.09375 = fieldNorm(doc=559)
          0.12379063 = weight(abstract_txt:illustrative in 559) [ClassicSimilarity], result of:
            0.12379063 = score(doc=559,freq=1.0), product of:
              0.15785815 = queryWeight, product of:
                1.5133462 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.012470366 = queryNorm
              0.78418905 = fieldWeight in 559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.09375 = fieldNorm(doc=559)
          0.021221682 = weight(abstract_txt:information in 559) [ClassicSimilarity], result of:
            0.021221682 = score(doc=559,freq=2.0), product of:
              0.06611627 = queryWeight, product of:
                2.1899993 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012470366 = queryNorm
              0.32097518 = fieldWeight in 559, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.09375 = fieldNorm(doc=559)
          0.67831653 = weight(abstract_txt:fuzzy in 559) [ClassicSimilarity], result of:
            0.67831653 = score(doc=559,freq=4.0), product of:
              0.5285265 = queryWeight, product of:
                6.1918893 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012470366 = queryNorm
              1.2834107 = fieldWeight in 559, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.09375 = fieldNorm(doc=559)
        0.2 = coord(5/25)
    
  5. Mather, L.A.: A linear algebra measure of cluster quality (2000) 0.17
    0.16966666 = sum of:
      0.16966666 = product of:
        0.70694447 = sum of:
          0.016738236 = weight(abstract_txt:retrieval in 4767) [ClassicSimilarity], result of:
            0.016738236 = score(doc=4767,freq=2.0), product of:
              0.054493222 = queryWeight, product of:
                1.2574509 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.012470366 = queryNorm
              0.3071618 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.025262792 = weight(abstract_txt:model in 4767) [ClassicSimilarity], result of:
            0.025262792 = score(doc=4767,freq=2.0), product of:
              0.07170073 = queryWeight, product of:
                1.4423863 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.012470366 = queryNorm
              0.35233662 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.010003997 = weight(abstract_txt:information in 4767) [ClassicSimilarity], result of:
            0.010003997 = score(doc=4767,freq=1.0), product of:
              0.06611627 = queryWeight, product of:
                2.1899993 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012470366 = queryNorm
              0.15130915 = fieldWeight in 4767, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.13722768 = weight(abstract_txt:cluster in 4767) [ClassicSimilarity], result of:
            0.13722768 = score(doc=4767,freq=3.0), product of:
              0.19355308 = queryWeight, product of:
                2.3698444 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.012470366 = queryNorm
              0.70899254 = fieldWeight in 4767, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.111254916 = weight(abstract_txt:algorithms in 4767) [ClassicSimilarity], result of:
            0.111254916 = score(doc=4767,freq=2.0), product of:
              0.22051895 = queryWeight, product of:
                3.0980496 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.012470366 = queryNorm
              0.5045141 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.4064568 = weight(abstract_txt:clustering in 4767) [ClassicSimilarity], result of:
            0.4064568 = score(doc=4767,freq=4.0), product of:
              0.5230895 = queryWeight, product of:
                6.747897 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.012470366 = queryNorm
              0.77703106 = fieldWeight in 4767, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
        0.24 = coord(6/25)