Document (#25130)

Author
Polanco, X.
Francois, C.
Title
Data clustering and cluster mapping or visualization in text processing and mining
Source
Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al
Imprint
Würzburg : Ergon
Year
2000
Pages
pp. 359-365
Series
Advances in knowledge organization; vol.7
Abstract
This paper focuses on the cooperative use of text data clustering and cluster mapping as visualization-based analysis tools. Although we present a generic approach to text processing and mining, we concentrate only on the two middle steps of the process: data clustering and cluster mapping. In the data clustering step, we use the axial k-means (AKM) algorithm, an unsupervised, iterative, winner-take-all (WTA) partitioning method that produces overlapping clusters. In the cluster mapping step, we use a nonlinear multilayer perceptron (MLP) with two hidden layers. Finally, the map is proposed as an analysis device rather than a visualization device. It allows the analyst to evaluate the relative positions of the clusters, which are indicators of themes induced from the data themselves.

Similar documents (author)

  1. Polanco, X.: Extraction et modélisation des connaissances : une approche et ses technologies (EMCAT) (1999) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:polanco in 6241) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 6241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=6241)
    
  2. Polanco, X.: Clusters, graphs, and networks for analyzing Internet-Web-supported communication within a virtual community (2003) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:polanco in 2737) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 2737, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=2737)
    
  3. Grivel, L.; Mutschke, P.; Polanco, X.: Thematic mapping on bibliographic databases by cluster analysis : a description of the SDOC environment with SOLIS (1995) 3.71
    3.7144227 = sum of:
      3.7144227 = weight(author_txt:polanco in 1900) [ClassicSimilarity], result of:
        3.7144227 = fieldWeight in 1900, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.375 = fieldNorm(doc=1900)
    
  4. Polanco, X.; François, C.; Aly Ould Louly, M.: ¬An artificial neural network perspective on knowledge representation from databases : the use of a multilayer perceptron for data clusters cartography (1998) 3.10
    3.0953524 = sum of:
      3.0953524 = weight(author_txt:polanco in 72) [ClassicSimilarity], result of:
        3.0953524 = fieldWeight in 72, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.3125 = fieldNorm(doc=72)
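
The author-match scores above can be checked by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative only and are not part of any library. The fieldNorm values are taken directly from the explain output rather than recomputed.

```python
import math

def classic_tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw frequency
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

MAX_DOCS = 44218
DOC_FREQ_POLANCO = 5  # docFreq of author_txt:polanco in the listing

# fieldNorm values copied from the four entries above (docs 6241, 2737, 1900, 72)
field_norms = {6241: 0.625, 2737: 0.625, 1900: 0.375, 72: 0.3125}
weights = {doc: field_weight(1.0, DOC_FREQ_POLANCO, MAX_DOCS, norm)
           for doc, norm in field_norms.items()}

for doc, weight in weights.items():
    print(doc, round(weight, 6))
```

Because the term frequency and idf are identical in all four entries, the ranking is driven entirely by fieldNorm, which is larger when the author field is shorter.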
    

Similar documents (content)

  1. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.25
    0.24650097 = sum of:
      0.24650097 = product of:
        0.8803606 = sum of:
          0.045599256 = weight(abstract_txt:processing in 3464) [ClassicSimilarity], result of:
            0.045599256 = score(doc=3464,freq=1.0), product of:
              0.11834721 = queryWeight, product of:
                1.3484461 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.017795686 = queryNorm
              0.38530064 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.02780759 = weight(abstract_txt:analysis in 3464) [ClassicSimilarity], result of:
            0.02780759 = score(doc=3464,freq=1.0), product of:
              0.097422406 = queryWeight, product of:
                1.4984065 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017795686 = queryNorm
              0.2854332 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.037705995 = weight(abstract_txt:text in 3464) [ClassicSimilarity], result of:
            0.037705995 = score(doc=3464,freq=1.0), product of:
              0.11935031 = queryWeight, product of:
                1.6584867 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017795686 = queryNorm
              0.3159271 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.08952196 = weight(abstract_txt:mining in 3464) [ClassicSimilarity], result of:
            0.08952196 = score(doc=3464,freq=1.0), product of:
              0.18555497 = queryWeight, product of:
                1.6884605 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017795686 = queryNorm
              0.4824552 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.04991116 = weight(abstract_txt:data in 3464) [ClassicSimilarity], result of:
            0.04991116 = score(doc=3464,freq=2.0), product of:
              0.13540083 = queryWeight, product of:
                2.2805274 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017795686 = queryNorm
              0.36861783 = fieldWeight in 3464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.14665261 = weight(abstract_txt:mapping in 3464) [ClassicSimilarity], result of:
            0.14665261 = score(doc=3464,freq=1.0), product of:
              0.32487983 = queryWeight, product of:
                3.1595917 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.017795686 = queryNorm
              0.4514057 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.48316205 = weight(abstract_txt:clustering in 3464) [ClassicSimilarity], result of:
            0.48316205 = score(doc=3464,freq=7.0), product of:
              0.3760325 = queryWeight, product of:
                3.3992436 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017795686 = queryNorm
              1.2848943 = fieldWeight in 3464, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
        0.28 = coord(7/25)
    
  2. Chen, C.; Ibekwe-SanJuan, F.; Hou, J.: ¬The structure and dynamics of cocitation clusters : a multiple-perspective cocitation analysis (2010) 0.20
    0.2010899 = sum of:
      0.2010899 = product of:
        0.8378746 = sum of:
          0.05561518 = weight(abstract_txt:analysis in 3591) [ClassicSimilarity], result of:
            0.05561518 = score(doc=3591,freq=4.0), product of:
              0.097422406 = queryWeight, product of:
                1.4984065 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017795686 = queryNorm
              0.5708664 = fieldWeight in 3591, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
          0.037705995 = weight(abstract_txt:text in 3591) [ClassicSimilarity], result of:
            0.037705995 = score(doc=3591,freq=1.0), product of:
              0.11935031 = queryWeight, product of:
                1.6584867 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017795686 = queryNorm
              0.3159271 = fieldWeight in 3591, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
          0.15102427 = weight(abstract_txt:cluster in 3591) [ClassicSimilarity], result of:
            0.15102427 = score(doc=3591,freq=2.0), product of:
              0.20870878 = queryWeight, product of:
                1.7907088 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017795686 = queryNorm
              0.7236124 = fieldWeight in 3591, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
          0.13779669 = weight(abstract_txt:visualization in 3591) [ClassicSimilarity], result of:
            0.13779669 = score(doc=3591,freq=1.0), product of:
              0.2831669 = queryWeight, product of:
                2.5545914 = boost
                6.228827 = idf(docFreq=236, maxDocs=44218)
                0.017795686 = queryNorm
              0.4866271 = fieldWeight in 3591, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.228827 = idf(docFreq=236, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
          0.27311438 = weight(abstract_txt:clusters in 3591) [ClassicSimilarity], result of:
            0.27311438 = score(doc=3591,freq=3.0), product of:
              0.30979368 = queryWeight, product of:
                2.6720004 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017795686 = queryNorm
              0.88160086 = fieldWeight in 3591, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
          0.18261808 = weight(abstract_txt:clustering in 3591) [ClassicSimilarity], result of:
            0.18261808 = score(doc=3591,freq=1.0), product of:
              0.3760325 = queryWeight, product of:
                3.3992436 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017795686 = queryNorm
              0.4856444 = fieldWeight in 3591, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=3591)
        0.24 = coord(6/25)
    
  3. Small, H.: ¬A general framework for creating large scale maps of science in two or three dimensions : the SciViz system (1998) 0.18
    0.18481193 = sum of:
      0.18481193 = product of:
        0.9240596 = sum of:
          0.04940953 = weight(abstract_txt:data in 1039) [ClassicSimilarity], result of:
            0.04940953 = score(doc=1039,freq=1.0), product of:
              0.13540083 = queryWeight, product of:
                2.2805274 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017795686 = queryNorm
              0.36491305 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.109375 = fieldNorm(doc=1039)
          0.19291535 = weight(abstract_txt:visualization in 1039) [ClassicSimilarity], result of:
            0.19291535 = score(doc=1039,freq=1.0), product of:
              0.2831669 = queryWeight, product of:
                2.5545914 = boost
                6.228827 = idf(docFreq=236, maxDocs=44218)
                0.017795686 = queryNorm
              0.68127793 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.228827 = idf(docFreq=236, maxDocs=44218)
                0.109375 = fieldNorm(doc=1039)
          0.22075573 = weight(abstract_txt:clusters in 1039) [ClassicSimilarity], result of:
            0.22075573 = score(doc=1039,freq=1.0), product of:
              0.30979368 = queryWeight, product of:
                2.6720004 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017795686 = queryNorm
              0.7125895 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.109375 = fieldNorm(doc=1039)
          0.20531367 = weight(abstract_txt:mapping in 1039) [ClassicSimilarity], result of:
            0.20531367 = score(doc=1039,freq=1.0), product of:
              0.32487983 = queryWeight, product of:
                3.1595917 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.017795686 = queryNorm
              0.631968 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.109375 = fieldNorm(doc=1039)
          0.25566533 = weight(abstract_txt:clustering in 1039) [ClassicSimilarity], result of:
            0.25566533 = score(doc=1039,freq=1.0), product of:
              0.3760325 = queryWeight, product of:
                3.3992436 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017795686 = queryNorm
              0.6799022 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.109375 = fieldNorm(doc=1039)
        0.2 = coord(5/25)
    
  4. Janssens, F.; Leta, J.; Glänzel, W.; Moor, B. de: Towards mapping library and information science (2006) 0.18
    0.1781045 = sum of:
      0.1781045 = product of:
        0.74210215 = sum of:
          0.038531326 = weight(abstract_txt:analysis in 992) [ClassicSimilarity], result of:
            0.038531326 = score(doc=992,freq=3.0), product of:
              0.097422406 = queryWeight, product of:
                1.4984065 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017795686 = queryNorm
              0.39550784 = fieldWeight in 992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
          0.052246958 = weight(abstract_txt:text in 992) [ClassicSimilarity], result of:
            0.052246958 = score(doc=992,freq=3.0), product of:
              0.11935031 = queryWeight, product of:
                1.6584867 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017795686 = queryNorm
              0.4377614 = fieldWeight in 992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
          0.12081942 = weight(abstract_txt:cluster in 992) [ClassicSimilarity], result of:
            0.12081942 = score(doc=992,freq=2.0), product of:
              0.20870878 = queryWeight, product of:
                1.7907088 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017795686 = queryNorm
              0.57888997 = fieldWeight in 992, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
          0.2184915 = weight(abstract_txt:clusters in 992) [ClassicSimilarity], result of:
            0.2184915 = score(doc=992,freq=3.0), product of:
              0.30979368 = queryWeight, product of:
                2.6720004 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017795686 = queryNorm
              0.70528066 = fieldWeight in 992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
          0.16591848 = weight(abstract_txt:mapping in 992) [ClassicSimilarity], result of:
            0.16591848 = score(doc=992,freq=2.0), product of:
              0.32487983 = queryWeight, product of:
                3.1595917 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.017795686 = queryNorm
              0.51070726 = fieldWeight in 992, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
          0.14609447 = weight(abstract_txt:clustering in 992) [ClassicSimilarity], result of:
            0.14609447 = score(doc=992,freq=1.0), product of:
              0.3760325 = queryWeight, product of:
                3.3992436 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017795686 = queryNorm
              0.38851553 = fieldWeight in 992, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=992)
        0.24 = coord(6/25)
    
  5. Gamber, T.; Friedrich-Nishio, M.; Grupp, H.: Science and technology in standardization : a statistical analysis of merging knowledge structures (2008) 0.16
    0.16115098 = sum of:
      0.16115098 = product of:
        0.6714624 = sum of:
          0.02780759 = weight(abstract_txt:analysis in 2260) [ClassicSimilarity], result of:
            0.02780759 = score(doc=2260,freq=1.0), product of:
              0.097422406 = queryWeight, product of:
                1.4984065 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017795686 = queryNorm
              0.2854332 = fieldWeight in 2260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
          0.10679029 = weight(abstract_txt:cluster in 2260) [ClassicSimilarity], result of:
            0.10679029 = score(doc=2260,freq=1.0), product of:
              0.20870878 = queryWeight, product of:
                1.7907088 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017795686 = queryNorm
              0.5116713 = fieldWeight in 2260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
          0.04991116 = weight(abstract_txt:data in 2260) [ClassicSimilarity], result of:
            0.04991116 = score(doc=2260,freq=2.0), product of:
              0.13540083 = queryWeight, product of:
                2.2805274 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017795686 = queryNorm
              0.36861783 = fieldWeight in 2260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
          0.15768266 = weight(abstract_txt:clusters in 2260) [ClassicSimilarity], result of:
            0.15768266 = score(doc=2260,freq=1.0), product of:
              0.30979368 = queryWeight, product of:
                2.6720004 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017795686 = queryNorm
              0.5089925 = fieldWeight in 2260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
          0.14665261 = weight(abstract_txt:mapping in 2260) [ClassicSimilarity], result of:
            0.14665261 = score(doc=2260,freq=1.0), product of:
              0.32487983 = queryWeight, product of:
                3.1595917 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.017795686 = queryNorm
              0.4514057 = fieldWeight in 2260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
          0.18261808 = weight(abstract_txt:clustering in 2260) [ClassicSimilarity], result of:
            0.18261808 = score(doc=2260,freq=1.0), product of:
              0.3760325 = queryWeight, product of:
                3.3992436 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017795686 = queryNorm
              0.4856444 = fieldWeight in 2260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=2260)
        0.24 = coord(6/25)
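
The content-similarity scores follow the same pattern, with a query-side weight and a coordination factor added. The sketch below recomputes the score of the first entry (doc 3464) from the values printed in its explain tree. It assumes the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names are hypothetical, and the boost, queryNorm, and fieldNorm values are copied from the listing, not derived.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.017795686   # queryNorm from the explain output
FIELD_NORM_3464 = 0.078125  # fieldNorm(doc=3464)

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm          # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight

# (term, freq in doc 3464, docFreq, boost), copied from the listing
terms_3464 = [
    ("processing", 1.0, 866, 1.3484461),
    ("analysis",   1.0, 3112, 1.4984065),
    ("text",       1.0, 2106, 1.6584867),
    ("mining",     1.0, 249, 1.6884605),
    ("data",       2.0, 4274, 2.2805274),
    ("mapping",    1.0, 371, 3.1595917),
    ("clustering", 7.0, 239, 3.3992436),
]

raw_sum = sum(term_score(freq, df, boost, FIELD_NORM_3464)
              for _, freq, df, boost in terms_3464)
coord = 7 / 25  # 7 of the 25 query terms matched
score = raw_sum * coord
print(round(raw_sum, 6), round(score, 6))
```

The coord factor explains why a document with a higher raw sum can still rank lower: the third entry has the largest sum of term scores but matches only 5 of the 25 query terms.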