Document (#36222)

Author
Tsui, E.
Wang, W.M.
Cheung, C.F.
Lau, A.S.M.
Title
A concept-relationship acquisition and inference approach for hierarchical taxonomy construction from tags
Source
Information processing and management. 46(2010) no.1, pp. 44-57
Year
2010
Abstract
Taxonomy construction is a resource-demanding, top-down, and time-consuming effort, and it does not always cater for the prevailing context of the captured information. This paper proposes a novel approach to automatically convert tags into a hierarchical taxonomy. Folksonomy describes the process by which many users add metadata in the form of keywords or tags to shared content. Using folksonomy as a knowledge source for nominating tags, the proposed method first converts the tags into a hierarchy. This serves to harness a core set of taxonomy terms; the generated hierarchical structure facilitates users' information navigation and permits personalization. Newly acquired tags are then progressively integrated into the taxonomy in a largely automated way to complete the taxonomy creation process. Common taxonomy construction techniques are based on three main approaches: clustering, lexico-syntactic pattern matching, and automatic acquisition from machine-readable dictionaries. In contrast to these prevailing approaches, this paper proposes a taxonomy construction analysis based on heuristic rules and deep syntactic analysis. The proposed method requires only a relatively small corpus to create a preliminary taxonomy. The approach was evaluated using an expert-defined taxonomy in the environmental protection domain, with encouraging results.
Theme
Social tagging

Similar documents (author)

  1. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 3.52
    3.524797 = sum of:
      3.524797 = sum of:
        1.0008686 = weight(author_txt:wang in 4122) [ClassicSimilarity], result of:
          1.0008686 = score(doc=4122,freq=1.0), product of:
            0.47498482 = queryWeight, product of:
              6.7429094 = idf(docFreq=136, maxDocs=42740)
              0.07044212 = queryNorm
            2.1071591 = fieldWeight in 4122, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.7429094 = idf(docFreq=136, maxDocs=42740)
              0.3125 = fieldNorm(doc=4122)
        2.5239284 = weight(author_txt:cheung in 4122) [ClassicSimilarity], result of:
          2.5239284 = score(doc=4122,freq=1.0), product of:
            0.8799939 = queryWeight, product of:
              1.3611312 = boost
              9.177984 = idf(docFreq=11, maxDocs=42740)
              0.07044212 = queryNorm
            2.8681202 = fieldWeight in 4122, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.177984 = idf(docFreq=11, maxDocs=42740)
              0.3125 = fieldNorm(doc=4122)
    
  2. Cheung, W.; Hsu, C.: The model-assisted global query system for multiple databases in distributed enterprises (1996) 2.02
    2.0191426 = sum of:
      2.0191426 = product of:
        4.0382853 = sum of:
          4.0382853 = weight(author_txt:cheung in 349) [ClassicSimilarity], result of:
            4.0382853 = score(doc=349,freq=1.0), product of:
              0.8799939 = queryWeight, product of:
                1.3611312 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.07044212 = queryNorm
              4.588992 = fieldWeight in 349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.5 = fieldNorm(doc=349)
        0.5 = coord(1/2)
    
  3. Cheung, C.M.K.; Lee, M.K.O.: Understanding consumer trust in Internet shopping : a multidisciplinary approach (2006) 2.02
    2.0191426 = sum of:
      2.0191426 = product of:
        4.0382853 = sum of:
          4.0382853 = weight(author_txt:cheung in 281) [ClassicSimilarity], result of:
            4.0382853 = score(doc=281,freq=1.0), product of:
              0.8799939 = queryWeight, product of:
                1.3611312 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.07044212 = queryNorm
              4.588992 = fieldWeight in 281, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.5 = fieldNorm(doc=281)
        0.5 = coord(1/2)
    
  4. Cheung, C.M.K.; Lee, M.K.O.: The structure of Web-based information systems satisfaction : testing of competing models (2008) 2.02
    2.0191426 = sum of:
      2.0191426 = product of:
        4.0382853 = sum of:
          4.0382853 = weight(author_txt:cheung in 4006) [ClassicSimilarity], result of:
            4.0382853 = score(doc=4006,freq=1.0), product of:
              0.8799939 = queryWeight, product of:
                1.3611312 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.07044212 = queryNorm
              4.588992 = fieldWeight in 4006, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.5 = fieldNorm(doc=4006)
        0.5 = coord(1/2)
    
  5. Cheung, C.M.K.; Lee, M.K.O.: User satisfaction with an internet-based portal : an asymmetric and nonlinear approach (2009) 2.02
    2.0191426 = sum of:
      2.0191426 = product of:
        4.0382853 = sum of:
          4.0382853 = weight(author_txt:cheung in 4702) [ClassicSimilarity], result of:
            4.0382853 = score(doc=4702,freq=1.0), product of:
              0.8799939 = queryWeight, product of:
                1.3611312 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.07044212 = queryNorm
              4.588992 = fieldWeight in 4702, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.5 = fieldNorm(doc=4702)
        0.5 = coord(1/2)
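
The per-term numbers in the explain trees above can be re-derived by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions `idf = 1 + ln(maxDocs / (docFreq + 1))` and `tf = sqrt(freq)`; the function names (`idf`, `tf`, `term_score`) are hypothetical helpers chosen for illustration and are not part of any Lucene API. The inputs are copied from entry 1 (doc 4122), and the results should agree with the listed values up to floating-point rounding.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight, mirroring one explain subtree."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm        # "queryWeight, product of"
    field_weight = tf(freq) * term_idf * field_norm     # "fieldWeight in <doc>, product of"
    return query_weight * field_weight


# Values taken from the explain tree for entry 1 (doc 4122).
wang = term_score(freq=1.0, doc_freq=136, max_docs=42740,
                  query_norm=0.07044212, field_norm=0.3125)
cheung = term_score(freq=1.0, doc_freq=11, max_docs=42740,
                    query_norm=0.07044212, field_norm=0.3125,
                    boost=1.3611312)
total = wang + cheung

print(f"wang={wang:.6f} cheung={cheung:.6f} total={total:.6f}")
```

Because `idf` enters both `queryWeight` and `fieldWeight`, a rare author name such as "cheung" (docFreq=11) contributes far more than a common one such as "wang" (docFreq=136).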
    

Similar documents (content)

  1. Alexander, F.: Assessing information taxonomies using epistemology and the sociology of science (2012) 0.21
    0.20839942 = sum of:
      0.20839942 = product of:
        0.7442836 = sum of:
          0.012181825 = weight(abstract_txt:process in 2398) [ClassicSimilarity], result of:
            0.012181825 = score(doc=2398,freq=1.0), product of:
              0.054655623 = queryWeight, product of:
                1.0972402 = boost
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.0122220395 = queryNorm
              0.22288328 = fieldWeight in 2398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.017986828 = weight(abstract_txt:proposed in 2398) [ClassicSimilarity], result of:
            0.017986828 = score(doc=2398,freq=1.0), product of:
              0.070870094 = queryWeight, product of:
                1.2494411 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0122220395 = queryNorm
              0.25379997 = fieldWeight in 2398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.018228948 = weight(abstract_txt:approaches in 2398) [ClassicSimilarity], result of:
            0.018228948 = score(doc=2398,freq=1.0), product of:
              0.07150466 = queryWeight, product of:
                1.2550224 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0122220395 = queryNorm
              0.2549337 = fieldWeight in 2398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.013905081 = weight(abstract_txt:into in 2398) [ClassicSimilarity], result of:
            0.013905081 = score(doc=2398,freq=1.0), product of:
              0.0683344 = queryWeight, product of:
                1.5026215 = boost
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.0122220395 = queryNorm
              0.20348582 = fieldWeight in 2398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.025096742 = weight(abstract_txt:approach in 2398) [ClassicSimilarity], result of:
            0.025096742 = score(doc=2398,freq=3.0), product of:
              0.07023628 = queryWeight, product of:
                1.5233885 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0122220395 = queryNorm
              0.3573188 = fieldWeight in 2398, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.06000132 = weight(abstract_txt:construction in 2398) [ClassicSimilarity], result of:
            0.06000132 = score(doc=2398,freq=1.0), product of:
              0.19934736 = queryWeight, product of:
                2.9634972 = boost
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.0122220395 = queryNorm
              0.3009888 = fieldWeight in 2398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
          0.5968829 = weight(abstract_txt:taxonomy in 2398) [ClassicSimilarity], result of:
            0.5968829 = score(doc=2398,freq=6.0), product of:
              0.68869287 = queryWeight, product of:
                8.709276 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0122220395 = queryNorm
              0.8666895 = fieldWeight in 2398, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2398)
        0.28 = coord(7/25)
    
  2. Esteban, M.A.: Los lenguajes documentales ante el paso de la organizacion de la realidad y el saber a la organizacion del conocimiento (1995) 0.19
    0.18899156 = sum of:
      0.18899156 = product of:
        0.78746486 = sum of:
          0.07750556 = weight(abstract_txt:permits in 6799) [ClassicSimilarity], result of:
            0.07750556 = score(doc=6799,freq=1.0), product of:
              0.09383219 = queryWeight, product of:
                1.0165886 = boost
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.0122220395 = queryNorm
              0.8260018 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
          0.036457896 = weight(abstract_txt:approaches in 6799) [ClassicSimilarity], result of:
            0.036457896 = score(doc=6799,freq=1.0), product of:
              0.07150466 = queryWeight, product of:
                1.2550224 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0122220395 = queryNorm
              0.5098674 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
          0.05442854 = weight(abstract_txt:proposes in 6799) [ClassicSimilarity], result of:
            0.05442854 = score(doc=6799,freq=1.0), product of:
              0.09340233 = queryWeight, product of:
                1.4343765 = boost
                5.3278365 = idf(docFreq=563, maxDocs=42740)
                0.0122220395 = queryNorm
              0.58273214 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3278365 = idf(docFreq=563, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
          0.028979221 = weight(abstract_txt:approach in 6799) [ClassicSimilarity], result of:
            0.028979221 = score(doc=6799,freq=1.0), product of:
              0.07023628 = queryWeight, product of:
                1.5233885 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0122220395 = queryNorm
              0.4125962 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
          0.10274085 = weight(abstract_txt:hierarchical in 6799) [ClassicSimilarity], result of:
            0.10274085 = score(doc=6799,freq=1.0), product of:
              0.16330487 = queryWeight, product of:
                2.3228943 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0122220395 = queryNorm
              0.62913525 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
          0.48735282 = weight(abstract_txt:taxonomy in 6799) [ClassicSimilarity], result of:
            0.48735282 = score(doc=6799,freq=1.0), product of:
              0.68869287 = queryWeight, product of:
                8.709276 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0122220395 = queryNorm
              0.707649 = fieldWeight in 6799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.109375 = fieldNorm(doc=6799)
        0.24 = coord(6/25)
    
  3. Wu, Y.; Yang, L.: Construction and evaluation of an oil spill semantic relation taxonomy for supporting knowledge discovery (2015) 0.18
    0.18067731 = sum of:
      0.18067731 = product of:
        1.1292332 = sum of:
          0.028517852 = weight(abstract_txt:method in 4203) [ClassicSimilarity], result of:
            0.028517852 = score(doc=4203,freq=1.0), product of:
              0.06727427 = queryWeight, product of:
                1.2173313 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0122220395 = queryNorm
              0.42390427 = fieldWeight in 4203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.09375 = fieldNorm(doc=4203)
          0.030834563 = weight(abstract_txt:proposed in 4203) [ClassicSimilarity], result of:
            0.030834563 = score(doc=4203,freq=1.0), product of:
              0.070870094 = queryWeight, product of:
                1.2494411 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0122220395 = queryNorm
              0.43508568 = fieldWeight in 4203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.09375 = fieldNorm(doc=4203)
          0.046653032 = weight(abstract_txt:proposes in 4203) [ClassicSimilarity], result of:
            0.046653032 = score(doc=4203,freq=1.0), product of:
              0.09340233 = queryWeight, product of:
                1.4343765 = boost
                5.3278365 = idf(docFreq=563, maxDocs=42740)
                0.0122220395 = queryNorm
              0.49948466 = fieldWeight in 4203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3278365 = idf(docFreq=563, maxDocs=42740)
                0.09375 = fieldNorm(doc=4203)
          1.0232278 = weight(abstract_txt:taxonomy in 4203) [ClassicSimilarity], result of:
            1.0232278 = score(doc=4203,freq=6.0), product of:
              0.68869287 = queryWeight, product of:
                8.709276 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0122220395 = queryNorm
              1.4857534 = fieldWeight in 4203, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.09375 = fieldNorm(doc=4203)
        0.16 = coord(4/25)
    
  4. Wang, Z.; Khoo, C.S.G.; Chaudhry, A.S.: Evaluation of the navigation effectiveness of an organizational taxonomy built on a general classification scheme and domain thesauri (2014) 0.15
    0.15020299 = sum of:
      0.15020299 = product of:
        0.93876874 = sum of:
          0.020699443 = weight(abstract_txt:approach in 3252) [ClassicSimilarity], result of:
            0.020699443 = score(doc=3252,freq=1.0), product of:
              0.07023628 = queryWeight, product of:
                1.5233885 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0122220395 = queryNorm
              0.29471156 = fieldWeight in 3252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.078125 = fieldNorm(doc=3252)
          0.07338632 = weight(abstract_txt:hierarchical in 3252) [ClassicSimilarity], result of:
            0.07338632 = score(doc=3252,freq=1.0), product of:
              0.16330487 = queryWeight, product of:
                2.3228943 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0122220395 = queryNorm
              0.44938233 = fieldWeight in 3252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=3252)
          0.14846477 = weight(abstract_txt:construction in 3252) [ClassicSimilarity], result of:
            0.14846477 = score(doc=3252,freq=3.0), product of:
              0.19934736 = queryWeight, product of:
                2.9634972 = boost
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.0122220395 = queryNorm
              0.74475414 = fieldWeight in 3252, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.078125 = fieldNorm(doc=3252)
          0.69621825 = weight(abstract_txt:taxonomy in 3252) [ClassicSimilarity], result of:
            0.69621825 = score(doc=3252,freq=4.0), product of:
              0.68869287 = queryWeight, product of:
                8.709276 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0122220395 = queryNorm
              1.0109271 = fieldWeight in 3252, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.078125 = fieldNorm(doc=3252)
        0.16 = coord(4/25)
    
  5. Wang, Z.; Chaudhry, A.S.; Khoo, C.: Support from bibliographic tools to build an organizational taxonomy for navigation : use of a general classification scheme and domain thesauri (2010) 0.14
    0.14449936 = sum of:
      0.14449936 = product of:
        0.903121 = sum of:
          0.017402608 = weight(abstract_txt:process in 711) [ClassicSimilarity], result of:
            0.017402608 = score(doc=711,freq=1.0), product of:
              0.054655623 = queryWeight, product of:
                1.0972402 = boost
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.0122220395 = queryNorm
              0.3184047 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.078125 = fieldNorm(doc=711)
          0.103783935 = weight(abstract_txt:hierarchical in 711) [ClassicSimilarity], result of:
            0.103783935 = score(doc=711,freq=2.0), product of:
              0.16330487 = queryWeight, product of:
                2.3228943 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0122220395 = queryNorm
              0.6355226 = fieldWeight in 711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=711)
          0.08571617 = weight(abstract_txt:construction in 711) [ClassicSimilarity], result of:
            0.08571617 = score(doc=711,freq=1.0), product of:
              0.19934736 = queryWeight, product of:
                2.9634972 = boost
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.0122220395 = queryNorm
              0.429984 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.503795 = idf(docFreq=472, maxDocs=42740)
                0.078125 = fieldNorm(doc=711)
          0.69621825 = weight(abstract_txt:taxonomy in 711) [ClassicSimilarity], result of:
            0.69621825 = score(doc=711,freq=4.0), product of:
              0.68869287 = queryWeight, product of:
                8.709276 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0122220395 = queryNorm
              1.0109271 = fieldWeight in 711, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.078125 = fieldNorm(doc=711)
        0.16 = coord(4/25)
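
The final score of each content match above is the sum of its per-term scores scaled by the coordination factor `coord(matched/total)`, which penalizes documents that match only a few of the 25 query terms. The sketch below recombines the listed term scores for entries 3 (doc 4203) and 5 (doc 711); `coord` and `document_score` are hypothetical helper names used only for illustration, not Lucene API calls.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """Fraction of query terms that matched the document."""
    return overlap / max_overlap


def document_score(term_scores, total_query_terms):
    """Sum of matching term scores, scaled by the coordination factor."""
    return coord(len(term_scores), total_query_terms) * sum(term_scores)


# Per-term scores copied from the explain trees (method/process, proposed, ...).
doc_4203 = [0.028517852, 0.030834563, 0.046653032, 1.0232278]
doc_711 = [0.017402608, 0.103783935, 0.08571617, 0.69621825]

score_4203 = document_score(doc_4203, total_query_terms=25)
score_711 = document_score(doc_711, total_query_terms=25)

print(f"doc 4203: {score_4203:.8f}")
print(f"doc 711:  {score_711:.8f}")
```

This also explains the ranking: entry 1 (doc 2398) has a smaller raw sum than entry 3, but it matches 7 of the 25 terms instead of 4, so its coordination factor of 0.28 lifts it above the others.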