Document (#36222)

Author
Tsui, E.
Wang, W.M.
Cheung, C.F.
Lau, A.S.M.
Title
¬A concept-relationship acquisition and inference approach for hierarchical taxonomy construction from tags
Source
Information processing and management. 46(2010) no.1, pp.44-57
Year
2010
Abstract
Taxonomy construction is a resource-demanding, top-down, and time-consuming effort. It does not always cater for the prevailing context of the captured information. This paper proposes a novel approach to automatically convert tags into a hierarchical taxonomy. Folksonomy describes the process by which many users add metadata in the form of keywords or tags to shared content. Using folksonomy as a knowledge source for nominating tags, the proposed method first converts the tags into a hierarchy. This serves to harness a core set of taxonomy terms; the generated hierarchical structure facilitates users' information navigation behavior and permits personalization. Newly acquired tags are then progressively integrated into the taxonomy in a largely automated way to complete the taxonomy creation process. Common taxonomy construction techniques are based on three main approaches: clustering, lexico-syntactic pattern matching, and automatic acquisition from machine-readable dictionaries. In contrast to these prevailing approaches, this paper proposes a taxonomy construction analysis based on heuristic rules and deep syntactic analysis. The proposed method requires only a relatively small corpus to create a preliminary taxonomy. The approach was evaluated against an expert-defined taxonomy in the environmental protection domain and yielded encouraging results.
Theme
Social tagging

Similar documents (author)

  1. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 3.52
    3.5240302 = sum of:
      3.5240302 = sum of:
        0.99250305 = weight(author_txt:wang in 4122) [ClassicSimilarity], result of:
          0.99250305 = score(doc=4122,freq=1.0), product of:
            0.47219294 = queryWeight, product of:
              6.726085 = idf(docFreq=140, maxDocs=43254)
              0.07020324 = queryNorm
            2.1019015 = fieldWeight in 4122, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.726085 = idf(docFreq=140, maxDocs=43254)
              0.3125 = fieldNorm(doc=4122)
        2.531527 = weight(author_txt:cheung in 4122) [ClassicSimilarity], result of:
          2.531527 = score(doc=4122,freq=1.0), product of:
            0.88149524 = queryWeight, product of:
              1.3663131 = boost
              9.189939 = idf(docFreq=11, maxDocs=43254)
              0.07020324 = queryNorm
            2.8718557 = fieldWeight in 4122, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.189939 = idf(docFreq=11, maxDocs=43254)
              0.3125 = fieldNorm(doc=4122)
    
  2. Cheung, W.; Hsu, C.: ¬The model-assisted global query system for multiple databases in distributed enterprises (1996) 2.03
    2.0252218 = sum of:
      2.0252218 = product of:
        4.0504436 = sum of:
          4.0504436 = weight(author_txt:cheung in 349) [ClassicSimilarity], result of:
            4.0504436 = score(doc=349,freq=1.0), product of:
              0.88149524 = queryWeight, product of:
                1.3663131 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.07020324 = queryNorm
              4.5949693 = fieldWeight in 349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.5 = fieldNorm(doc=349)
        0.5 = coord(1/2)
    
  3. Cheung, C.M.K.; Lee, M.K.O.: Understanding consumer trust in Internet shopping : a multidisciplinary approach (2006) 2.03
    2.0252218 = sum of:
      2.0252218 = product of:
        4.0504436 = sum of:
          4.0504436 = weight(author_txt:cheung in 281) [ClassicSimilarity], result of:
            4.0504436 = score(doc=281,freq=1.0), product of:
              0.88149524 = queryWeight, product of:
                1.3663131 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.07020324 = queryNorm
              4.5949693 = fieldWeight in 281, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.5 = fieldNorm(doc=281)
        0.5 = coord(1/2)
    
  4. Cheung, C.M.K.; Lee, M.K.O.: ¬The structure of Web-based information systems satisfaction : testing of competing models (2008) 2.03
    2.0252218 = sum of:
      2.0252218 = product of:
        4.0504436 = sum of:
          4.0504436 = weight(author_txt:cheung in 4006) [ClassicSimilarity], result of:
            4.0504436 = score(doc=4006,freq=1.0), product of:
              0.88149524 = queryWeight, product of:
                1.3663131 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.07020324 = queryNorm
              4.5949693 = fieldWeight in 4006, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.5 = fieldNorm(doc=4006)
        0.5 = coord(1/2)
    
  5. Cheung, C.M.K.; Lee, M.K.O.: User satisfaction with an internet-based portal : an asymmetric and nonlinear approach (2009) 2.03
    2.0252218 = sum of:
      2.0252218 = product of:
        4.0504436 = sum of:
          4.0504436 = weight(author_txt:cheung in 4702) [ClassicSimilarity], result of:
            4.0504436 = score(doc=4702,freq=1.0), product of:
              0.88149524 = queryWeight, product of:
                1.3663131 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.07020324 = queryNorm
              4.5949693 = fieldWeight in 4702, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.5 = fieldNorm(doc=4702)
        0.5 = coord(1/2)
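
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), score = queryWeight * fieldWeight). The function names `idf` and `term_score` are illustrative and not part of any library; all numeric inputs are copied from the listing. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

MAX_DOCS = 43254          # maxDocs, as shown in the listing
QUERY_NORM = 0.07020324   # queryNorm of the author query, as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed ClassicSimilarity form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Result 1 (doc 4122): both query terms match, so the term scores are simply summed.
wang = term_score(1.0, doc_freq=140, field_norm=0.3125, query_norm=QUERY_NORM)
cheung = term_score(1.0, doc_freq=11, field_norm=0.3125,
                    query_norm=QUERY_NORM, boost=1.3663131)
doc_4122 = wang + cheung

# Result 2 (doc 349): only "cheung" matches, so the sum is scaled by coord(1/2).
cheung_349 = term_score(1.0, doc_freq=11, field_norm=0.5,
                        query_norm=QUERY_NORM, boost=1.3663131)
doc_349 = cheung_349 * (1 / 2)

print(round(doc_4122, 4), round(doc_349, 4))
```

The same per-term score of about 4.05 explains why results 2 to 5 tie at 2.03: each matches only "cheung" with an identical fieldNorm of 0.5 and the same coord(1/2) penalty.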
    

Similar documents (content)

  1. Alexander, F.: Assessing information taxonomies using epistemology and the sociology of science (2012) 0.21
    0.20686658 = sum of:
      0.20686658 = product of:
        0.7388092 = sum of:
          0.0121300565 = weight(abstract_txt:process in 1862) [ClassicSimilarity], result of:
            0.0121300565 = score(doc=1862,freq=1.0), product of:
              0.05457854 = queryWeight, product of:
                1.0986465 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.012223937 = queryNorm
              0.22224957 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.017895771 = weight(abstract_txt:proposed in 1862) [ClassicSimilarity], result of:
            0.017895771 = score(doc=1862,freq=1.0), product of:
              0.070731625 = queryWeight, product of:
                1.250702 = boost
                4.6264586 = idf(docFreq=1150, maxDocs=43254)
                0.012223937 = queryNorm
              0.25300947 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6264586 = idf(docFreq=1150, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.018058706 = weight(abstract_txt:approaches in 1862) [ClassicSimilarity], result of:
            0.018058706 = score(doc=1862,freq=1.0), product of:
              0.0711603 = queryWeight, product of:
                1.2544863 = boost
                4.640457 = idf(docFreq=1134, maxDocs=43254)
                0.012223937 = queryNorm
              0.253775 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640457 = idf(docFreq=1134, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.013893261 = weight(abstract_txt:into in 1862) [ClassicSimilarity], result of:
            0.013893261 = score(doc=1862,freq=1.0), product of:
              0.06839325 = queryWeight, product of:
                1.5062577 = boost
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.012223937 = queryNorm
              0.20313789 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.02487607 = weight(abstract_txt:approach in 1862) [ClassicSimilarity], result of:
            0.02487607 = score(doc=1862,freq=3.0), product of:
              0.06992372 = queryWeight, product of:
                1.5230176 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.012223937 = queryNorm
              0.3557601 = fieldWeight in 1862, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.059762362 = weight(abstract_txt:construction in 1862) [ClassicSimilarity], result of:
            0.059762362 = score(doc=1862,freq=1.0), product of:
              0.1991018 = queryWeight, product of:
                2.9675608 = boost
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.012223937 = queryNorm
              0.3001598 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
          0.592193 = weight(abstract_txt:taxonomy in 1862) [ClassicSimilarity], result of:
            0.592193 = score(doc=1862,freq=6.0), product of:
              0.68605953 = queryWeight, product of:
                8.709895 = boost
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.012223937 = queryNorm
              0.8631802 = fieldWeight in 1862, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1862)
        0.28 = coord(7/25)
    
  2. Esteban, M.A.: ¬Los lenguajes documentales ante el paso de la organizacion de la realidad y el saber a la organizacion del conocimiento (1995) 0.19
    0.1880964 = sum of:
      0.1880964 = product of:
        0.78373504 = sum of:
          0.078208484 = weight(abstract_txt:permits in 799) [ClassicSimilarity], result of:
            0.078208484 = score(doc=799,freq=1.0), product of:
              0.09453355 = queryWeight, product of:
                1.0224098 = boost
                7.563971 = idf(docFreq=60, maxDocs=43254)
                0.012223937 = queryNorm
              0.8273093 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.563971 = idf(docFreq=60, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
          0.036117412 = weight(abstract_txt:approaches in 799) [ClassicSimilarity], result of:
            0.036117412 = score(doc=799,freq=1.0), product of:
              0.0711603 = queryWeight, product of:
                1.2544863 = boost
                4.640457 = idf(docFreq=1134, maxDocs=43254)
                0.012223937 = queryNorm
              0.50755 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640457 = idf(docFreq=1134, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
          0.05448936 = weight(abstract_txt:proposes in 799) [ClassicSimilarity], result of:
            0.05448936 = score(doc=799,freq=1.0), product of:
              0.09360546 = queryWeight, product of:
                1.4387907 = boost
                5.3222156 = idf(docFreq=573, maxDocs=43254)
                0.012223937 = queryNorm
              0.5821173 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3222156 = idf(docFreq=573, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
          0.028724412 = weight(abstract_txt:approach in 799) [ClassicSimilarity], result of:
            0.028724412 = score(doc=799,freq=1.0), product of:
              0.06992372 = queryWeight, product of:
                1.5230176 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.012223937 = queryNorm
              0.41079637 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
          0.102671824 = weight(abstract_txt:hierarchical in 799) [ClassicSimilarity], result of:
            0.102671824 = score(doc=799,freq=1.0), product of:
              0.16346495 = queryWeight, product of:
                2.328654 = boost
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.012223937 = queryNorm
              0.6280969 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
          0.48352355 = weight(abstract_txt:taxonomy in 799) [ClassicSimilarity], result of:
            0.48352355 = score(doc=799,freq=1.0), product of:
              0.68605953 = queryWeight, product of:
                8.709895 = boost
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.012223937 = queryNorm
              0.7047837 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.109375 = fieldNorm(doc=799)
        0.24 = coord(6/25)
    
  3. Wu, Y.; Yang, L.: Construction and evaluation of an oil spill semantic relation taxonomy for supporting knowledge discovery (2015) 0.18
    0.17936599 = sum of:
      0.17936599 = product of:
        1.1210375 = sum of:
          0.028465837 = weight(abstract_txt:method in 3667) [ClassicSimilarity], result of:
            0.028465837 = score(doc=3667,freq=1.0), product of:
              0.06728845 = queryWeight, product of:
                1.2198806 = boost
                4.5124474 = idf(docFreq=1289, maxDocs=43254)
                0.012223937 = queryNorm
              0.42304194 = fieldWeight in 3667, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5124474 = idf(docFreq=1289, maxDocs=43254)
                0.09375 = fieldNorm(doc=3667)
          0.030678462 = weight(abstract_txt:proposed in 3667) [ClassicSimilarity], result of:
            0.030678462 = score(doc=3667,freq=1.0), product of:
              0.070731625 = queryWeight, product of:
                1.250702 = boost
                4.6264586 = idf(docFreq=1150, maxDocs=43254)
                0.012223937 = queryNorm
              0.43373048 = fieldWeight in 3667, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6264586 = idf(docFreq=1150, maxDocs=43254)
                0.09375 = fieldNorm(doc=3667)
          0.046705164 = weight(abstract_txt:proposes in 3667) [ClassicSimilarity], result of:
            0.046705164 = score(doc=3667,freq=1.0), product of:
              0.09360546 = queryWeight, product of:
                1.4387907 = boost
                5.3222156 = idf(docFreq=573, maxDocs=43254)
                0.012223937 = queryNorm
              0.4989577 = fieldWeight in 3667, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3222156 = idf(docFreq=573, maxDocs=43254)
                0.09375 = fieldNorm(doc=3667)
          1.015188 = weight(abstract_txt:taxonomy in 3667) [ClassicSimilarity], result of:
            1.015188 = score(doc=3667,freq=6.0), product of:
              0.68605953 = queryWeight, product of:
                8.709895 = boost
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.012223937 = queryNorm
              1.4797375 = fieldWeight in 3667, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.09375 = fieldNorm(doc=3667)
        0.16 = coord(4/25)
    
  4. Wang, Z.; Khoo, C.S.G.; Chaudhry, A.S.: Evaluation of the navigation effectiveness of an organizational taxonomy built on a general classification scheme and domain thesauri (2014) 0.15
    0.14919613 = sum of:
      0.14919613 = product of:
        0.93247586 = sum of:
          0.020517437 = weight(abstract_txt:approach in 2716) [ClassicSimilarity], result of:
            0.020517437 = score(doc=2716,freq=1.0), product of:
              0.06992372 = queryWeight, product of:
                1.5230176 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.012223937 = queryNorm
              0.29342598 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.078125 = fieldNorm(doc=2716)
          0.07333702 = weight(abstract_txt:hierarchical in 2716) [ClassicSimilarity], result of:
            0.07333702 = score(doc=2716,freq=1.0), product of:
              0.16346495 = queryWeight, product of:
                2.328654 = boost
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.012223937 = queryNorm
              0.4486406 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.078125 = fieldNorm(doc=2716)
          0.1478735 = weight(abstract_txt:construction in 2716) [ClassicSimilarity], result of:
            0.1478735 = score(doc=2716,freq=3.0), product of:
              0.1991018 = queryWeight, product of:
                2.9675608 = boost
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.012223937 = queryNorm
              0.74270296 = fieldWeight in 2716, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.078125 = fieldNorm(doc=2716)
          0.6907479 = weight(abstract_txt:taxonomy in 2716) [ClassicSimilarity], result of:
            0.6907479 = score(doc=2716,freq=4.0), product of:
              0.68605953 = queryWeight, product of:
                8.709895 = boost
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.012223937 = queryNorm
              1.0068338 = fieldWeight in 2716, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.078125 = fieldNorm(doc=2716)
        0.16 = coord(4/25)
    
  5. Wang, Z.; Chaudhry, A.S.; Khoo, C.: Support from bibliographic tools to build an organizational taxonomy for navigation : use of a general classification scheme and domain thesauri (2010) 0.14
    0.14354648 = sum of:
      0.14354648 = product of:
        0.89716554 = sum of:
          0.017328652 = weight(abstract_txt:process in 175) [ClassicSimilarity], result of:
            0.017328652 = score(doc=175,freq=1.0), product of:
              0.05457854 = queryWeight, product of:
                1.0986465 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.012223937 = queryNorm
              0.31749937 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.078125 = fieldNorm(doc=175)
          0.1037142 = weight(abstract_txt:hierarchical in 175) [ClassicSimilarity], result of:
            0.1037142 = score(doc=175,freq=2.0), product of:
              0.16346495 = queryWeight, product of:
                2.328654 = boost
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.012223937 = queryNorm
              0.6344736 = fieldWeight in 175, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.078125 = fieldNorm(doc=175)
          0.085374795 = weight(abstract_txt:construction in 175) [ClassicSimilarity], result of:
            0.085374795 = score(doc=175,freq=1.0), product of:
              0.1991018 = queryWeight, product of:
                2.9675608 = boost
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.012223937 = queryNorm
              0.42879972 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4886365 = idf(docFreq=485, maxDocs=43254)
                0.078125 = fieldNorm(doc=175)
          0.6907479 = weight(abstract_txt:taxonomy in 175) [ClassicSimilarity], result of:
            0.6907479 = score(doc=175,freq=4.0), product of:
              0.68605953 = queryWeight, product of:
                8.709895 = boost
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.012223937 = queryNorm
              1.0068338 = fieldWeight in 175, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4437366 = idf(docFreq=186, maxDocs=43254)
                0.078125 = fieldNorm(doc=175)
        0.16 = coord(4/25)
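
The content scores combine more terms, repeated term frequencies, and a coordination factor. As a rough sketch under the same assumed ClassicSimilarity formulas, the tree for the first content result (doc 1862) can be recomputed as follows. The tuple layout and the names `MATCHED`, `QUERY_TERMS`, and `term_score` are hypothetical helpers for illustration; the frequencies, document frequencies, boosts, and norms are copied from the listing.

```python
import math

MAX_DOCS = 43254           # maxDocs, as shown in the listing
QUERY_NORM = 0.012223937   # queryNorm of the content query
FIELD_NORM = 0.0546875     # fieldNorm(doc=1862)
QUERY_TERMS = 25           # denominator of coord(7/25)

# (term, freq in doc 1862, docFreq, boost), copied from the explain tree
MATCHED = [
    ("process", 1.0, 2019, 1.0986465),
    ("proposed", 1.0, 1150, 1.250702),
    ("approaches", 1.0, 1134, 1.2544863),
    ("into", 1.0, 2864, 1.5062577),
    ("approach", 3.0, 2748, 1.5230176),
    ("construction", 1.0, 485, 2.9675608),
    ("taxonomy", 6.0, 186, 8.709895),
]


def term_score(freq, doc_freq, boost):
    """queryWeight * fieldWeight for one term, with tf = sqrt(freq)."""
    term_idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# Sum over the matching terms, then scale by the fraction of query terms matched.
raw = sum(term_score(freq, df, boost) for _, freq, df, boost in MATCHED)
coord = len(MATCHED) / QUERY_TERMS
final = raw * coord

print(round(raw, 4), coord, round(final, 4))
```

Because "taxonomy" carries by far the largest boost and occurs six times, it contributes roughly 0.59 of the 0.74 raw sum; the coord factor of 7/25 then reduces the total to the displayed 0.21.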