Document (#36221)

Author
Tsui, E.
Wang, W.M.
Cheung, C.F.
Lau, A.S.M.
Title
¬A concept-relationship acquisition and inference approach for hierarchical taxonomy construction from tags
Source
Information processing and management. 46(2010) no.1, pp.44-57
Year
2010
Abstract
Taxonomy construction is a resource-demanding, top-down, and time-consuming effort, and it does not always cater for the prevailing context of the captured information. This paper proposes a novel approach to automatically convert tags into a hierarchical taxonomy. Folksonomy describes the process by which many users add metadata in the form of keywords or tags to shared content. Using folksonomy as a knowledge source for nominating tags, the proposed method first converts the tags into a hierarchy. This serves to harness a core set of taxonomy terms; the generated hierarchical structure facilitates users' information navigation behavior and permits personalization. Newly acquired tags are then progressively integrated into the taxonomy in a largely automated way to complete the taxonomy creation process. Common taxonomy construction techniques are based on three main approaches: clustering, lexico-syntactic pattern matching, and automatic acquisition from machine-readable dictionaries. In contrast to these prevailing approaches, this paper proposes a taxonomy construction approach based on heuristic rules and deep syntactic analysis. The proposed method requires only a relatively small corpus to create a preliminary taxonomy. The approach was evaluated against an expert-defined taxonomy in the environmental protection domain and yielded encouraging results.
Theme
Social tagging

Similar documents (author)

  1. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 3.49
    3.4948802 = sum of:
      3.4948802 = sum of:
        0.92757046 = weight(author_txt:wang in 2121) [ClassicSimilarity], result of:
          0.92757046 = score(doc=2121,freq=1.0), product of:
            0.45239833 = queryWeight, product of:
              6.5610886 = idf(docFreq=169, maxDocs=44218)
              0.06895172 = queryNorm
            2.0503402 = fieldWeight in 2121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.5610886 = idf(docFreq=169, maxDocs=44218)
              0.3125 = fieldNorm(doc=2121)
        2.5673099 = weight(author_txt:cheung in 2121) [ClassicSimilarity], result of:
          2.5673099 = score(doc=2121,freq=1.0), product of:
            0.89181596 = queryWeight, product of:
              1.4040323 = boost
              9.211981 = idf(docFreq=11, maxDocs=44218)
              0.06895172 = queryNorm
            2.8787441 = fieldWeight in 2121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.211981 = idf(docFreq=11, maxDocs=44218)
              0.3125 = fieldNorm(doc=2121)
    
  2. Cheung, W.; Hsu, C.: ¬The model-assisted global query system for multiple databases in distributed enterprises (1996) 2.05
    2.0538478 = sum of:
      2.0538478 = product of:
        4.1076956 = sum of:
          4.1076956 = weight(author_txt:cheung in 7279) [ClassicSimilarity], result of:
            4.1076956 = score(doc=7279,freq=1.0), product of:
              0.89181596 = queryWeight, product of:
                1.4040323 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.06895172 = queryNorm
              4.6059904 = fieldWeight in 7279, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.5 = fieldNorm(doc=7279)
        0.5 = coord(1/2)
    
  3. Cheung, C.M.K.; Lee, M.K.O.: Understanding consumer trust in Internet shopping : a multidisciplinary approach (2006) 2.05
    2.0538478 = sum of:
      2.0538478 = product of:
        4.1076956 = sum of:
          4.1076956 = weight(author_txt:cheung in 5280) [ClassicSimilarity], result of:
            4.1076956 = score(doc=5280,freq=1.0), product of:
              0.89181596 = queryWeight, product of:
                1.4040323 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.06895172 = queryNorm
              4.6059904 = fieldWeight in 5280, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.5 = fieldNorm(doc=5280)
        0.5 = coord(1/2)
    
  4. Cheung, C.M.K.; Lee, M.K.O.: ¬The structure of Web-based information systems satisfaction : testing of competing models (2008) 2.05
    2.0538478 = sum of:
      2.0538478 = product of:
        4.1076956 = sum of:
          4.1076956 = weight(author_txt:cheung in 2005) [ClassicSimilarity], result of:
            4.1076956 = score(doc=2005,freq=1.0), product of:
              0.89181596 = queryWeight, product of:
                1.4040323 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.06895172 = queryNorm
              4.6059904 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.5 = fieldNorm(doc=2005)
        0.5 = coord(1/2)
    
  5. Cheung, C.M.K.; Lee, M.K.O.: User satisfaction with an internet-based portal : an asymmetric and nonlinear approach (2009) 2.05
    2.0538478 = sum of:
      2.0538478 = product of:
        4.1076956 = sum of:
          4.1076956 = weight(author_txt:cheung in 2701) [ClassicSimilarity], result of:
            4.1076956 = score(doc=2701,freq=1.0), product of:
              0.89181596 = queryWeight, product of:
                1.4040323 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.06895172 = queryNorm
              4.6059904 = fieldWeight in 2701, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.5 = fieldNorm(doc=2701)
        0.5 = coord(1/2)
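
The score trees above have the layout of Lucene's ClassicSimilarity (TF-IDF) explanations. As a minimal sketch, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the two term weights listed for document 2121 can be recomputed from the printed inputs. The function names are illustrative and not part of the catalog system; queryNorm, fieldNorm, and boost are copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Assumed ClassicSimilarity tf: square root of the term frequency
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as laid out in the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# Inputs copied from the listing for doc=2121 (author_txt field)
wang = term_score(1.0, 169, 44218, query_norm=0.06895172, field_norm=0.3125)
cheung = term_score(1.0, 11, 44218, query_norm=0.06895172,
                    field_norm=0.3125, boost=1.4040323)
total = wang + cheung

print(f"wang={wang:.6f} cheung={cheung:.6f} total={total:.6f}")
```

Small differences in the last digits are expected, since Lucene computes these values in single precision while Python uses double precision.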
    

Similar documents (content)

  1. Alexander, F.: Assessing information taxonomies using epistemology and the sociology of science (2012) 0.21
    0.20575517 = sum of:
      0.20575517 = product of:
        0.7348399 = sum of:
          0.012050312 = weight(abstract_txt:process in 397) [ClassicSimilarity], result of:
            0.012050312 = score(doc=397,freq=1.0), product of:
              0.054393467 = queryWeight, product of:
                1.0939177 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.012274353 = queryNorm
              0.22153969 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.017741103 = weight(abstract_txt:approaches in 397) [ClassicSimilarity], result of:
            0.017741103 = score(doc=397,freq=1.0), product of:
              0.07039389 = queryWeight, product of:
                1.2444538 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.012274353 = queryNorm
              0.25202617 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.017750746 = weight(abstract_txt:proposed in 397) [ClassicSimilarity], result of:
            0.017750746 = score(doc=397,freq=1.0), product of:
              0.07041939 = queryWeight, product of:
                1.2446792 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.012274353 = queryNorm
              0.25207183 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.013804972 = weight(abstract_txt:into in 397) [ClassicSimilarity], result of:
            0.013804972 = score(doc=397,freq=1.0), product of:
              0.06817137 = queryWeight, product of:
                1.499885 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.012274353 = queryNorm
              0.20250396 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.024741689 = weight(abstract_txt:approach in 397) [ClassicSimilarity], result of:
            0.024741689 = score(doc=397,freq=3.0), product of:
              0.06974142 = queryWeight, product of:
                1.5170585 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.012274353 = queryNorm
              0.35476318 = fieldWeight in 397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.058835823 = weight(abstract_txt:construction in 397) [ClassicSimilarity], result of:
            0.058835823 = score(doc=397,freq=1.0), product of:
              0.19723581 = queryWeight, product of:
                2.945908 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.012274353 = queryNorm
              0.29830194 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
          0.5899153 = weight(abstract_txt:taxonomy in 397) [ClassicSimilarity], result of:
            0.5899153 = score(doc=397,freq=6.0), product of:
              0.6849842 = queryWeight, product of:
                8.680337 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.012274353 = queryNorm
              0.86121005 = fieldWeight in 397, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0546875 = fieldNorm(doc=397)
        0.28 = coord(7/25)
    
  2. Esteban, M.A.: ¬Los lenguajes documentales ante el paso de la organizacion de la realidad y el saber a la organizacion del conocimiento (1995) 0.19
    0.18740581 = sum of:
      0.18740581 = product of:
        0.78085756 = sum of:
          0.07862361 = weight(abstract_txt:permits in 6730) [ClassicSimilarity], result of:
            0.07862361 = score(doc=6730,freq=1.0), product of:
              0.09496273 = queryWeight, product of:
                1.0220518 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.012274353 = queryNorm
              0.8279417 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
          0.035482205 = weight(abstract_txt:approaches in 6730) [ClassicSimilarity], result of:
            0.035482205 = score(doc=6730,freq=1.0), product of:
              0.07039389 = queryWeight, product of:
                1.2444538 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.012274353 = queryNorm
              0.50405234 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
          0.05417492 = weight(abstract_txt:proposes in 6730) [ClassicSimilarity], result of:
            0.05417492 = score(doc=6730,freq=1.0), product of:
              0.09333834 = queryWeight, product of:
                1.432984 = boost
                5.3066463 = idf(docFreq=595, maxDocs=44218)
                0.012274353 = queryNorm
              0.5804144 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3066463 = idf(docFreq=595, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
          0.028569242 = weight(abstract_txt:approach in 6730) [ClassicSimilarity], result of:
            0.028569242 = score(doc=6730,freq=1.0), product of:
              0.06974142 = queryWeight, product of:
                1.5170585 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.012274353 = queryNorm
              0.40964526 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
          0.102343775 = weight(abstract_txt:hierarchical in 6730) [ClassicSimilarity], result of:
            0.102343775 = score(doc=6730,freq=1.0), product of:
              0.16327986 = queryWeight, product of:
                2.321257 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.012274353 = queryNorm
              0.62679976 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
          0.48166382 = weight(abstract_txt:taxonomy in 6730) [ClassicSimilarity], result of:
            0.48166382 = score(doc=6730,freq=1.0), product of:
              0.6849842 = queryWeight, product of:
                8.680337 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.012274353 = queryNorm
              0.70317507 = fieldWeight in 6730, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.109375 = fieldNorm(doc=6730)
        0.24 = coord(6/25)
    
  3. Wu, Y.; Yang, L.: Construction and evaluation of an oil spill semantic relation taxonomy for supporting knowledge discovery (2015) 0.18
    0.17863722 = sum of:
      0.17863722 = product of:
        1.1164826 = sum of:
          0.028333718 = weight(abstract_txt:method in 2202) [ClassicSimilarity], result of:
            0.028333718 = score(doc=2202,freq=1.0), product of:
              0.06714723 = queryWeight, product of:
                1.2154171 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012274353 = queryNorm
              0.42196405 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=2202)
          0.030429848 = weight(abstract_txt:proposed in 2202) [ClassicSimilarity], result of:
            0.030429848 = score(doc=2202,freq=1.0), product of:
              0.07041939 = queryWeight, product of:
                1.2446792 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.012274353 = queryNorm
              0.43212312 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.09375 = fieldNorm(doc=2202)
          0.046435647 = weight(abstract_txt:proposes in 2202) [ClassicSimilarity], result of:
            0.046435647 = score(doc=2202,freq=1.0), product of:
              0.09333834 = queryWeight, product of:
                1.432984 = boost
                5.3066463 = idf(docFreq=595, maxDocs=44218)
                0.012274353 = queryNorm
              0.4974981 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3066463 = idf(docFreq=595, maxDocs=44218)
                0.09375 = fieldNorm(doc=2202)
          1.0112834 = weight(abstract_txt:taxonomy in 2202) [ClassicSimilarity], result of:
            1.0112834 = score(doc=2202,freq=6.0), product of:
              0.6849842 = queryWeight, product of:
                8.680337 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.012274353 = queryNorm
              1.4763601 = fieldWeight in 2202, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.09375 = fieldNorm(doc=2202)
        0.16 = coord(4/25)
    
  4. Cheng, Y.-Y.; Xia, Y.: ¬A systematic review of methods for aligning, mapping, merging taxonomies in information sciences (2023) 0.16
    0.16360405 = sum of:
      0.16360405 = product of:
        0.8180202 = sum of:
          0.013771785 = weight(abstract_txt:process in 1029) [ClassicSimilarity], result of:
            0.013771785 = score(doc=1029,freq=1.0), product of:
              0.054393467 = queryWeight, product of:
                1.0939177 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.012274353 = queryNorm
              0.25318822 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.020275546 = weight(abstract_txt:approaches in 1029) [ClassicSimilarity], result of:
            0.020275546 = score(doc=1029,freq=1.0), product of:
              0.07039389 = queryWeight, product of:
                1.2444538 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.012274353 = queryNorm
              0.2880299 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.015777111 = weight(abstract_txt:into in 1029) [ClassicSimilarity], result of:
            0.015777111 = score(doc=1029,freq=1.0), product of:
              0.06817137 = queryWeight, product of:
                1.499885 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.012274353 = queryNorm
              0.23143311 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.03998861 = weight(abstract_txt:approach in 1029) [ClassicSimilarity], result of:
            0.03998861 = score(doc=1029,freq=6.0), product of:
              0.06974142 = queryWeight, product of:
                1.5170585 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.012274353 = queryNorm
              0.5733839 = fieldWeight in 1029, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.7282072 = weight(abstract_txt:taxonomy in 1029) [ClassicSimilarity], result of:
            0.7282072 = score(doc=1029,freq=7.0), product of:
              0.6849842 = queryWeight, product of:
                8.680337 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.012274353 = queryNorm
              1.0631007 = fieldWeight in 1029, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
        0.2 = coord(5/25)
    
  5. Wang, Z.; Khoo, C.S.G.; Chaudhry, A.S.: Evaluation of the navigation effectiveness of an organizational taxonomy built on a general classification scheme and domain thesauri (2014) 0.15
    0.14834902 = sum of:
      0.14834902 = product of:
        0.92718136 = sum of:
          0.0204066 = weight(abstract_txt:approach in 1251) [ClassicSimilarity], result of:
            0.0204066 = score(doc=1251,freq=1.0), product of:
              0.06974142 = queryWeight, product of:
                1.5170585 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.012274353 = queryNorm
              0.29260373 = fieldWeight in 1251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=1251)
          0.0731027 = weight(abstract_txt:hierarchical in 1251) [ClassicSimilarity], result of:
            0.0731027 = score(doc=1251,freq=1.0), product of:
              0.16327986 = queryWeight, product of:
                2.321257 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.012274353 = queryNorm
              0.4477141 = fieldWeight in 1251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=1251)
          0.1455809 = weight(abstract_txt:construction in 1251) [ClassicSimilarity], result of:
            0.1455809 = score(doc=1251,freq=3.0), product of:
              0.19723581 = queryWeight, product of:
                2.945908 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.012274353 = queryNorm
              0.73810583 = fieldWeight in 1251, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.078125 = fieldNorm(doc=1251)
          0.68809116 = weight(abstract_txt:taxonomy in 1251) [ClassicSimilarity], result of:
            0.68809116 = score(doc=1251,freq=4.0), product of:
              0.6849842 = queryWeight, product of:
                8.680337 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.012274353 = queryNorm
              1.0045358 = fieldWeight in 1251, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.078125 = fieldNorm(doc=1251)
        0.16 = coord(4/25)
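
In the content-based trees above, the per-term weights are summed and then scaled by a coordination factor, coord(matched/total), which lowers the score of documents that match only a few of the 25 query terms. The following is a minimal sketch of that final step, using the term weights printed for document 2202; the helper names are hypothetical, and the tf check assumes the square-root term-frequency formula of ClassicSimilarity.

```python
import math

def coord(matched_terms, total_query_terms):
    # Fraction of query terms found in the document
    return matched_terms / total_query_terms

def combine(term_scores, total_query_terms):
    # Sum of per-term weights, scaled by the coordination factor
    return sum(term_scores) * coord(len(term_scores), total_query_terms)

# Per-term weights copied from the listing for doc=2202
# (method, proposed, proposes, taxonomy)
term_scores_2202 = [0.028333718, 0.030429848, 0.046435647, 1.0112834]
score_2202 = combine(term_scores_2202, 25)

# tf for "taxonomy" with termFreq=6.0, assuming tf = sqrt(freq)
tf_taxonomy = math.sqrt(6.0)

print(f"score_2202={score_2202:.8f} tf_taxonomy={tf_taxonomy:.7f}")
```

The same helper covers the author-based results: a single matching term out of two query terms gives coord(1/2) = 0.5, which halves the summed weight.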