Document (#36237)

Author
Yu, L.-C.
Wu, C.-H.
Chang, R.-Y.
Liu, C.-H.
Hovy, E.H.
Title
Annotation and verification of sense pools in OntoNotes
Source
Information processing and management. 46(2010) no.4, pp. 436-447
Year
2010
Abstract
The paper describes OntoNotes, a multilingual (English, Chinese and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, it does not indicate whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
Theme
Knowledge representation
Multilingual problems
Object
OntoNotes

Similar documents (author)

  1. Chang, R.: DBase, relational data models, and MARC records (1992) 4.78
    4.7836475 = sum of:
      4.7836475 = weight(author_txt:chang in 5057) [ClassicSimilarity], result of:
        4.7836475 = fieldWeight in 5057, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.653836 = idf(docFreq=56, maxDocs=44218)
          0.625 = fieldNorm(doc=5057)
    
  2. Chang, R.: The development of indexing technology (1993) 4.78
    4.7836475 = sum of:
      4.7836475 = weight(author_txt:chang in 7024) [ClassicSimilarity], result of:
        4.7836475 = fieldWeight in 7024, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.653836 = idf(docFreq=56, maxDocs=44218)
          0.625 = fieldNorm(doc=7024)
    
  3. Chang, R.: Keyword searching and indexing (1993) 4.78
    4.7836475 = sum of:
      4.7836475 = weight(author_txt:chang in 7223) [ClassicSimilarity], result of:
        4.7836475 = fieldWeight in 7223, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.653836 = idf(docFreq=56, maxDocs=44218)
          0.625 = fieldNorm(doc=7223)
    
  4. Chang, R.H.: To classify or not to classify? : a new look at an old problem (1989) 4.78
    4.7836475 = sum of:
      4.7836475 = weight(author_txt:chang in 2510) [ClassicSimilarity], result of:
        4.7836475 = fieldWeight in 2510, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.653836 = idf(docFreq=56, maxDocs=44218)
          0.625 = fieldNorm(doc=2510)
    
  5. Chang, S.H.: The current state of Web search engines (1999) 4.78
    4.7836475 = sum of:
      4.7836475 = weight(author_txt:chang in 509) [ClassicSimilarity], result of:
        4.7836475 = fieldWeight in 509, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.653836 = idf(docFreq=56, maxDocs=44218)
          0.625 = fieldNorm(doc=509)
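
Each author match above has the same score because it is a single fieldWeight: tf x idf x fieldNorm. The following is a minimal sketch of that arithmetic, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are not stated in the record itself, but they reproduce the listed values. The function names are illustrative, not part of any library.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term count (assumed formula)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain listing."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing for author_txt:chang
author_score = field_weight(freq=1.0, doc_freq=56, max_docs=44218, field_norm=0.625)
print(f"{author_score:.4f}")  # approximately 4.7836, matching the listed 4.7836475
```

Lucene computes these values in single precision, so a double-precision recomputation agrees only to roughly six significant digits.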
    

Similar documents (content)

  1. Krovetz, R.; Croft, W.B.: Lexical ambiguity and information retrieval (1992) 0.20
    0.19703266 = sum of:
      0.19703266 = product of:
        0.82096946 = sum of:
          0.0429369 = weight(abstract_txt:determine in 4028) [ClassicSimilarity], result of:
            0.0429369 = score(doc=4028,freq=1.0), product of:
              0.087427564 = queryWeight, product of:
                1.170803 = boost
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.014254552 = queryNorm
              0.49111396 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
          0.050875254 = weight(abstract_txt:whether in 4028) [ClassicSimilarity], result of:
            0.050875254 = score(doc=4028,freq=1.0), product of:
              0.11206314 = queryWeight, product of:
                1.6234415 = boost
                4.8425326 = idf(docFreq=947, maxDocs=44218)
                0.014254552 = queryNorm
              0.45398742 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8425326 = idf(docFreq=947, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
          0.08866577 = weight(abstract_txt:sense in 4028) [ClassicSimilarity], result of:
            0.08866577 = score(doc=4028,freq=1.0), product of:
              0.16229147 = queryWeight, product of:
                1.9536785 = boost
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.014254552 = queryNorm
              0.5463366 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
          0.091626495 = weight(abstract_txt:words in 4028) [ClassicSimilarity], result of:
            0.091626495 = score(doc=4028,freq=1.0), product of:
              0.18257949 = queryWeight, product of:
                2.3927681 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014254552 = queryNorm
              0.5018444 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
          0.16614288 = weight(abstract_txt:word in 4028) [ClassicSimilarity], result of:
            0.16614288 = score(doc=4028,freq=3.0), product of:
              0.18824294 = queryWeight, product of:
                2.4295955 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.014254552 = queryNorm
              0.8825982 = fieldWeight in 4028, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
          0.38072214 = weight(abstract_txt:senses in 4028) [ClassicSimilarity], result of:
            0.38072214 = score(doc=4028,freq=1.0), product of:
              0.4718928 = queryWeight, product of:
                3.8467708 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.014254552 = queryNorm
              0.8067979 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.09375 = fieldNorm(doc=4028)
        0.24 = coord(6/25)
    
  2. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.17
    0.16760188 = sum of:
      0.16760188 = product of:
        0.6983412 = sum of:
          0.031212317 = weight(abstract_txt:semantic in 1161) [ClassicSimilarity], result of:
            0.031212317 = score(doc=1161,freq=1.0), product of:
              0.06377945 = queryWeight, product of:
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014254552 = queryNorm
              0.4893789 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.14936161 = weight(abstract_txt:substituted in 1161) [ClassicSimilarity], result of:
            0.14936161 = score(doc=1161,freq=1.0), product of:
              0.14375162 = queryWeight, product of:
                1.0615758 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014254552 = queryNorm
              1.0390255 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.144826 = weight(abstract_txt:supervised in 1161) [ClassicSimilarity], result of:
            0.144826 = score(doc=1161,freq=1.0), product of:
              0.17743029 = queryWeight, product of:
                1.6679134 = boost
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.014254552 = queryNorm
              0.8162417 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.15413392 = weight(abstract_txt:unsupervised in 1161) [ClassicSimilarity], result of:
            0.15413392 = score(doc=1161,freq=1.0), product of:
              0.18495336 = queryWeight, product of:
                1.7029063 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.014254552 = queryNorm
              0.8333664 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.10689757 = weight(abstract_txt:words in 1161) [ClassicSimilarity], result of:
            0.10689757 = score(doc=1161,freq=1.0), product of:
              0.18257949 = queryWeight, product of:
                2.3927681 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014254552 = queryNorm
              0.5854851 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.111909755 = weight(abstract_txt:word in 1161) [ClassicSimilarity], result of:
            0.111909755 = score(doc=1161,freq=1.0), product of:
              0.18824294 = queryWeight, product of:
                2.4295955 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.014254552 = queryNorm
              0.5944964 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
        0.24 = coord(6/25)
    
  3. Green, R.: WordNet (2009) 0.16
    0.16047859 = sum of:
      0.16047859 = product of:
        0.80239296 = sum of:
          0.03783504 = weight(abstract_txt:semantic in 4696) [ClassicSimilarity], result of:
            0.03783504 = score(doc=4696,freq=2.0), product of:
              0.06377945 = queryWeight, product of:
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014254552 = queryNorm
              0.5932168 = fieldWeight in 4696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.15951489 = weight(abstract_txt:synonymous in 4696) [ClassicSimilarity], result of:
            0.15951489 = score(doc=4696,freq=1.0), product of:
              0.20971465 = queryWeight, product of:
                1.8133181 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.014254552 = queryNorm
              0.7606282 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.08866577 = weight(abstract_txt:sense in 4696) [ClassicSimilarity], result of:
            0.08866577 = score(doc=4696,freq=1.0), product of:
              0.16229147 = queryWeight, product of:
                1.9536785 = boost
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.014254552 = queryNorm
              0.5463366 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.1356551 = weight(abstract_txt:word in 4696) [ClassicSimilarity], result of:
            0.1356551 = score(doc=4696,freq=2.0), product of:
              0.18824294 = queryWeight, product of:
                2.4295955 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.014254552 = queryNorm
              0.72063845 = fieldWeight in 4696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.38072214 = weight(abstract_txt:senses in 4696) [ClassicSimilarity], result of:
            0.38072214 = score(doc=4696,freq=1.0), product of:
              0.4718928 = queryWeight, product of:
                3.8467708 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.014254552 = queryNorm
              0.8067979 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
        0.2 = coord(5/25)
    
  4. Cribbin, T.: Discovering latent topical structure by second-order similarity analysis (2011) 0.14
    0.1434304 = sum of:
      0.1434304 = product of:
        0.44821998 = sum of:
          0.025223361 = weight(abstract_txt:semantic in 4470) [ClassicSimilarity], result of:
            0.025223361 = score(doc=4470,freq=2.0), product of:
              0.06377945 = queryWeight, product of:
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014254552 = queryNorm
              0.39547786 = fieldWeight in 4470, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.011862556 = weight(abstract_txt:systems in 4470) [ClassicSimilarity], result of:
            0.011862556 = score(doc=4470,freq=1.0), product of:
              0.055629447 = queryWeight, product of:
                1.1438198 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.014254552 = queryNorm
              0.2132424 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.028624598 = weight(abstract_txt:determine in 4470) [ClassicSimilarity], result of:
            0.028624598 = score(doc=4470,freq=1.0), product of:
              0.087427564 = queryWeight, product of:
                1.170803 = boost
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.014254552 = queryNorm
              0.3274093 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.10634326 = weight(abstract_txt:synonymous in 4470) [ClassicSimilarity], result of:
            0.10634326 = score(doc=4470,freq=1.0), product of:
              0.20971465 = queryWeight, product of:
                1.8133181 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.014254552 = queryNorm
              0.5070855 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.11965946 = weight(abstract_txt:mismatch in 4470) [ClassicSimilarity], result of:
            0.11965946 = score(doc=4470,freq=1.0), product of:
              0.2268751 = queryWeight, product of:
                1.8860493 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.014254552 = queryNorm
              0.5274244 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.05911051 = weight(abstract_txt:sense in 4470) [ClassicSimilarity], result of:
            0.05911051 = score(doc=4470,freq=1.0), product of:
              0.16229147 = queryWeight, product of:
                1.9536785 = boost
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.014254552 = queryNorm
              0.3642244 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.036311895 = weight(abstract_txt:method in 4470) [ClassicSimilarity], result of:
            0.036311895 = score(doc=4470,freq=1.0), product of:
              0.12908171 = queryWeight, product of:
                2.0119028 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.014254552 = queryNorm
              0.28130937 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.061084326 = weight(abstract_txt:words in 4470) [ClassicSimilarity], result of:
            0.061084326 = score(doc=4470,freq=1.0), product of:
              0.18257949 = queryWeight, product of:
                2.3927681 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014254552 = queryNorm
              0.33456293 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
        0.32 = coord(8/25)
    
  5. Garcés, P.J.; Olivas, J.A.; Romero, F.P.: Concept-matching IR systems versus word-matching information retrieval systems : considering fuzzy interrelations for indexing Web pages (2006) 0.13
    0.12521671 = sum of:
      0.12521671 = product of:
        0.52173626 = sum of:
          0.025223361 = weight(abstract_txt:semantic in 5288) [ClassicSimilarity], result of:
            0.025223361 = score(doc=5288,freq=2.0), product of:
              0.06377945 = queryWeight, product of:
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014254552 = queryNorm
              0.39547786 = fieldWeight in 5288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
          0.019499127 = weight(abstract_txt:proposed in 5288) [ClassicSimilarity], result of:
            0.019499127 = score(doc=5288,freq=1.0), product of:
              0.06768601 = queryWeight, product of:
                1.0301704 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.014254552 = queryNorm
              0.2880821 = fieldWeight in 5288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
          0.05135277 = weight(abstract_txt:method in 5288) [ClassicSimilarity], result of:
            0.05135277 = score(doc=5288,freq=2.0), product of:
              0.12908171 = queryWeight, product of:
                2.0119028 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.014254552 = queryNorm
              0.3978315 = fieldWeight in 5288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
          0.061084326 = weight(abstract_txt:words in 5288) [ClassicSimilarity], result of:
            0.061084326 = score(doc=5288,freq=1.0), product of:
              0.18257949 = queryWeight, product of:
                2.3927681 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014254552 = queryNorm
              0.33456293 = fieldWeight in 5288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
          0.110761926 = weight(abstract_txt:word in 5288) [ClassicSimilarity], result of:
            0.110761926 = score(doc=5288,freq=3.0), product of:
              0.18824294 = queryWeight, product of:
                2.4295955 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.014254552 = queryNorm
              0.5883988 = fieldWeight in 5288, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
          0.25381476 = weight(abstract_txt:senses in 5288) [ClassicSimilarity], result of:
            0.25381476 = score(doc=5288,freq=1.0), product of:
              0.4718928 = queryWeight, product of:
                3.8467708 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.014254552 = queryNorm
              0.5378653 = fieldWeight in 5288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.0625 = fieldNorm(doc=5288)
        0.24 = coord(6/25)
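
The content-based scores combine several term clauses. Each clause score is queryWeight x fieldWeight, the clause scores are summed, and the sum is multiplied by coord (matched clauses / total query clauses). The sketch below recomputes the first hit (Krovetz and Croft, doc 4028) from the parameters in its listing, under the same assumed classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper names are illustrative only.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.014254552  # copied from the listing
FIELD_NORM = 0.09375      # fieldNorm(doc=4028)


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_score(freq: float, boost: float, doc_freq: int) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, boost, docFreq) copied from the doc 4028 listing
clauses = [
    ("determine", 1.0, 1.170803, 637),
    ("whether", 1.0, 1.6234415, 947),
    ("sense", 1.0, 1.9536785, 353),
    ("words", 1.0, 2.3927681, 568),
    ("word", 3.0, 2.4295955, 523),
    ("senses", 1.0, 3.8467708, 21),
]

clause_sum = sum(clause_score(f, b, df) for _, f, b, df in clauses)
coord = len(clauses) / 25  # 6 of the 25 query terms matched
total = clause_sum * coord
print(f"{clause_sum:.4f} {total:.4f}")  # approximately 0.8210 and 0.1970
```

The coord factor explains why doc 4470, which matches eight terms, can still rank below documents matching six: its individual clause scores are lower because of a smaller fieldNorm.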