Document (#27561)

Author
Terada, A.
Tokunaga, T.
Tanaka, H.
Title
Automatic expansion of abbreviations by using context and character information
Source
Information processing and management. 40(2004) no.1, pp. 31-45
Year
2004
Abstract
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original forms of abbreviations). We instead use a corpus from the same field that contains few abbreviations. We calculate the adequacy of each abbreviation expansion candidate based on the similarity between the context of the target abbreviation and that of the candidate. The similarity is calculated using a vector space model in which the vector elements are the words surrounding the target abbreviation and those surrounding its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
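
The context comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the window size, term weighting, or the character-based candidate generation, so the sketch assumes raw word counts, a window of two words on each side, cosine similarity, and a candidate list supplied by the caller. The function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbour in the window except the target itself.
            vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(abbr_tokens, abbr, corpus_tokens, candidates, window=2):
    """Sort candidates by how closely their corpus context matches the abbreviation's."""
    target_vec = context_vector(abbr_tokens, abbr, window)
    scored = [
        (cosine(target_vec, context_vector(corpus_tokens, cand, window)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)


# Toy data (assumption): one sentence containing the abbreviation and a small
# corpus from the same field in which the full forms occur.
abbr_text = "check the alt before landing".split()
corpus = "check the altitude before landing and report the alternate airport".split()
ranked = rank_candidates(abbr_text, "alt", corpus, ["altitude", "alternate"])
print(ranked[0][1])
```

In this toy data "altitude" is ranked first because it occurs in the same surrounding words as "alt", while "alternate" shares only one context word.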

Similar documents (content)

  1. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: HAADS: a Hebrew Aramaic abbreviation disambiguation system (2010) 0.22
    0.21693687 = sum of:
      0.21693687 = product of:
        1.0846844 = sum of:
          0.018081678 = weight(abstract_txt:method in 3990) [ClassicSimilarity], result of:
            0.018081678 = score(doc=3990,freq=1.0), product of:
              0.051421475 = queryWeight, product of:
                1.2468781 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.00916255 = queryNorm
              0.3516367 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.024315173 = weight(abstract_txt:context in 3990) [ClassicSimilarity], result of:
            0.024315173 = score(doc=3990,freq=1.0), product of:
              0.07171346 = queryWeight, product of:
                1.8034235 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.00916255 = queryNorm
              0.3390601 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.016472453 = weight(abstract_txt:using in 3990) [ClassicSimilarity], result of:
            0.016472453 = score(doc=3990,freq=1.0), product of:
              0.06088368 = queryWeight, product of:
                1.9187447 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.00916255 = queryNorm
              0.27055615 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.36770594 = weight(abstract_txt:abbreviation in 3990) [ClassicSimilarity], result of:
            0.36770594 = score(doc=3990,freq=1.0), product of:
              0.4826835 = queryWeight, product of:
                5.4025397 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.00916255 = queryNorm
              0.7617951 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.65810907 = weight(abstract_txt:abbreviations in 3990) [ClassicSimilarity], result of:
            0.65810907 = score(doc=3990,freq=2.0), product of:
              0.6805551 = queryWeight, product of:
                8.486281 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.00916255 = queryNorm
              0.967018 = fieldWeight in 3990, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
        0.2 = coord(5/25)
    
  2. Beall, J.: Abbreviations, full spellings, and searchers' preferences (2011) 0.21
    0.20887935 = sum of:
      0.20887935 = product of:
        1.305496 = sum of:
          0.019766944 = weight(abstract_txt:using in 4166) [ClassicSimilarity], result of:
            0.019766944 = score(doc=4166,freq=1.0), product of:
              0.06088368 = queryWeight, product of:
                1.9187447 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.00916255 = queryNorm
              0.32466736 = fieldWeight in 4166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.09375 = fieldNorm(doc=4166)
          0.05475101 = weight(abstract_txt:words in 4166) [ClassicSimilarity], result of:
            0.05475101 = score(doc=4166,freq=1.0), product of:
              0.10909957 = queryWeight, product of:
                2.2243795 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.00916255 = queryNorm
              0.5018444 = fieldWeight in 4166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=4166)
          0.44124714 = weight(abstract_txt:abbreviation in 4166) [ClassicSimilarity], result of:
            0.44124714 = score(doc=4166,freq=1.0), product of:
              0.4826835 = queryWeight, product of:
                5.4025397 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.00916255 = queryNorm
              0.9141542 = fieldWeight in 4166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=4166)
          0.78973085 = weight(abstract_txt:abbreviations in 4166) [ClassicSimilarity], result of:
            0.78973085 = score(doc=4166,freq=2.0), product of:
              0.6805551 = queryWeight, product of:
                8.486281 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.00916255 = queryNorm
              1.1604216 = fieldWeight in 4166, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.09375 = fieldNorm(doc=4166)
        0.16 = coord(4/25)
    
  3. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.16
    0.1551354 = sum of:
      0.1551354 = product of:
        0.6463975 = sum of:
          0.065317094 = weight(abstract_txt:acronyms in 8524) [ClassicSimilarity], result of:
            0.065317094 = score(doc=8524,freq=1.0), product of:
              0.09608596 = queryWeight, product of:
                1.205221 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.00916255 = queryNorm
              0.67977774 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.017973969 = weight(abstract_txt:field in 8524) [ClassicSimilarity], result of:
            0.017973969 = score(doc=8524,freq=1.0), product of:
              0.051217064 = queryWeight, product of:
                1.2443974 = boost
                4.491995 = idf(docFreq=1345, maxDocs=44218)
                0.00916255 = queryNorm
              0.3509371 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.491995 = idf(docFreq=1345, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.027812036 = weight(abstract_txt:automatic in 8524) [ClassicSimilarity], result of:
            0.027812036 = score(doc=8524,freq=1.0), product of:
              0.0685184 = queryWeight, product of:
                1.4393133 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.00916255 = queryNorm
              0.40590608 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.024315173 = weight(abstract_txt:context in 8524) [ClassicSimilarity], result of:
            0.024315173 = score(doc=8524,freq=1.0), product of:
              0.07171346 = queryWeight, product of:
                1.8034235 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.00916255 = queryNorm
              0.3390601 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.045625836 = weight(abstract_txt:words in 8524) [ClassicSimilarity], result of:
            0.045625836 = score(doc=8524,freq=1.0), product of:
              0.10909957 = queryWeight, product of:
                2.2243795 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.00916255 = queryNorm
              0.41820365 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.4653534 = weight(abstract_txt:abbreviations in 8524) [ClassicSimilarity], result of:
            0.4653534 = score(doc=8524,freq=1.0), product of:
              0.6805551 = queryWeight, product of:
                8.486281 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.00916255 = queryNorm
              0.683785 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
        0.24 = coord(6/25)
    
  4. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: Initialism disambiguation : man versus machine (2013) 0.13
    0.13132872 = sum of:
      0.13132872 = product of:
        0.65664357 = sum of:
          0.052253675 = weight(abstract_txt:acronyms in 1094) [ClassicSimilarity], result of:
            0.052253675 = score(doc=1094,freq=1.0), product of:
              0.09608596 = queryWeight, product of:
                1.205221 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.00916255 = queryNorm
              0.54382217 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.014465342 = weight(abstract_txt:method in 1094) [ClassicSimilarity], result of:
            0.014465342 = score(doc=1094,freq=1.0), product of:
              0.051421475 = queryWeight, product of:
                1.2468781 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.00916255 = queryNorm
              0.28130937 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.019452138 = weight(abstract_txt:context in 1094) [ClassicSimilarity], result of:
            0.019452138 = score(doc=1094,freq=1.0), product of:
              0.07171346 = queryWeight, product of:
                1.8034235 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.00916255 = queryNorm
              0.27124807 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.04398518 = weight(abstract_txt:vector in 1094) [ClassicSimilarity], result of:
            0.04398518 = score(doc=1094,freq=1.0), product of:
              0.10792688 = queryWeight, product of:
                1.8064109 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.00916255 = queryNorm
              0.4075461 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.52648723 = weight(abstract_txt:abbreviations in 1094) [ClassicSimilarity], result of:
            0.52648723 = score(doc=1094,freq=2.0), product of:
              0.6805551 = queryWeight, product of:
                8.486281 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.00916255 = queryNorm
              0.7736144 = fieldWeight in 1094, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
        0.2 = coord(5/25)
    
  5. Franceschini, F.; Maisano, D.; Mastrogiacomo, L.: A novel approach for estimating the omitted-citation rate of bibliometric databases with an application to the field of bibliometrics (2013) 0.13
    0.13132872 = sum of:
      0.13132872 = product of:
        0.65664357 = sum of:
          0.052253675 = weight(abstract_txt:acronyms in 1097) [ClassicSimilarity], result of:
            0.052253675 = score(doc=1097,freq=1.0), product of:
              0.09608596 = queryWeight, product of:
                1.205221 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.00916255 = queryNorm
              0.54382217 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1097)
          0.014465342 = weight(abstract_txt:method in 1097) [ClassicSimilarity], result of:
            0.014465342 = score(doc=1097,freq=1.0), product of:
              0.051421475 = queryWeight, product of:
                1.2468781 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.00916255 = queryNorm
              0.28130937 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1097)
          0.019452138 = weight(abstract_txt:context in 1097) [ClassicSimilarity], result of:
            0.019452138 = score(doc=1097,freq=1.0), product of:
              0.07171346 = queryWeight, product of:
                1.8034235 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.00916255 = queryNorm
              0.27124807 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0625 = fieldNorm(doc=1097)
          0.04398518 = weight(abstract_txt:vector in 1097) [ClassicSimilarity], result of:
            0.04398518 = score(doc=1097,freq=1.0), product of:
              0.10792688 = queryWeight, product of:
                1.8064109 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.00916255 = queryNorm
              0.4075461 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=1097)
          0.52648723 = weight(abstract_txt:abbreviations in 1097) [ClassicSimilarity], result of:
            0.52648723 = score(doc=1097,freq=2.0), product of:
              0.6805551 = queryWeight, product of:
                8.486281 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.00916255 = queryNorm
              0.7736144 = fieldWeight in 1097, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0625 = fieldNorm(doc=1097)
        0.2 = coord(5/25)
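
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) scoring. As a check on how the numbers combine, the sketch below recomputes the score of the first entry (doc 3990) from the boosts, document frequencies, and norms printed in its tree. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the idf and tf values listed. The constant and function names are illustrative only.

```python
import math

MAX_DOCS = 44218          # maxDocs, from the explain output
QUERY_NORM = 0.00916255   # queryNorm, from the explain output
FIELD_NORM = 0.078125     # fieldNorm(doc=3990)
COORD = 5 / 25            # coord(5/25): 5 of 25 query terms matched

# (term, boost, docFreq, termFreq), copied from the tree for doc 3990
TERMS = [
    ("method", 1.2468781, 1333, 1.0),
    ("context", 1.8034235, 1566, 1.0),
    ("using", 1.9187447, 3765, 1.0),
    ("abbreviation", 5.4025397, 6, 1.0),
    ("abbreviations", 8.486281, 18, 2.0),
]


def idf(doc_freq, max_docs):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    """queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# Final score: coordination factor times the sum of the per-term scores.
total = COORD * sum(term_score(b, df, f) for _, b, df, f in TERMS)
print(total)
```

The printed value should land close to the 0.21693687 listed for the first entry. Small differences in the last digits are expected because Lucene computes in single precision and the listed inputs are rounded.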