Document (#13779)

Author
Tsujii, J.-I.
Title
Automatic acquisition of semantic collocation from corpora
Source
Machine translation. 10(1995) no.3, pp.219-258
Year
1995
Abstract
Proposes the automatic acquisition of linguistic knowledge from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm uses gradual approximation, making the acquired linguistic knowledge converge step by step towards the desired results. The first experiment revealed the characteristics of the algorithm, and the remaining experiments demonstrated its effectiveness on a real corpus.
Theme
Automatic indexing

Similar documents (content)

  1. Sánchez-de-Madariaga, R.; Fernández-del-Castillo, J.R.: The bootstrapping of the Yarowsky algorithm in real corpora (2009) 0.18
    0.17872469 = sum of:
      0.17872469 = product of:
        0.8936235 = sum of:
          0.06289576 = weight(abstract_txt:real in 2451) [ClassicSimilarity], result of:
            0.06289576 = score(doc=2451,freq=2.0), product of:
              0.08936707 = queryWeight, product of:
                5.308326 = idf(docFreq=594, maxDocs=44218)
                0.016835265 = queryNorm
              0.7037912 = fieldWeight in 2451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.308326 = idf(docFreq=594, maxDocs=44218)
                0.09375 = fieldNorm(doc=2451)
          0.040000282 = weight(abstract_txt:knowledge in 2451) [ClassicSimilarity], result of:
            0.040000282 = score(doc=2451,freq=1.0), product of:
              0.1200943 = queryWeight, product of:
                2.0078583 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.016835265 = queryNorm
              0.33307394 = fieldWeight in 2451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.09375 = fieldNorm(doc=2451)
          0.14429589 = weight(abstract_txt:acquisition in 2451) [ClassicSimilarity], result of:
            0.14429589 = score(doc=2451,freq=1.0), product of:
              0.24676633 = queryWeight, product of:
                2.3500073 = boost
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.016835265 = queryNorm
              0.5847471 = fieldWeight in 2451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.09375 = fieldNorm(doc=2451)
          0.41215253 = weight(abstract_txt:corpora in 2451) [ClassicSimilarity], result of:
            0.41215253 = score(doc=2451,freq=4.0), product of:
              0.31294543 = queryWeight, product of:
                2.6464307 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.016835265 = queryNorm
              1.3170109 = fieldWeight in 2451, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.09375 = fieldNorm(doc=2451)
          0.234279 = weight(abstract_txt:algorithm in 2451) [ClassicSimilarity], result of:
            0.234279 = score(doc=2451,freq=2.0), product of:
              0.3097129 = queryWeight, product of:
                3.2244194 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.016835265 = queryNorm
              0.7564393 = fieldWeight in 2451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=2451)
        0.2 = coord(5/25)
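
The figures above follow Lucene's ClassicSimilarity (TF-IDF) explanation format: each matching query term contributes queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = √freq and, judging by the printed values, idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal Python sketch that recomputes two of the clause weights for doc 2451 from the printed factors; `classic_idf` and `term_weight` are hypothetical names, and queryNorm and fieldNorm are copied from the explanation rather than derived.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """One clause weight = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                     # tf(freq) = sqrt(freq)
    query_weight = boost * idf * query_norm  # boost * idf * queryNorm
    field_weight = tf * idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

QUERY_NORM = 0.016835265  # taken from the explanation, not recomputed
MAX_DOCS = 44218

# "real" in doc 2451: freq=2, docFreq=594, no query boost, fieldNorm=0.09375
real = term_weight(2, 594, MAX_DOCS, QUERY_NORM, 0.09375)
# "corpora" in doc 2451: freq=4, docFreq=106, boost=2.6464307
corpora = term_weight(4, 106, MAX_DOCS, QUERY_NORM, 0.09375, boost=2.6464307)

print(f"real={real:.6f} corpora={corpora:.6f}")
```

Both values agree with the printed 0.06289576 and 0.41215253 to within float32 rounding, which is how the idf formula above was inferred.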
    
  2. Ibekwe-SanJuan, F.: Constructing and maintaining knowledge organization tools : a symbolic approach (2006) 0.16
    0.16189937 = sum of:
      0.16189937 = product of:
        0.578212 = sum of:
          0.0073238467 = weight(abstract_txt:from in 5595) [ClassicSimilarity], result of:
            0.0073238467 = score(doc=5595,freq=1.0), product of:
              0.048454214 = queryWeight, product of:
                1.0413387 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016835265 = queryNorm
              0.15114984 = fieldWeight in 5595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.06880214 = weight(abstract_txt:automatic in 5595) [ClassicSimilarity], result of:
            0.06880214 = score(doc=5595,freq=2.0), product of:
              0.1712235 = queryWeight, product of:
                1.9575278 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016835265 = queryNorm
              0.4018265 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.057155162 = weight(abstract_txt:knowledge in 5595) [ClassicSimilarity], result of:
            0.057155162 = score(doc=5595,freq=6.0), product of:
              0.1200943 = queryWeight, product of:
                2.0078583 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.016835265 = queryNorm
              0.47591904 = fieldWeight in 5595, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.07867598 = weight(abstract_txt:corpus in 5595) [ClassicSimilarity], result of:
            0.07867598 = score(doc=5595,freq=1.0), product of:
              0.23590302 = queryWeight, product of:
                2.2976983 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.016835265 = queryNorm
              0.33350983 = fieldWeight in 5595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.0841726 = weight(abstract_txt:acquisition in 5595) [ClassicSimilarity], result of:
            0.0841726 = score(doc=5595,freq=1.0), product of:
              0.24676633 = queryWeight, product of:
                2.3500073 = boost
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.016835265 = queryNorm
              0.34110245 = fieldWeight in 5595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.13666275 = weight(abstract_txt:algorithm in 5595) [ClassicSimilarity], result of:
            0.13666275 = score(doc=5595,freq=2.0), product of:
              0.3097129 = queryWeight, product of:
                3.2244194 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.016835265 = queryNorm
              0.44125628 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.14541957 = weight(abstract_txt:linguistic in 5595) [ClassicSimilarity], result of:
            0.14541957 = score(doc=5595,freq=2.0), product of:
              0.3228056 = queryWeight, product of:
                3.291868 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.016835265 = queryNorm
              0.4504865 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
        0.28 = coord(7/25)
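
The outer product in each explanation applies ClassicSimilarity's coordination factor: the clause weights of the matching terms are summed, and the sum is scaled by coord = matching clauses / total query clauses, here 7 of 25. A minimal sketch recombining the seven clause weights printed for doc 5595; `combine` is a hypothetical helper, and the weights are transcribed from the explanation rather than recomputed.

```python
def combine(clause_weights: list[float], total_clauses: int) -> float:
    """Sum the matching clause weights, then scale by coord(matched/total)."""
    coord = len(clause_weights) / total_clauses
    return coord * sum(clause_weights)

# Clause weights for doc 5595, in the order printed:
# from, automatic, knowledge, corpus, acquisition, algorithm, linguistic
doc_5595 = [0.0073238467, 0.06880214, 0.057155162, 0.07867598,
            0.0841726, 0.13666275, 0.14541957]

score = combine(doc_5595, total_clauses=25)
print(round(score, 4))  # prints 0.1619
```

Because coord rewards documents that match more of the 25 query terms, doc 5595 (7 matches) gets 0.28 where the others get 0.2.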
    
  3. Cui, H.; Heidorn, P.B.: The reusability of induced knowledge for the automatic semantic markup of taxonomic descriptions (2007) 0.13
    0.12857829 = sum of:
      0.12857829 = product of:
        0.6428914 = sum of:
          0.01674022 = weight(abstract_txt:from in 84) [ClassicSimilarity], result of:
            0.01674022 = score(doc=84,freq=4.0), product of:
              0.048454214 = queryWeight, product of:
                1.0413387 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016835265 = queryNorm
              0.34548533 = fieldWeight in 84, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=84)
          0.055600528 = weight(abstract_txt:automatic in 84) [ClassicSimilarity], result of:
            0.055600528 = score(doc=84,freq=1.0), product of:
              0.1712235 = queryWeight, product of:
                1.9575278 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016835265 = queryNorm
              0.32472485 = fieldWeight in 84, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=84)
          0.046188347 = weight(abstract_txt:knowledge in 84) [ClassicSimilarity], result of:
            0.046188347 = score(doc=84,freq=3.0), product of:
              0.1200943 = queryWeight, product of:
                2.0078583 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.016835265 = queryNorm
              0.38460067 = fieldWeight in 84, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.0625 = fieldNorm(doc=84)
          0.0899154 = weight(abstract_txt:corpus in 84) [ClassicSimilarity], result of:
            0.0899154 = score(doc=84,freq=1.0), product of:
              0.23590302 = queryWeight, product of:
                2.2976983 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.016835265 = queryNorm
              0.3811541 = fieldWeight in 84, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=84)
          0.43444693 = weight(abstract_txt:corpora in 84) [ClassicSimilarity], result of:
            0.43444693 = score(doc=84,freq=10.0), product of:
              0.31294543 = queryWeight, product of:
                2.6464307 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.016835265 = queryNorm
              1.3882514 = fieldWeight in 84, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.0625 = fieldNorm(doc=84)
        0.2 = coord(5/25)
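
Since maxDocs is 44,218 throughout, each idf above depends only on the term's docFreq. Assuming the formula idf = 1 + ln(maxDocs / (docFreq + 1)) (the older ClassicSimilarity form, which reproduces the printed values), this sketch recomputes the idf of every query term in the first three explanations and checks it against the printed figure; the dictionary `printed` simply transcribes those figures.

```python
import math

MAX_DOCS = 44218

# term -> (docFreq, idf as printed in the explanations)
printed = {
    "from":        (7577, 2.7638826),
    "knowledge":   (3442, 3.5527887),
    "automatic":   (665, 5.1955976),
    "real":        (594, 5.308326),
    "algorithm":   (399, 5.705423),
    "linguistic":  (354, 5.8247695),
    "corpus":      (269, 6.0984654),
    "acquisition": (234, 6.237302),
    "corpora":     (106, 7.0240583),
}

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Rarer terms (smaller docFreq) receive a larger idf.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

for term, (df, expected) in printed.items():
    computed = idf(df)
    print(f"{term:<12} docFreq={df:<5} idf={computed:.6f} printed={expected}")
    assert math.isclose(computed, expected, rel_tol=1e-6), term
```

The spread from about 2.76 ("from") to about 7.02 ("corpora") is why a single occurrence of a rare term such as "corpora" outweighs several occurrences of "knowledge".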
    
  4. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.12
    0.11828078 = sum of:
      0.11828078 = product of:
        0.5914039 = sum of:
          0.014796404 = weight(abstract_txt:from in 643) [ClassicSimilarity], result of:
            0.014796404 = score(doc=643,freq=2.0), product of:
              0.048454214 = queryWeight, product of:
                1.0413387 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016835265 = queryNorm
              0.30536878 = fieldWeight in 643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.09903182 = weight(abstract_txt:intervention in 643) [ClassicSimilarity], result of:
            0.09903182 = score(doc=643,freq=1.0), product of:
              0.17208537 = queryWeight, product of:
                1.3876605 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.016835265 = queryNorm
              0.57548076 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.1589495 = weight(abstract_txt:corpus in 643) [ClassicSimilarity], result of:
            0.1589495 = score(doc=643,freq=2.0), product of:
              0.23590302 = queryWeight, product of:
                2.2976983 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.016835265 = queryNorm
              0.67379165 = fieldWeight in 643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.17173024 = weight(abstract_txt:corpora in 643) [ClassicSimilarity], result of:
            0.17173024 = score(doc=643,freq=1.0), product of:
              0.31294543 = queryWeight, product of:
                2.6464307 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.016835265 = queryNorm
              0.5487546 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.14689596 = weight(abstract_txt:linguistic in 643) [ClassicSimilarity], result of:
            0.14689596 = score(doc=643,freq=1.0), product of:
              0.3228056 = queryWeight, product of:
                3.291868 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.016835265 = queryNorm
              0.45506012 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
        0.2 = coord(5/25)
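
fieldNorm is the only factor in these explanations that depends on the length of the abstract. Assuming Lucene's older ClassicSimilarity defaults, where the norm is 1/√(number of terms in the field), stored in a single byte with 3 mantissa bits and rounded down, and assuming no index-time boosts, a printed fieldNorm only pins the abstract's length to a range. The sketch below inverts that encoding by brute force; `quantize_norm` and `length_range` are hypothetical helper names, and the arithmetic is a simplified emulation of the one-byte norm format, not a port of Lucene's code.

```python
import math

def quantize_norm(x: float) -> float:
    """Round x down to 3 mantissa bits, mimicking a one-byte norm encoding."""
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= m < 1
    return math.floor(m * 16) / 16 * 2.0 ** e

def length_range(field_norm: float, max_len: int = 10_000) -> tuple[int, int]:
    """Smallest and largest length n whose 1/sqrt(n) quantizes to field_norm."""
    lengths = [n for n in range(1, max_len + 1)
               if quantize_norm(1.0 / math.sqrt(n)) == field_norm]
    return min(lengths), max(lengths)

# fieldNorm values that occur in the explanations above
for norm in (0.09375, 0.078125, 0.0625, 0.0546875):
    low, high = length_range(norm)
    print(f"fieldNorm {norm}: {low}-{high} terms")
```

Under these assumptions, the abstract of doc 643 (fieldNorm 0.078125) would contain between 136 and 163 indexed terms, and a longer abstract is penalised by a smaller fieldNorm even when term frequencies are equal.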
    
  5. Anguiano Peña, G.; Naumis Peña, C.: Method for selecting specialized terms from a general language corpus (2015) 0.11
    0.11281812 = sum of:
      0.11281812 = product of:
        0.5640906 = sum of:
          0.014796404 = weight(abstract_txt:from in 2196) [ClassicSimilarity], result of:
            0.014796404 = score(doc=2196,freq=2.0), product of:
              0.048454214 = queryWeight, product of:
                1.0413387 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016835265 = queryNorm
              0.30536878 = fieldWeight in 2196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=2196)
          0.047140785 = weight(abstract_txt:knowledge in 2196) [ClassicSimilarity], result of:
            0.047140785 = score(doc=2196,freq=2.0), product of:
              0.1200943 = queryWeight, product of:
                2.0078583 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.016835265 = queryNorm
              0.39253142 = fieldWeight in 2196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.078125 = fieldNorm(doc=2196)
          0.11239425 = weight(abstract_txt:corpus in 2196) [ClassicSimilarity], result of:
            0.11239425 = score(doc=2196,freq=1.0), product of:
              0.23590302 = queryWeight, product of:
                2.2976983 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.016835265 = queryNorm
              0.4764426 = fieldWeight in 2196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=2196)
          0.24286321 = weight(abstract_txt:corpora in 2196) [ClassicSimilarity], result of:
            0.24286321 = score(doc=2196,freq=2.0), product of:
              0.31294543 = queryWeight, product of:
                2.6464307 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.016835265 = queryNorm
              0.7760561 = fieldWeight in 2196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2196)
          0.14689596 = weight(abstract_txt:linguistic in 2196) [ClassicSimilarity], result of:
            0.14689596 = score(doc=2196,freq=1.0), product of:
              0.3228056 = queryWeight, product of:
                3.291868 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.016835265 = queryNorm
              0.45506012 = fieldWeight in 2196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=2196)
        0.2 = coord(5/25)
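
Each entry's headline figure (0.18, 0.16 and so on) is the top-level explanation value rounded to two decimal places, and the list is ordered by that value. A short sketch reproducing the ordering and the displayed figures from the five printed totals; the internal doc ids (2451, 5595, 84, 643, 2196) are the ones shown in the explanations.

```python
# Top-level scores printed for the five similar documents (doc id -> score).
totals = {
    2451: 0.17872469,  # Sánchez-de-Madariaga & Fernández-del-Castillo
    5595: 0.16189937,  # Ibekwe-SanJuan
    84: 0.12857829,    # Cui & Heidorn
    643: 0.11828078,   # Dias
    2196: 0.11281812,  # Anguiano Peña & Naumis Peña
}

# Sort by descending score, as in the similar-documents list.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for rank, (doc_id, score) in enumerate(ranking, start=1):
    print(f"{rank}. doc {doc_id}: {score:.2f}")
```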