Document (#21252)

Author
Charniak, E.
Title
Statistical techniques for natural language parsing
Source
AI magazine. 18(1997) no.4, pp.33-43
Year
1997
Abstract
Reviews statistical work on syntactic parsing. Considers part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques, and then discusses statistical parsing. Considers both the simplified case in which the input is a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence.
Footnote
Contribution to a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation

Similar documents (content)

  1. Losee, R.M.: Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering : an empirical basis for grammatical rules (1996) 0.26
    0.25719497 = sum of:
      0.25719497 = product of:
        0.8037343 = sum of:
          0.016905654 = weight(abstract_txt:language in 4068) [ClassicSimilarity], result of:
            0.016905654 = score(doc=4068,freq=1.0), product of:
              0.051742673 = queryWeight, product of:
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.012372451 = queryNorm
              0.32672557 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.01973126 = weight(abstract_txt:particular in 4068) [ClassicSimilarity], result of:
            0.01973126 = score(doc=4068,freq=1.0), product of:
              0.057358447 = queryWeight, product of:
                1.0528688 = boost
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.012372451 = queryNorm
              0.3439992 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.022212086 = weight(abstract_txt:part in 4068) [ClassicSimilarity], result of:
            0.022212086 = score(doc=4068,freq=1.0), product of:
              0.062070765 = queryWeight, product of:
                1.0952648 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.012372451 = queryNorm
              0.35785103 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.030290859 = weight(abstract_txt:natural in 4068) [ClassicSimilarity], result of:
            0.030290859 = score(doc=4068,freq=1.0), product of:
              0.07633117 = queryWeight, product of:
                1.2145811 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.012372451 = queryNorm
              0.39683473 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.05204377 = weight(abstract_txt:parts in 4068) [ClassicSimilarity], result of:
            0.05204377 = score(doc=4068,freq=1.0), product of:
              0.10949813 = queryWeight, product of:
                1.4547184 = boost
                6.0837593 = idf(docFreq=273, maxDocs=44218)
                0.012372451 = queryNorm
              0.4752937 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0837593 = idf(docFreq=273, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.12850039 = weight(abstract_txt:syntactic in 4068) [ClassicSimilarity], result of:
            0.12850039 = score(doc=4068,freq=1.0), product of:
              0.25202316 = queryWeight, product of:
                3.1211224 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.012372451 = queryNorm
              0.5098753 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.2118348 = weight(abstract_txt:speech in 4068) [ClassicSimilarity], result of:
            0.2118348 = score(doc=4068,freq=2.0), product of:
              0.27914235 = queryWeight, product of:
                3.2847586 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012372451 = queryNorm
              0.75887734 = fieldWeight in 4068, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
          0.32221553 = weight(abstract_txt:parsing in 4068) [ClassicSimilarity], result of:
            0.32221553 = score(doc=4068,freq=1.0), product of:
              0.53247464 = queryWeight, product of:
                5.5562997 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.012372451 = queryNorm
              0.6051284 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.078125 = fieldNorm(doc=4068)
        0.32 = coord(8/25)
    
  2. Jacquemin, C.: What is the tree that we see through the window : a linguistic approach to windowing and term variation (1996) 0.24
    0.24291688 = sum of:
      0.24291688 = product of:
        0.86756027 = sum of:
          0.020286787 = weight(abstract_txt:language in 5578) [ClassicSimilarity], result of:
            0.020286787 = score(doc=5578,freq=1.0), product of:
              0.051742673 = queryWeight, product of:
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.012372451 = queryNorm
              0.3920707 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.03634903 = weight(abstract_txt:natural in 5578) [ClassicSimilarity], result of:
            0.03634903 = score(doc=5578,freq=1.0), product of:
              0.07633117 = queryWeight, product of:
                1.2145811 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.012372451 = queryNorm
              0.47620165 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.060164772 = weight(abstract_txt:words in 5578) [ClassicSimilarity], result of:
            0.060164772 = score(doc=5578,freq=2.0), product of:
              0.08477313 = queryWeight, product of:
                1.2799845 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012372451 = queryNorm
              0.7097151 = fieldWeight in 5578, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.15834028 = weight(abstract_txt:parser in 5578) [ClassicSimilarity], result of:
            0.15834028 = score(doc=5578,freq=1.0), product of:
              0.20359524 = queryWeight, product of:
                1.9836241 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.012372451 = queryNorm
              0.7777209 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.051560275 = weight(abstract_txt:techniques in 5578) [ClassicSimilarity], result of:
            0.051560275 = score(doc=5578,freq=1.0), product of:
              0.1214116 = queryWeight, product of:
                2.1663103 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.012372451 = queryNorm
              0.42467338 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.15420045 = weight(abstract_txt:syntactic in 5578) [ClassicSimilarity], result of:
            0.15420045 = score(doc=5578,freq=1.0), product of:
              0.25202316 = queryWeight, product of:
                3.1211224 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.012372451 = queryNorm
              0.6118503 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
          0.38665864 = weight(abstract_txt:parsing in 5578) [ClassicSimilarity], result of:
            0.38665864 = score(doc=5578,freq=1.0), product of:
              0.53247464 = queryWeight, product of:
                5.5562997 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.012372451 = queryNorm
              0.7261541 = fieldWeight in 5578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.09375 = fieldNorm(doc=5578)
        0.28 = coord(7/25)
    
  3. Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.22
    0.22421746 = sum of:
      0.22421746 = product of:
        1.1210873 = sum of:
          0.05856291 = weight(abstract_txt:language in 3313) [ClassicSimilarity], result of:
            0.05856291 = score(doc=3313,freq=3.0), product of:
              0.051742673 = queryWeight, product of:
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.012372451 = queryNorm
              1.1318107 = fieldWeight in 3313, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.15625 = fieldNorm(doc=3313)
          0.056161974 = weight(abstract_txt:reviews in 3313) [ClassicSimilarity], result of:
            0.056161974 = score(doc=3313,freq=1.0), product of:
              0.072572 = queryWeight, product of:
                1.1842957 = boost
                4.952828 = idf(docFreq=848, maxDocs=44218)
                0.012372451 = queryNorm
              0.77387935 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.952828 = idf(docFreq=848, maxDocs=44218)
                0.15625 = fieldNorm(doc=3313)
          0.10493061 = weight(abstract_txt:natural in 3313) [ClassicSimilarity], result of:
            0.10493061 = score(doc=3313,freq=3.0), product of:
              0.07633117 = queryWeight, product of:
                1.2145811 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.012372451 = queryNorm
              1.3746758 = fieldWeight in 3313, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.15625 = fieldNorm(doc=3313)
          0.25700077 = weight(abstract_txt:syntactic in 3313) [ClassicSimilarity], result of:
            0.25700077 = score(doc=3313,freq=1.0), product of:
              0.25202316 = queryWeight, product of:
                3.1211224 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.012372451 = queryNorm
              1.0197506 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.15625 = fieldNorm(doc=3313)
          0.64443105 = weight(abstract_txt:parsing in 3313) [ClassicSimilarity], result of:
            0.64443105 = score(doc=3313,freq=1.0), product of:
              0.53247464 = queryWeight, product of:
                5.5562997 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.012372451 = queryNorm
              1.2102568 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.15625 = fieldNorm(doc=3313)
        0.2 = coord(5/25)
    
  4. Multilingual information management : current levels and future abilities. A report commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.22
    0.21537343 = sum of:
      0.21537343 = product of:
        0.7691908 = sum of:
          0.023667917 = weight(abstract_txt:language in 6068) [ClassicSimilarity], result of:
            0.023667917 = score(doc=6068,freq=4.0), product of:
              0.051742673 = queryWeight, product of:
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.012372451 = queryNorm
              0.45741582 = fieldWeight in 6068, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.021203602 = weight(abstract_txt:natural in 6068) [ClassicSimilarity], result of:
            0.021203602 = score(doc=6068,freq=1.0), product of:
              0.07633117 = queryWeight, product of:
                1.2145811 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.012372451 = queryNorm
              0.27778432 = fieldWeight in 6068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.012043531 = weight(abstract_txt:which in 6068) [ClassicSimilarity], result of:
            0.012043531 = score(doc=6068,freq=1.0), product of:
              0.07550432 = queryWeight, product of:
                2.092291 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.012372451 = queryNorm
              0.15950784 = fieldWeight in 6068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.07367288 = weight(abstract_txt:techniques in 6068) [ClassicSimilarity], result of:
            0.07367288 = score(doc=6068,freq=6.0), product of:
              0.1214116 = queryWeight, product of:
                2.1663103 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.012372451 = queryNorm
              0.6068027 = fieldWeight in 6068, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.1816105 = weight(abstract_txt:speech in 6068) [ClassicSimilarity], result of:
            0.1816105 = score(doc=6068,freq=3.0), product of:
              0.27914235 = queryWeight, product of:
                3.2847586 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012372451 = queryNorm
              0.65060174 = fieldWeight in 6068, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.3189771 = weight(abstract_txt:parsing in 6068) [ClassicSimilarity], result of:
            0.3189771 = score(doc=6068,freq=2.0), product of:
              0.53247464 = queryWeight, product of:
                5.5562997 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.012372451 = queryNorm
              0.5990465 = fieldWeight in 6068, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
          0.13801527 = weight(abstract_txt:statistical in 6068) [ClassicSimilarity], result of:
            0.13801527 = score(doc=6068,freq=1.0), product of:
              0.45502672 = queryWeight, product of:
                6.631 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.012372451 = queryNorm
              0.30331245 = fieldWeight in 6068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6068)
        0.28 = coord(7/25)
    
  5. Sikkel, K.: Parsing schemata : a framework for specification and analysis of parsing algorithms (1996) 0.21
    0.21451722 = sum of:
      0.21451722 = product of:
        0.8938218 = sum of:
          0.016905654 = weight(abstract_txt:language in 685) [ClassicSimilarity], result of:
            0.016905654 = score(doc=685,freq=1.0), product of:
              0.051742673 = queryWeight, product of:
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.012372451 = queryNorm
              0.32672557 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
          0.028080987 = weight(abstract_txt:reviews in 685) [ClassicSimilarity], result of:
            0.028080987 = score(doc=685,freq=1.0), product of:
              0.072572 = queryWeight, product of:
                1.1842957 = boost
                4.952828 = idf(docFreq=848, maxDocs=44218)
                0.012372451 = queryNorm
              0.38693967 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.952828 = idf(docFreq=848, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
          0.030290859 = weight(abstract_txt:natural in 685) [ClassicSimilarity], result of:
            0.030290859 = score(doc=685,freq=1.0), product of:
              0.07633117 = queryWeight, product of:
                1.2145811 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.012372451 = queryNorm
              0.39683473 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
          0.13195021 = weight(abstract_txt:parser in 685) [ClassicSimilarity], result of:
            0.13195021 = score(doc=685,freq=1.0), product of:
              0.20359524 = queryWeight, product of:
                1.9836241 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.012372451 = queryNorm
              0.64810073 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
          0.12850039 = weight(abstract_txt:syntactic in 685) [ClassicSimilarity], result of:
            0.12850039 = score(doc=685,freq=1.0), product of:
              0.25202316 = queryWeight, product of:
                3.1211224 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.012372451 = queryNorm
              0.5098753 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
          0.55809367 = weight(abstract_txt:parsing in 685) [ClassicSimilarity], result of:
            0.55809367 = score(doc=685,freq=3.0), product of:
              0.53247464 = queryWeight, product of:
                5.5562997 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.012372451 = queryNorm
              1.0481131 = fieldWeight in 685, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.078125 = fieldNorm(doc=685)
        0.24 = coord(6/25)
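
Note on reading the score trees: the explanations above are labeled [ClassicSimilarity], Lucene's TF-IDF scoring. The following is a minimal sketch, not the catalog's own code, that recomputes one term weight and the final score of the first similar document (doc 4068) from the numbers printed in its tree. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and variable names are illustrative only.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed standard definition).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each "weight(...)" subtree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Values copied from the "parsing" subtree of doc 4068.
parsing_4068 = term_score(
    freq=1.0, doc_freq=51, max_docs=44218,
    boost=5.5562997, query_norm=0.012372451, field_norm=0.078125,
)

# The eight per-term scores listed for doc 4068, in the order shown.
term_scores_4068 = [
    0.016905654, 0.01973126, 0.022212086, 0.030290859,
    0.05204377, 0.12850039, 0.2118348, 0.32221553,
]

# coord(8/25): 8 of the 25 query terms matched this document.
total_4068 = sum(term_scores_4068) * (8 / 25)

print(f"parsing weight: {parsing_4068:.6f}")
print(f"total score:    {total_4068:.6f}")
```

The recomputed values should agree with the tree (0.32221553 for the term, 0.25719497 for the document) up to the single-precision rounding Lucene uses internally. The coord factor explains why a document matching fewer query terms can rank lower even when its individual term weights are larger.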