Document (#21253)

Author
Charniak, E.
Title
Statistical techniques for natural language parsing
Source
AI magazine. 18(1997) no.4, pp.33-43
Year
1997
Abstract
Reviews statistical work on syntactic parsing. Considers part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques, and discusses statistical parsing. Considers both the simplified case in which the input string is a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence.
Footnote
Contribution to a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation

Similar documents (content)

  1. Losee, R.M.: Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering : an empirical basis for grammatical rules (1996) 0.26
    0.2579029 = sum of:
      0.2579029 = product of:
        0.8059466 = sum of:
          0.017026477 = weight(abstract_txt:language in 5137) [ClassicSimilarity], result of:
            0.017026477 = score(doc=5137,freq=1.0), product of:
              0.05198722 = queryWeight, product of:
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.012401049 = queryNorm
              0.32751274 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.01977551 = weight(abstract_txt:particular in 5137) [ClassicSimilarity], result of:
            0.01977551 = score(doc=5137,freq=1.0), product of:
              0.057442304 = queryWeight, product of:
                1.0511571 = boost
                4.406622 = idf(docFreq=1433, maxDocs=43254)
                0.012401049 = queryNorm
              0.34426734 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.406622 = idf(docFreq=1433, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.022175469 = weight(abstract_txt:part in 5137) [ClassicSimilarity], result of:
            0.022175469 = score(doc=5137,freq=1.0), product of:
              0.06200051 = queryWeight, product of:
                1.0920671 = boost
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.012401049 = queryNorm
              0.3576659 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.030778576 = weight(abstract_txt:natural in 5137) [ClassicSimilarity], result of:
            0.030778576 = score(doc=5137,freq=1.0), product of:
              0.07714582 = queryWeight, product of:
                1.21817 = boost
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.012401049 = queryNorm
              0.3989662 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.05223368 = weight(abstract_txt:parts in 5137) [ClassicSimilarity], result of:
            0.05223368 = score(doc=5137,freq=1.0), product of:
              0.10976077 = queryWeight, product of:
                1.4530324 = boost
                6.0913486 = idf(docFreq=265, maxDocs=43254)
                0.012401049 = queryNorm
              0.4758866 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0913486 = idf(docFreq=265, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.12854378 = weight(abstract_txt:syntactic in 5137) [ClassicSimilarity], result of:
            0.12854378 = score(doc=5137,freq=1.0), product of:
              0.2520717 = queryWeight, product of:
                3.1140728 = boost
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.012401049 = queryNorm
              0.50994927 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.21355632 = weight(abstract_txt:speech in 5137) [ClassicSimilarity], result of:
            0.21355632 = score(doc=5137,freq=2.0), product of:
              0.28064352 = queryWeight, product of:
                3.2858233 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.012401049 = queryNorm
              0.76095223 = fieldWeight in 5137, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
          0.32185677 = weight(abstract_txt:parsing in 5137) [ClassicSimilarity], result of:
            0.32185677 = score(doc=5137,freq=1.0), product of:
              0.532062 = queryWeight, product of:
                5.5410676 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.012401049 = queryNorm
              0.6049234 = fieldWeight in 5137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.078125 = fieldNorm(doc=5137)
        0.32 = coord(8/25)
    
  2. Jacquemin, C.: What is the tree that we see through the window : a linguistic approach to windowing and term variation (1996) 0.24
    0.24261865 = sum of:
      0.24261865 = product of:
        0.8664952 = sum of:
          0.02043177 = weight(abstract_txt:language in 6647) [ClassicSimilarity], result of:
            0.02043177 = score(doc=6647,freq=1.0), product of:
              0.05198722 = queryWeight, product of:
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.012401049 = queryNorm
              0.39301527 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.03693429 = weight(abstract_txt:natural in 6647) [ClassicSimilarity], result of:
            0.03693429 = score(doc=6647,freq=1.0), product of:
              0.07714582 = queryWeight, product of:
                1.21817 = boost
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.012401049 = queryNorm
              0.4787594 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.060194988 = weight(abstract_txt:words in 6647) [ClassicSimilarity], result of:
            0.060194988 = score(doc=6647,freq=2.0), product of:
              0.08479875 = queryWeight, product of:
                1.2771634 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.012401049 = queryNorm
              0.709857 = fieldWeight in 6647, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.15706612 = weight(abstract_txt:parser in 6647) [ClassicSimilarity], result of:
            0.15706612 = score(doc=6647,freq=1.0), product of:
              0.20249498 = queryWeight, product of:
                1.9735987 = boost
                8.273647 = idf(docFreq=29, maxDocs=43254)
                0.012401049 = queryNorm
              0.77565444 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.273647 = idf(docFreq=29, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.051387392 = weight(abstract_txt:techniques in 6647) [ClassicSimilarity], result of:
            0.051387392 = score(doc=6647,freq=1.0), product of:
              0.12113611 = queryWeight, product of:
                2.1587558 = boost
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.012401049 = queryNorm
              0.424212 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.15425253 = weight(abstract_txt:syntactic in 6647) [ClassicSimilarity], result of:
            0.15425253 = score(doc=6647,freq=1.0), product of:
              0.2520717 = queryWeight, product of:
                3.1140728 = boost
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.012401049 = queryNorm
              0.6119391 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
          0.3862281 = weight(abstract_txt:parsing in 6647) [ClassicSimilarity], result of:
            0.3862281 = score(doc=6647,freq=1.0), product of:
              0.532062 = queryWeight, product of:
                5.5410676 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.012401049 = queryNorm
              0.7259081 = fieldWeight in 6647, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.09375 = fieldNorm(doc=6647)
        0.28 = coord(7/25)
    
  3. Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.22
    0.22450767 = sum of:
      0.22450767 = product of:
        1.1225383 = sum of:
          0.05898144 = weight(abstract_txt:language in 3313) [ClassicSimilarity], result of:
            0.05898144 = score(doc=3313,freq=3.0), product of:
              0.05198722 = queryWeight, product of:
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.012401049 = queryNorm
              1.1345373 = fieldWeight in 3313, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.15625 = fieldNorm(doc=3313)
          0.056135654 = weight(abstract_txt:reviews in 3313) [ClassicSimilarity], result of:
            0.056135654 = score(doc=3313,freq=1.0), product of:
              0.07254697 = queryWeight, product of:
                1.1813031 = boost
                4.952215 = idf(docFreq=830, maxDocs=43254)
                0.012401049 = queryNorm
              0.7737836 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.952215 = idf(docFreq=830, maxDocs=43254)
                0.15625 = fieldNorm(doc=3313)
          0.10662011 = weight(abstract_txt:natural in 3313) [ClassicSimilarity], result of:
            0.10662011 = score(doc=3313,freq=3.0), product of:
              0.07714582 = queryWeight, product of:
                1.21817 = boost
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.012401049 = queryNorm
              1.3820595 = fieldWeight in 3313, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.15625 = fieldNorm(doc=3313)
          0.25708756 = weight(abstract_txt:syntactic in 3313) [ClassicSimilarity], result of:
            0.25708756 = score(doc=3313,freq=1.0), product of:
              0.2520717 = queryWeight, product of:
                3.1140728 = boost
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.012401049 = queryNorm
              1.0198985 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.15625 = fieldNorm(doc=3313)
          0.64371353 = weight(abstract_txt:parsing in 3313) [ClassicSimilarity], result of:
            0.64371353 = score(doc=3313,freq=1.0), product of:
              0.532062 = queryWeight, product of:
                5.5410676 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.012401049 = queryNorm
              1.2098469 = fieldWeight in 3313, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.15625 = fieldNorm(doc=3313)
        0.2 = coord(5/25)
    
  4. Multilingual information management : current levels and future abilities. A report Commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.22
    0.21578419 = sum of:
      0.21578419 = product of:
        0.77065784 = sum of:
          0.023837067 = weight(abstract_txt:language in 1069) [ClassicSimilarity], result of:
            0.023837067 = score(doc=1069,freq=4.0), product of:
              0.05198722 = queryWeight, product of:
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.012401049 = queryNorm
              0.45851782 = fieldWeight in 1069, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.021545002 = weight(abstract_txt:natural in 1069) [ClassicSimilarity], result of:
            0.021545002 = score(doc=1069,freq=1.0), product of:
              0.07714582 = queryWeight, product of:
                1.21817 = boost
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.012401049 = queryNorm
              0.27927634 = fieldWeight in 1069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.012176351 = weight(abstract_txt:which in 1069) [ClassicSimilarity], result of:
            0.012176351 = score(doc=1069,freq=1.0), product of:
              0.07605595 = queryWeight, product of:
                2.0949755 = boost
                2.9274929 = idf(docFreq=6293, maxDocs=43254)
                0.012401049 = queryNorm
              0.16009727 = fieldWeight in 1069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9274929 = idf(docFreq=6293, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.07342585 = weight(abstract_txt:techniques in 1069) [ClassicSimilarity], result of:
            0.07342585 = score(doc=1069,freq=6.0), product of:
              0.12113611 = queryWeight, product of:
                2.1587558 = boost
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.012401049 = queryNorm
              0.6061434 = fieldWeight in 1069, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.1830864 = weight(abstract_txt:speech in 1069) [ClassicSimilarity], result of:
            0.1830864 = score(doc=1069,freq=3.0), product of:
              0.28064352 = queryWeight, product of:
                3.2858233 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.012401049 = queryNorm
              0.65238065 = fieldWeight in 1069, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.3186219 = weight(abstract_txt:parsing in 1069) [ClassicSimilarity], result of:
            0.3186219 = score(doc=1069,freq=2.0), product of:
              0.532062 = queryWeight, product of:
                5.5410676 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.012401049 = queryNorm
              0.5988436 = fieldWeight in 1069, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
          0.13796526 = weight(abstract_txt:statistical in 1069) [ClassicSimilarity], result of:
            0.13796526 = score(doc=1069,freq=1.0), product of:
              0.45490202 = queryWeight, product of:
                6.614479 = boost
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.012401049 = queryNorm
              0.30328566 = fieldWeight in 1069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1069)
        0.28 = coord(7/25)
    
  5. Sikkel, K.: Parsing schemata : a framework for specification and analysis of parsing algorithms (1996) 0.21
    0.21426657 = sum of:
      0.21426657 = product of:
        0.8927774 = sum of:
          0.017026477 = weight(abstract_txt:language in 2686) [ClassicSimilarity], result of:
            0.017026477 = score(doc=2686,freq=1.0), product of:
              0.05198722 = queryWeight, product of:
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.012401049 = queryNorm
              0.32751274 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
          0.028067827 = weight(abstract_txt:reviews in 2686) [ClassicSimilarity], result of:
            0.028067827 = score(doc=2686,freq=1.0), product of:
              0.07254697 = queryWeight, product of:
                1.1813031 = boost
                4.952215 = idf(docFreq=830, maxDocs=43254)
                0.012401049 = queryNorm
              0.3868918 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.952215 = idf(docFreq=830, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
          0.030778576 = weight(abstract_txt:natural in 2686) [ClassicSimilarity], result of:
            0.030778576 = score(doc=2686,freq=1.0), product of:
              0.07714582 = queryWeight, product of:
                1.21817 = boost
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.012401049 = queryNorm
              0.3989662 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106767 = idf(docFreq=711, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
          0.13088845 = weight(abstract_txt:parser in 2686) [ClassicSimilarity], result of:
            0.13088845 = score(doc=2686,freq=1.0), product of:
              0.20249498 = queryWeight, product of:
                1.9735987 = boost
                8.273647 = idf(docFreq=29, maxDocs=43254)
                0.012401049 = queryNorm
              0.6463787 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.273647 = idf(docFreq=29, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
          0.12854378 = weight(abstract_txt:syntactic in 2686) [ClassicSimilarity], result of:
            0.12854378 = score(doc=2686,freq=1.0), product of:
              0.2520717 = queryWeight, product of:
                3.1140728 = boost
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.012401049 = queryNorm
              0.50994927 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
          0.5574723 = weight(abstract_txt:parsing in 2686) [ClassicSimilarity], result of:
            0.5574723 = score(doc=2686,freq=3.0), product of:
              0.532062 = queryWeight, product of:
                5.5410676 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.012401049 = queryNorm
              1.0477581 = fieldWeight in 2686, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.078125 = fieldNorm(doc=2686)
        0.24 = coord(6/25)
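
The score breakdowns above are labeled [ClassicSimilarity], and their numbers are consistent with a TF-IDF computation in which each term contributes queryWeight × fieldWeight and the sum is scaled by the coord factor. The following is a minimal sketch of that arithmetic, not the search engine's actual code. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption inferred from the displayed values (it reproduces them to within rounding).

```python
import math


def idf(doc_freq, max_docs):
    # Assumed form, inferred from the displayed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # The trees show tf(freq=2.0) = 1.4142135, i.e. the square root.
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    # coord(matched/total) scales the sum of the per-term scores.
    return sum(term_scores) * (matched_terms / query_terms)


# Values copied from the "parsing" node of entry 5 (doc=2686).
parsing_2686 = term_score(
    freq=3.0,
    doc_freq=50,
    max_docs=43254,
    query_norm=0.012401049,
    field_norm=0.078125,
    boost=5.5410676,
)

# Per-term scores listed for entry 5, combined with coord(6/25).
entry_5 = document_score(
    [0.017026477, 0.028067827, 0.030778576,
     0.13088845, 0.12854378, 0.5574723],
    matched_terms=6,
    query_terms=25,
)

print(parsing_2686, entry_5)
```

Under these assumptions the sketch gives roughly 0.5575 for the "parsing" term and roughly 0.2143 for entry 5, in line with the displayed 0.5574723 and 0.21426657. Small differences in the last digits are expected, since the displayed values look like single-precision floats.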