Document (#14648)

Author
Jacquemin, C.
Title
What is the tree that we see through the window : a linguistic approach to windowing and term variation
Source
Information processing and management. 32(1996) no.4, pp.445-458
Year
1996
Abstract
Provides a linguistic approach to text windowing through the extraction of term variants with the help of a partial parser. The syntactic grounding of the method ensures that words observed within restricted spans are lexically related and that spurious word co-occurrences are ruled out with a good level of confidence. The system is computationally tractable on large corpora and large lists of terms. Gives illustrative examples of term variation from a large medical corpus. An experimental evaluation of the method shows that only a small proportion of co-occurring words are lexically related, and this motivates the call for natural language parsing techniques in text windowing.
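The plain windowing baseline that the abstract argues against can be sketched as follows. Every pair of words that falls within a fixed span counts as a co-occurrence. This is a minimal illustration that assumes whitespace tokenization and a hypothetical `span` parameter. It is not Jacquemin's parser-based method, which additionally requires the co-occurring words to stand in a syntactic relation.

```python
from collections import Counter


def window_cooccurrences(tokens: list[str], span: int = 5) -> Counter:
    """Count unordered word pairs whose positions are at most `span` tokens apart.

    Plain windowing: no syntax is consulted, so any two words that are close
    enough are counted, whether or not they are lexically related.
    """
    pairs: Counter = Counter()
    for i, left in enumerate(tokens):
        # Look only forward so each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + span]:
            pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Hypothetical example sentence: "breast" and "tumor" are 5 tokens apart.
sentence = "the patient with a tumor of the lung had breast pain".split()
counts = window_cooccurrences(sentence, span=5)
print(counts[("breast", "tumor")])
```

With `span=5` the pair (breast, tumor) is counted even though the sentence contains no variant of a term such as *breast tumor*. Filtering out spurious pairs of this kind is the job the abstract assigns to the partial parser.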
Theme
Computational linguistics

Similar documents (content)

  1. Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.22
    0.221333 = sum of:
      0.221333 = product of:
        0.61481386 = sum of:
          0.11219872 = weight(abstract_txt:variants in 119) [ClassicSimilarity], result of:
            0.11219872 = score(doc=119,freq=2.0), product of:
              0.1690881 = queryWeight, product of:
                1.0262988 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.021946201 = queryNorm
              0.66355187 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.024203666 = weight(abstract_txt:through in 119) [ClassicSimilarity], result of:
            0.024203666 = score(doc=119,freq=1.0), product of:
              0.09654472 = queryWeight, product of:
                1.096722 = boost
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.021946201 = queryNorm
              0.250699 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.035072666 = weight(abstract_txt:text in 119) [ClassicSimilarity], result of:
            0.035072666 = score(doc=119,freq=2.0), product of:
              0.09812438 = queryWeight, product of:
                1.1056578 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021946201 = queryNorm
              0.3574307 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.10705101 = weight(abstract_txt:parser in 119) [ClassicSimilarity], result of:
            0.10705101 = score(doc=119,freq=1.0), product of:
              0.20647061 = queryWeight, product of:
                1.1340871 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.021946201 = queryNorm
              0.5184806 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.027878117 = weight(abstract_txt:related in 119) [ClassicSimilarity], result of:
            0.027878117 = score(doc=119,freq=1.0), product of:
              0.106084034 = queryWeight, product of:
                1.1496279 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.021946201 = queryNorm
              0.26279277 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.03419604 = weight(abstract_txt:method in 119) [ClassicSimilarity], result of:
            0.03419604 = score(doc=119,freq=1.0), product of:
              0.12156026 = queryWeight, product of:
                1.2306317 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.021946201 = queryNorm
              0.28130937 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.08135265 = weight(abstract_txt:words in 119) [ClassicSimilarity], result of:
            0.08135265 = score(doc=119,freq=2.0), product of:
              0.17194079 = queryWeight, product of:
                1.4635978 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.021946201 = queryNorm
              0.47314343 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.10481285 = weight(abstract_txt:linguistic in 119) [ClassicSimilarity], result of:
            0.10481285 = score(doc=119,freq=2.0), product of:
              0.20358266 = queryWeight, product of:
                1.5925852 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.021946201 = queryNorm
              0.51484174 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
          0.088048145 = weight(abstract_txt:term in 119) [ClassicSimilarity], result of:
            0.088048145 = score(doc=119,freq=2.0), product of:
              0.2074794 = queryWeight, product of:
                1.9690893 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.021946201 = queryNorm
              0.42437053 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0625 = fieldNorm(doc=119)
        0.36 = coord(9/25)
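Each per-term entry above follows the same pattern: score = queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The tf values shown (1.4142135, 1.7320508, 2.236068) are sqrt(freq). The idf values are reproduced by 1 + ln(maxDocs / (docFreq + 1)). The sketch below recomputes one entry under these assumptions. It takes boost, queryNorm and fieldNorm as given from the listing, because fieldNorm is stored in a lossy one-byte encoding and queryNorm depends on the full query. The function names are illustrative and are not Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); matches the idf values in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq=2.0) = 1.4142135
    query_weight = boost * idf * query_norm   # the "queryWeight, product of" line
    field_weight = tf * idf * field_norm      # the "fieldWeight in <doc>, product of" line
    return query_weight * field_weight


# Inputs copied from the "variants" entry for doc 119 (listed weight 0.11219872).
w = term_weight(freq=2.0, doc_freq=65, max_docs=44218,
                boost=1.0262988, query_norm=0.021946201, field_norm=0.0625)
print(w)
```

The small residual differences from the listed values come from Lucene computing in single precision.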
    
  2. Bowker, L.: ¬A corpus-based investigation of variation in the organization of medical terms (2000) 0.15
    0.14621016 = sum of:
      0.14621016 = product of:
        0.7310508 = sum of:
          0.029554533 = weight(abstract_txt:approach in 92) [ClassicSimilarity], result of:
            0.029554533 = score(doc=92,freq=1.0), product of:
              0.084171094 = queryWeight, product of:
                1.0240326 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.021946201 = queryNorm
              0.3511245 = fieldWeight in 92, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.09375 = fieldNorm(doc=92)
          0.037200175 = weight(abstract_txt:text in 92) [ClassicSimilarity], result of:
            0.037200175 = score(doc=92,freq=1.0), product of:
              0.09812438 = queryWeight, product of:
                1.1056578 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021946201 = queryNorm
              0.37911248 = fieldWeight in 92, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=92)
          0.11117081 = weight(abstract_txt:linguistic in 92) [ClassicSimilarity], result of:
            0.11117081 = score(doc=92,freq=1.0), product of:
              0.20358266 = queryWeight, product of:
                1.5925852 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.021946201 = queryNorm
              0.5460721 = fieldWeight in 92, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.09375 = fieldNorm(doc=92)
          0.42105305 = weight(abstract_txt:variation in 92) [ClassicSimilarity], result of:
            0.42105305 = score(doc=92,freq=5.0), product of:
              0.28927758 = queryWeight, product of:
                1.8984085 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.021946201 = queryNorm
              1.4555329 = fieldWeight in 92, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.09375 = fieldNorm(doc=92)
          0.13207221 = weight(abstract_txt:term in 92) [ClassicSimilarity], result of:
            0.13207221 = score(doc=92,freq=2.0), product of:
              0.2074794 = queryWeight, product of:
                1.9690893 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.021946201 = queryNorm
              0.6365558 = fieldWeight in 92, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.09375 = fieldNorm(doc=92)
        0.2 = coord(5/25)
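The document-level score combines the matching terms. Their weights are summed, and the sum is multiplied by a coordination factor coord = (query terms matched) / (query terms in total). The sketch below recomputes Bowker's score from the five weights listed above. The total of 25 query terms is read from `coord(5/25)`, and the function name is illustrative.

```python
def coord_score(term_scores: list[float], total_query_terms: int) -> float:
    """Sum the matching terms' weights and scale by coord = matched / total."""
    coord = len(term_scores) / total_query_terms
    return coord * sum(term_scores)


# Per-term weights listed for Bowker (doc 92): approach, text, linguistic, variation, term.
bowker = [0.029554533, 0.037200175, 0.11117081, 0.42105305, 0.13207221]
score = coord_score(bowker, total_query_terms=25)
print(f"{score:.8f} (shown in the result list as {score:.2f})")
```

The rounded figure after each title in the result list (0.22, 0.15, ...) is this final score rounded to two decimals.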
    
  3. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.14
    0.14160687 = sum of:
      0.14160687 = product of:
        0.59002864 = sum of:
          0.03483035 = weight(abstract_txt:approach in 4394) [ClassicSimilarity], result of:
            0.03483035 = score(doc=4394,freq=2.0), product of:
              0.084171094 = queryWeight, product of:
                1.0240326 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.021946201 = queryNorm
              0.41380417 = fieldWeight in 4394, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
          0.1402484 = weight(abstract_txt:variants in 4394) [ClassicSimilarity], result of:
            0.1402484 = score(doc=4394,freq=2.0), product of:
              0.1690881 = queryWeight, product of:
                1.0262988 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.021946201 = queryNorm
              0.8294398 = fieldWeight in 4394, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
          0.030254584 = weight(abstract_txt:through in 4394) [ClassicSimilarity], result of:
            0.030254584 = score(doc=4394,freq=1.0), product of:
              0.09654472 = queryWeight, product of:
                1.096722 = boost
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.021946201 = queryNorm
              0.31337377 = fieldWeight in 4394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
          0.04274505 = weight(abstract_txt:method in 4394) [ClassicSimilarity], result of:
            0.04274505 = score(doc=4394,freq=1.0), product of:
              0.12156026 = queryWeight, product of:
                1.2306317 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.021946201 = queryNorm
              0.3516367 = fieldWeight in 4394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
          0.20715459 = weight(abstract_txt:linguistic in 4394) [ClassicSimilarity], result of:
            0.20715459 = score(doc=4394,freq=5.0), product of:
              0.20358266 = queryWeight, product of:
                1.5925852 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.021946201 = queryNorm
              1.0175453 = fieldWeight in 4394, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
          0.13479564 = weight(abstract_txt:term in 4394) [ClassicSimilarity], result of:
            0.13479564 = score(doc=4394,freq=3.0), product of:
              0.2074794 = queryWeight, product of:
                1.9690893 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.021946201 = queryNorm
              0.64968204 = fieldWeight in 4394, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=4394)
        0.24 = coord(6/25)
    
  4. Bodoff, D.; Kambil, A.: Partial coordination : I. The best of pre-coordination and post-coordination (1998) 0.12
    0.121183634 = sum of:
      0.121183634 = product of:
        0.60591817 = sum of:
          0.024628777 = weight(abstract_txt:approach in 2322) [ClassicSimilarity], result of:
            0.024628777 = score(doc=2322,freq=1.0), product of:
              0.084171094 = queryWeight, product of:
                1.0240326 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.021946201 = queryNorm
              0.29260373 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=2322)
          0.15715636 = weight(abstract_txt:motivates in 2322) [ClassicSimilarity], result of:
            0.15715636 = score(doc=2322,freq=1.0), product of:
              0.22983299 = queryWeight, product of:
                1.1965296 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.021946201 = queryNorm
              0.683785 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.078125 = fieldNorm(doc=2322)
          0.28417304 = weight(abstract_txt:spurious in 2322) [ClassicSimilarity], result of:
            0.28417304 = score(doc=2322,freq=2.0), product of:
              0.27075073 = queryWeight, product of:
                1.2986798 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.021946201 = queryNorm
              1.0495744 = fieldWeight in 2322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.078125 = fieldNorm(doc=2322)
          0.062135708 = weight(abstract_txt:large in 2322) [ClassicSimilarity], result of:
            0.062135708 = score(doc=2322,freq=1.0), product of:
              0.17856334 = queryWeight, product of:
                1.8267288 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.021946201 = queryNorm
              0.34797573 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.078125 = fieldNorm(doc=2322)
          0.0778243 = weight(abstract_txt:term in 2322) [ClassicSimilarity], result of:
            0.0778243 = score(doc=2322,freq=1.0), product of:
              0.2074794 = queryWeight, product of:
                1.9690893 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.021946201 = queryNorm
              0.37509412 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=2322)
        0.2 = coord(5/25)
    
  5. Srihari, R.K.: Computational models for integrating linguistic and visual information : a survey (1994/95) 0.12
    0.11742403 = sum of:
      0.11742403 = product of:
        0.5871202 = sum of:
          0.16255327 = weight(abstract_txt:computationally in 2244) [ClassicSimilarity], result of:
            0.16255327 = score(doc=2244,freq=1.0), product of:
              0.20816165 = queryWeight, product of:
                1.1387218 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.021946201 = queryNorm
              0.7808992 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.09375 = fieldNorm(doc=2244)
          0.041817173 = weight(abstract_txt:related in 2244) [ClassicSimilarity], result of:
            0.041817173 = score(doc=2244,freq=1.0), product of:
              0.106084034 = queryWeight, product of:
                1.1496279 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.021946201 = queryNorm
              0.39418915 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.09375 = fieldNorm(doc=2244)
          0.1852914 = weight(abstract_txt:spans in 2244) [ClassicSimilarity], result of:
            0.1852914 = score(doc=2244,freq=1.0), product of:
              0.22714704 = queryWeight, product of:
                1.1895175 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.021946201 = queryNorm
              0.81573325 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.09375 = fieldNorm(doc=2244)
          0.08628752 = weight(abstract_txt:words in 2244) [ClassicSimilarity], result of:
            0.08628752 = score(doc=2244,freq=1.0), product of:
              0.17194079 = queryWeight, product of:
                1.4635978 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.021946201 = queryNorm
              0.5018444 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=2244)
          0.11117081 = weight(abstract_txt:linguistic in 2244) [ClassicSimilarity], result of:
            0.11117081 = score(doc=2244,freq=1.0), product of:
              0.20358266 = queryWeight, product of:
                1.5925852 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.021946201 = queryNorm
              0.5460721 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.09375 = fieldNorm(doc=2244)
        0.2 = coord(5/25)