Document (#37467)

Author
Gödert, W.
Title
Detecting multiword phrases in mathematical text corpora
Source
http://arxiv.org/abs/1210.0852
Year
2012
Abstract
We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It uses a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. We discuss possible advantages of the method for indexing and information retrieval, as well as conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
Footnote
See also: http://hdl.handle.net/10760/17742.
Theme
Automatic indexing
Field
Mathematics
Object
Lingo

Similar documents (author)

  1. Gödert, W.: Inhalte formal erschließen : Anspruch und Wirklichkeit (1984) 4.33
    4.334196 = sum of:
      4.334196 = weight(author_txt:gödert in 31) [ClassicSimilarity], result of:
        4.334196 = fieldWeight in 31, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9347134 = idf(docFreq=116, maxDocs=44218)
          0.625 = fieldNorm(doc=31)
    
  2. Gödert, W.: Gegenwart und Zukunft der bibliothekarischen Sacherschließung : Gedanken unter Berücksichtigung des EDV-Einsatzes (1981) 4.33
    4.334196 = sum of:
      4.334196 = weight(author_txt:gödert in 165) [ClassicSimilarity], result of:
        4.334196 = fieldWeight in 165, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9347134 = idf(docFreq=116, maxDocs=44218)
          0.625 = fieldNorm(doc=165)
    
  3. Gödert, W.: Syntax von Dokumentationssprachen im Online-Katalog (1988) 4.33
    4.334196 = sum of:
      4.334196 = weight(author_txt:gödert in 167) [ClassicSimilarity], result of:
        4.334196 = fieldWeight in 167, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9347134 = idf(docFreq=116, maxDocs=44218)
          0.625 = fieldNorm(doc=167)
    
  4. Gödert, W.: Aufbereitung und Recherche von nach RSWK gebildeten Daten in der CD-ROM-Ausgabe der Deutschen Bibliographie (1990) 4.33
    4.334196 = sum of:
      4.334196 = weight(author_txt:gödert in 168) [ClassicSimilarity], result of:
        4.334196 = fieldWeight in 168, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9347134 = idf(docFreq=116, maxDocs=44218)
          0.625 = fieldNorm(doc=168)
    
  5. Gödert, W.: Gestaltung sachlicher Abfragekomponenten für Online-Kataloge : Vortrag anläßlich der Tagung 'Automatisierte Sacherschließung - Status und Trends', Schloß Hofen, Lochau bei Bregenz, 17.4.-20.4.1989 (???) 4.33
    4.334196 = sum of:
      4.334196 = weight(author_txt:gödert in 170) [ClassicSimilarity], result of:
        4.334196 = fieldWeight in 170, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9347134 = idf(docFreq=116, maxDocs=44218)
          0.625 = fieldNorm(doc=170)
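
The fieldWeight figures in the author list above can be checked by hand. The following is a minimal sketch in Python, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any Lucene API; the input numbers are taken from the listing.

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values from the author matches: freq=1.0, docFreq=116, maxDocs=44218, fieldNorm=0.625
author_idf = classic_idf(116, 44218)
author_score = field_weight(1.0, 116, 44218, 0.625)
print(round(author_idf, 4), round(author_score, 4))
```

Because all five author matches share the same term frequency, document frequency, and field norm, they receive the same score. Small differences in the last digits relative to the listing are expected, since Lucene computes these values in single precision.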
    

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.30
    0.29880345 = sum of:
      0.29880345 = product of:
        0.9337608 = sum of:
          0.023426255 = weight(abstract_txt:previously in 1536) [ClassicSimilarity], result of:
            0.023426255 = score(doc=1536,freq=1.0), product of:
              0.097733386 = queryWeight, product of:
                1.0514221 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.015148371 = queryNorm
              0.23969553 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.018582474 = weight(abstract_txt:based in 1536) [ClassicSimilarity], result of:
            0.018582474 = score(doc=1536,freq=8.0), product of:
              0.05275821 = queryWeight, product of:
                1.0924854 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.015148371 = queryNorm
              0.35221958 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.029606493 = weight(abstract_txt:dictionary in 1536) [ClassicSimilarity], result of:
            0.029606493 = score(doc=1536,freq=1.0), product of:
              0.11424372 = queryWeight, product of:
                1.1367679 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.015148371 = queryNorm
              0.25915202 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.089546844 = weight(abstract_txt:detection in 1536) [ClassicSimilarity], result of:
            0.089546844 = score(doc=1536,freq=8.0), product of:
              0.11946607 = queryWeight, product of:
                1.1624597 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.015148371 = queryNorm
              0.7495588 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.0643738 = weight(abstract_txt:named in 1536) [ClassicSimilarity], result of:
            0.0643738 = score(doc=1536,freq=4.0), product of:
              0.120788924 = queryWeight, product of:
                1.168878 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.015148371 = queryNorm
              0.53294456 = fieldWeight in 1536, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.013409843 = weight(abstract_txt:text in 1536) [ClassicSimilarity], result of:
            0.013409843 = score(doc=1536,freq=1.0), product of:
              0.08489201 = queryWeight, product of:
                1.3858111 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015148371 = queryNorm
              0.15796354 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.041345738 = weight(abstract_txt:method in 1536) [ClassicSimilarity], result of:
            0.041345738 = score(doc=1536,freq=5.0), product of:
              0.10516749 = queryWeight, product of:
                1.542451 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015148371 = queryNorm
              0.3931418 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.6534694 = weight(abstract_txt:multiword in 1536) [ClassicSimilarity], result of:
            0.6534694 = score(doc=1536,freq=11.0), product of:
              0.58295363 = queryWeight, product of:
                4.447677 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015148371 = queryNorm
              1.1209629 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.17
    0.16900359 = sum of:
      0.16900359 = product of:
        0.8450179 = sum of:
          0.015767753 = weight(abstract_txt:based in 2919) [ClassicSimilarity], result of:
            0.015767753 = score(doc=2919,freq=1.0), product of:
              0.05275821 = queryWeight, product of:
                1.0924854 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.015148371 = queryNorm
              0.29886824 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.12496487 = weight(abstract_txt:nouns in 2919) [ClassicSimilarity], result of:
            0.12496487 = score(doc=2919,freq=1.0), product of:
              0.1664532 = queryWeight, product of:
                1.3721503 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.015148371 = queryNorm
              0.7507508 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.062758416 = weight(abstract_txt:method in 2919) [ClassicSimilarity], result of:
            0.062758416 = score(doc=2919,freq=2.0), product of:
              0.10516749 = queryWeight, product of:
                1.542451 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015148371 = queryNorm
              0.5967473 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.16865869 = weight(abstract_txt:corpora in 2919) [ClassicSimilarity], result of:
            0.16865869 = score(doc=2919,freq=1.0), product of:
              0.25612345 = queryWeight, product of:
                2.4071064 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015148371 = queryNorm
              0.65850544 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.4728682 = weight(abstract_txt:multiword in 2919) [ClassicSimilarity], result of:
            0.4728682 = score(doc=2919,freq=1.0), product of:
              0.58295363 = queryWeight, product of:
                4.447677 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015148371 = queryNorm
              0.8111592 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.2 = coord(5/25)
    
  3. Huang, X.; Robertson, S.E.: Application of probabilistic methods to Chinese text retrieval (1997) 0.13
    0.12823457 = sum of:
      0.12823457 = product of:
        0.45798057 = sum of:
          0.04837075 = weight(abstract_txt:done in 4706) [ClassicSimilarity], result of:
            0.04837075 = score(doc=4706,freq=1.0), product of:
              0.08840743 = queryWeight, product of:
                5.836101 = idf(docFreq=350, maxDocs=44218)
                0.015148371 = queryNorm
              0.54713446 = fieldWeight in 4706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.836101 = idf(docFreq=350, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.022298967 = weight(abstract_txt:based in 4706) [ClassicSimilarity], result of:
            0.022298967 = score(doc=4706,freq=2.0), product of:
              0.05275821 = queryWeight, product of:
                1.0924854 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.015148371 = queryNorm
              0.42266348 = fieldWeight in 4706, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.07105558 = weight(abstract_txt:dictionary in 4706) [ClassicSimilarity], result of:
            0.07105558 = score(doc=4706,freq=1.0), product of:
              0.11424372 = queryWeight, product of:
                1.1367679 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.015148371 = queryNorm
              0.6219649 = fieldWeight in 4706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.05574367 = weight(abstract_txt:text in 4706) [ClassicSimilarity], result of:
            0.05574367 = score(doc=4706,freq=3.0), product of:
              0.08489201 = queryWeight, product of:
                1.3858111 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015148371 = queryNorm
              0.6566421 = fieldWeight in 4706, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.040048715 = weight(abstract_txt:indexing in 4706) [ClassicSimilarity], result of:
            0.040048715 = score(doc=4706,freq=1.0), product of:
              0.09821306 = queryWeight, product of:
                1.4905798 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.015148371 = queryNorm
              0.40777382 = fieldWeight in 4706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.062758416 = weight(abstract_txt:method in 4706) [ClassicSimilarity], result of:
            0.062758416 = score(doc=4706,freq=2.0), product of:
              0.10516749 = queryWeight, product of:
                1.542451 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015148371 = queryNorm
              0.5967473 = fieldWeight in 4706, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
          0.15770449 = weight(abstract_txt:phrases in 4706) [ClassicSimilarity], result of:
            0.15770449 = score(doc=4706,freq=1.0), product of:
              0.24490984 = queryWeight, product of:
                2.3538227 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.015148371 = queryNorm
              0.64392877 = fieldWeight in 4706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.09375 = fieldNorm(doc=4706)
        0.28 = coord(7/25)
    
  4. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.11
    0.111265525 = sum of:
      0.111265525 = product of:
        0.9272127 = sum of:
          0.10413738 = weight(abstract_txt:nouns in 643) [ClassicSimilarity], result of:
            0.10413738 = score(doc=643,freq=1.0), product of:
              0.1664532 = queryWeight, product of:
                1.3721503 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.015148371 = queryNorm
              0.6256256 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.14054891 = weight(abstract_txt:corpora in 643) [ClassicSimilarity], result of:
            0.14054891 = score(doc=643,freq=1.0), product of:
              0.25612345 = queryWeight, product of:
                2.4071064 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015148371 = queryNorm
              0.5487546 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.6825264 = weight(abstract_txt:multiword in 643) [ClassicSimilarity], result of:
            0.6825264 = score(doc=643,freq=3.0), product of:
              0.58295363 = queryWeight, product of:
                4.447677 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015148371 = queryNorm
              1.1708074 = fieldWeight in 643, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
        0.12 = coord(3/25)
    
  5. Terada, A.; Tokunaga, T.; Tanaka, H.: Automatic expansion of abbreviations by using context and character information (2004) 0.11
    0.10926026 = sum of:
      0.10926026 = product of:
        0.34143832 = sum of:
          0.03748201 = weight(abstract_txt:previously in 2560) [ClassicSimilarity], result of:
            0.03748201 = score(doc=2560,freq=1.0), product of:
              0.097733386 = queryWeight, product of:
                1.0514221 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.015148371 = queryNorm
              0.38351285 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.037838977 = weight(abstract_txt:instead in 2560) [ClassicSimilarity], result of:
            0.037838977 = score(doc=2560,freq=1.0), product of:
              0.09835293 = queryWeight, product of:
                1.0547494 = boost
                6.155624 = idf(docFreq=254, maxDocs=44218)
                0.015148371 = queryNorm
              0.3847265 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.155624 = idf(docFreq=254, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.010511835 = weight(abstract_txt:based in 2560) [ClassicSimilarity], result of:
            0.010511835 = score(doc=2560,freq=1.0), product of:
              0.05275821 = queryWeight, product of:
                1.0924854 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.015148371 = queryNorm
              0.19924548 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.04737039 = weight(abstract_txt:dictionary in 2560) [ClassicSimilarity], result of:
            0.04737039 = score(doc=2560,freq=1.0), product of:
              0.11424372 = queryWeight, product of:
                1.1367679 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.015148371 = queryNorm
              0.41464326 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.06163053 = weight(abstract_txt:dictionaries in 2560) [ClassicSimilarity], result of:
            0.06163053 = score(doc=2560,freq=1.0), product of:
              0.1361523 = queryWeight, product of:
                1.2409894 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.015148371 = queryNorm
              0.45265874 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.08330991 = weight(abstract_txt:nouns in 2560) [ClassicSimilarity], result of:
            0.08330991 = score(doc=2560,freq=1.0), product of:
              0.1664532 = queryWeight, product of:
                1.3721503 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.015148371 = queryNorm
              0.5005005 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.021455748 = weight(abstract_txt:text in 2560) [ClassicSimilarity], result of:
            0.021455748 = score(doc=2560,freq=1.0), product of:
              0.08489201 = queryWeight, product of:
                1.3858111 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015148371 = queryNorm
              0.25274166 = fieldWeight in 2560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
          0.04183894 = weight(abstract_txt:method in 2560) [ClassicSimilarity], result of:
            0.04183894 = score(doc=2560,freq=2.0), product of:
              0.10516749 = queryWeight, product of:
                1.542451 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015148371 = queryNorm
              0.3978315 = fieldWeight in 2560, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2560)
        0.32 = coord(8/25)
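
The content scores above combine per-term weights with a coordination factor. The sketch below recomputes the score of entry 4 (doc=643) from the listed boosts, document frequencies, and norms, assuming the classic Lucene TF-IDF formulas: each term contributes queryWeight * fieldWeight, and the sum is multiplied by coord = matched terms / query terms. The names are illustrative only and not part of any Lucene API.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.015148371  # queryNorm from the listing
FIELD_NORM = 0.078125     # fieldNorm(doc=643)
QUERY_TERM_COUNT = 25     # denominator of coord(3/25)

# (term, boost, docFreq, termFreq in doc 643), copied from the explain tree
MATCHED_TERMS = [
    ("nouns", 1.3721503, 39, 1.0),
    ("corpora", 2.4071064, 106, 1.0),
    ("multiword", 4.447677, 20, 3.0),
]

def idf(doc_freq):
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1.0))

def term_score(boost, doc_freq, freq):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

raw_sum = sum(term_score(b, df, f) for _, b, df, f in MATCHED_TERMS)
coord = len(MATCHED_TERMS) / QUERY_TERM_COUNT
final_score = raw_sum * coord
print(round(raw_sum, 4), coord, round(final_score, 4))
```

The coordination factor explains why entry 4 ranks below entry 3 despite a larger raw sum: it matches only 3 of the 25 query terms, whereas entry 3 matches 7.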