Document (#37468)

Author
Gödert, W.
Title
Detecting multiword phrases in mathematical text corpora
Source
http://arxiv.org/abs/1210.0852
Year
2012
Abstract
We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It uses a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. We discuss possible advantages of the method for indexing and information retrieval, and draw conclusions about applying dictionary-based methods of automatic indexing instead of stemming procedures.
Footnote
See also: http://hdl.handle.net/10760/17742.
Theme
Automatic indexing
Field
Mathematics
Object
Lingo

Similar documents (author)

  1. Gödert, W.: Inhalte formal erschließen : Anspruch und Wirklichkeit (1984) 4.32
    4.323724 = sum of:
      4.323724 = weight(author_txt:gödert in 31) [ClassicSimilarity], result of:
        4.323724 = fieldWeight in 31, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9179583 = idf(docFreq=114, maxDocs=42740)
          0.625 = fieldNorm(doc=31)
    
  2. Gödert, W.: Gegenwart und Zukunft der bibliothekarischen Sacherschließung : Gedanken unter Berücksichtigung des EDV-Einsatzes (1981) 4.32
    4.323724 = sum of:
      4.323724 = weight(author_txt:gödert in 165) [ClassicSimilarity], result of:
        4.323724 = fieldWeight in 165, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9179583 = idf(docFreq=114, maxDocs=42740)
          0.625 = fieldNorm(doc=165)
    
  3. Gödert, W.: Syntax von Dokumentationssprachen im Online-Katalog (1988) 4.32
    4.323724 = sum of:
      4.323724 = weight(author_txt:gödert in 167) [ClassicSimilarity], result of:
        4.323724 = fieldWeight in 167, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9179583 = idf(docFreq=114, maxDocs=42740)
          0.625 = fieldNorm(doc=167)
    
  4. Gödert, W.: Aufbereitung und Recherche von nach RSWK gebildeten Daten in der CD-ROM-Ausgabe der Deutschen Bibliographie (1990) 4.32
    4.323724 = sum of:
      4.323724 = weight(author_txt:gödert in 168) [ClassicSimilarity], result of:
        4.323724 = fieldWeight in 168, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9179583 = idf(docFreq=114, maxDocs=42740)
          0.625 = fieldNorm(doc=168)
    
  5. Gödert, W.: Gestaltung sachlicher Abfragekomponenten für Online-Kataloge : Vortrag anläßlich der Tagung 'Automatisierte Sacherschließung - Status und Trends', Schloß Hofen, Lochau bei Bregenz, 17.4.-20.4.1989 (???) 4.32
    4.323724 = sum of:
      4.323724 = weight(author_txt:gödert in 170) [ClassicSimilarity], result of:
        4.323724 = fieldWeight in 170, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9179583 = idf(docFreq=114, maxDocs=42740)
          0.625 = fieldNorm(doc=170)
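
Each author match above reports the same fieldWeight, the product of tf, idf and fieldNorm. The following is a minimal sketch of that arithmetic, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred from the numbers in the listing, and the function names are illustrative only, not part of any library.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor, assumed to be the square root of the raw count."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:gödert explanations (freq=1, docFreq=114).
author_score = field_weight(freq=1.0, doc_freq=114, max_docs=42740, field_norm=0.625)
print(f"idf         = {classic_idf(114, 42740):.6f}")
print(f"fieldWeight = {author_score:.6f}")
```

Under these assumptions the result agrees with the listed 4.323724 to within rounding. All five entries score identically because they share the same term frequency, document frequency and field norm.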
    

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.30
    0.29717106 = sum of:
      0.29717106 = product of:
        0.92865956 = sum of:
          0.023449533 = weight(abstract_txt:previously in 3537) [ClassicSimilarity], result of:
            0.023449533 = score(doc=3537,freq=1.0), product of:
              0.09774728 = queryWeight, product of:
                1.0511093 = boost
                6.1414294 = idf(docFreq=249, maxDocs=42740)
                0.015142143 = queryNorm
              0.23989959 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1414294 = idf(docFreq=249, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.018921215 = weight(abstract_txt:based in 3537) [ClassicSimilarity], result of:
            0.018921215 = score(doc=3537,freq=8.0), product of:
              0.053369682 = queryWeight, product of:
                1.0983932 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.015142143 = queryNorm
              0.35453117 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.029535763 = weight(abstract_txt:dictionary in 3537) [ClassicSimilarity], result of:
            0.029535763 = score(doc=3537,freq=1.0), product of:
              0.11400241 = queryWeight, product of:
                1.1351482 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.015142143 = queryNorm
              0.25908017 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.09109316 = weight(abstract_txt:detection in 3537) [ClassicSimilarity], result of:
            0.09109316 = score(doc=3537,freq=8.0), product of:
              0.12077464 = queryWeight, product of:
                1.1683781 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.015142143 = queryNorm
              0.7542408 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.06556557 = weight(abstract_txt:named in 3537) [ClassicSimilarity], result of:
            0.06556557 = score(doc=3537,freq=4.0), product of:
              0.12221161 = queryWeight, product of:
                1.1753082 = boost
                6.8671 = idf(docFreq=120, maxDocs=42740)
                0.015142143 = queryNorm
              0.53649217 = fieldWeight in 3537, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8671 = idf(docFreq=120, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.013450548 = weight(abstract_txt:text in 3537) [ClassicSimilarity], result of:
            0.013450548 = score(doc=3537,freq=1.0), product of:
              0.0850195 = queryWeight, product of:
                1.3863401 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015142143 = queryNorm
              0.15820545 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.04185336 = weight(abstract_txt:method in 3537) [ClassicSimilarity], result of:
            0.04185336 = score(doc=3537,freq=5.0), product of:
              0.105971426 = queryWeight, product of:
                1.5477647 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015142143 = queryNorm
              0.3949495 = fieldWeight in 3537, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.6447904 = weight(abstract_txt:multiword in 3537) [ClassicSimilarity], result of:
            0.6447904 = score(doc=3537,freq=11.0), product of:
              0.5774802 = queryWeight, product of:
                4.4251165 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.015142143 = queryNorm
              1.1165584 = fieldWeight in 3537, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.17
    0.16863392 = sum of:
      0.16863392 = product of:
        0.8431696 = sum of:
          0.016055185 = weight(abstract_txt:based in 4920) [ClassicSimilarity], result of:
            0.016055185 = score(doc=4920,freq=1.0), product of:
              0.053369682 = queryWeight, product of:
                1.0983932 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.015142143 = queryNorm
              0.3008297 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.12318774 = weight(abstract_txt:nouns in 4920) [ClassicSimilarity], result of:
            0.12318774 = score(doc=4920,freq=1.0), product of:
              0.16478565 = queryWeight, product of:
                1.3647567 = boost
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.015142143 = queryNorm
              0.74756354 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.063528925 = weight(abstract_txt:method in 4920) [ClassicSimilarity], result of:
            0.063528925 = score(doc=4920,freq=2.0), product of:
              0.105971426 = queryWeight, product of:
                1.5477647 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015142143 = queryNorm
              0.5994911 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.17380992 = weight(abstract_txt:corpora in 4920) [ClassicSimilarity], result of:
            0.17380992 = score(doc=4920,freq=1.0), product of:
              0.2611765 = queryWeight, product of:
                2.4298394 = boost
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.015142143 = queryNorm
              0.66548836 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.46658784 = weight(abstract_txt:multiword in 4920) [ClassicSimilarity], result of:
            0.46658784 = score(doc=4920,freq=1.0), product of:
              0.5774802 = queryWeight, product of:
                4.4251165 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.015142143 = queryNorm
              0.807972 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
        0.2 = coord(5/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.11
    0.11051514 = sum of:
      0.11051514 = product of:
        0.92095953 = sum of:
          0.102656454 = weight(abstract_txt:nouns in 2644) [ClassicSimilarity], result of:
            0.102656454 = score(doc=2644,freq=1.0), product of:
              0.16478565 = queryWeight, product of:
                1.3647567 = boost
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.015142143 = queryNorm
              0.6229696 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.078125 = fieldNorm(doc=2644)
          0.14484158 = weight(abstract_txt:corpora in 2644) [ClassicSimilarity], result of:
            0.14484158 = score(doc=2644,freq=1.0), product of:
              0.2611765 = queryWeight, product of:
                2.4298394 = boost
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.015142143 = queryNorm
              0.5545736 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.078125 = fieldNorm(doc=2644)
          0.6734615 = weight(abstract_txt:multiword in 2644) [ClassicSimilarity], result of:
            0.6734615 = score(doc=2644,freq=3.0), product of:
              0.5774802 = queryWeight, product of:
                4.4251165 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.015142143 = queryNorm
              1.1662071 = fieldWeight in 2644, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.078125 = fieldNorm(doc=2644)
        0.12 = coord(3/25)
    
  4. Terada, A.; Tokunaga, T.; Tanaka, H.: Automatic expansion of abbreviations by using context and character information (2004) 0.11
    0.10952352 = sum of:
      0.10952352 = product of:
        0.34226102 = sum of:
          0.03751925 = weight(abstract_txt:previously in 3561) [ClassicSimilarity], result of:
            0.03751925 = score(doc=3561,freq=1.0), product of:
              0.09774728 = queryWeight, product of:
                1.0511093 = boost
                6.1414294 = idf(docFreq=249, maxDocs=42740)
                0.015142143 = queryNorm
              0.38383934 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1414294 = idf(docFreq=249, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.0385854 = weight(abstract_txt:instead in 3561) [ClassicSimilarity], result of:
            0.0385854 = score(doc=3561,freq=1.0), product of:
              0.099590346 = queryWeight, product of:
                1.0609726 = boost
                6.1990585 = idf(docFreq=235, maxDocs=42740)
                0.015142143 = queryNorm
              0.38744116 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1990585 = idf(docFreq=235, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.010703457 = weight(abstract_txt:based in 3561) [ClassicSimilarity], result of:
            0.010703457 = score(doc=3561,freq=1.0), product of:
              0.053369682 = queryWeight, product of:
                1.0983932 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.015142143 = queryNorm
              0.20055313 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.047257222 = weight(abstract_txt:dictionary in 3561) [ClassicSimilarity], result of:
            0.047257222 = score(doc=3561,freq=1.0), product of:
              0.11400241 = queryWeight, product of:
                1.1351482 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.015142143 = queryNorm
              0.41452828 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.062197022 = weight(abstract_txt:dictionaries in 3561) [ClassicSimilarity], result of:
            0.062197022 = score(doc=3561,freq=1.0), product of:
              0.13691413 = queryWeight, product of:
                1.243998 = boost
                7.268441 = idf(docFreq=80, maxDocs=42740)
                0.015142143 = queryNorm
              0.45427758 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.268441 = idf(docFreq=80, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.082125165 = weight(abstract_txt:nouns in 3561) [ClassicSimilarity], result of:
            0.082125165 = score(doc=3561,freq=1.0), product of:
              0.16478565 = queryWeight, product of:
                1.3647567 = boost
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.015142143 = queryNorm
              0.49837568 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.021520875 = weight(abstract_txt:text in 3561) [ClassicSimilarity], result of:
            0.021520875 = score(doc=3561,freq=1.0), product of:
              0.0850195 = queryWeight, product of:
                1.3863401 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015142143 = queryNorm
              0.2531287 = fieldWeight in 3561, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
          0.04235262 = weight(abstract_txt:method in 3561) [ClassicSimilarity], result of:
            0.04235262 = score(doc=3561,freq=2.0), product of:
              0.105971426 = queryWeight, product of:
                1.5477647 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015142143 = queryNorm
              0.39966077 = fieldWeight in 3561, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0625 = fieldNorm(doc=3561)
        0.32 = coord(8/25)
    
  5. Wordhoard (n.d.) 0.10
    0.10046024 = sum of:
      0.10046024 = product of:
        0.8371687 = sum of:
          0.026901096 = weight(abstract_txt:text in 923) [ClassicSimilarity], result of:
            0.026901096 = score(doc=923,freq=1.0), product of:
              0.0850195 = queryWeight, product of:
                1.3863401 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015142143 = queryNorm
              0.3164109 = fieldWeight in 923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=923)
          0.26038852 = weight(abstract_txt:phrases in 923) [ClassicSimilarity], result of:
            0.26038852 = score(doc=923,freq=4.0), product of:
              0.24325761 = queryWeight, product of:
                2.3450048 = boost
                6.850706 = idf(docFreq=122, maxDocs=42740)
                0.015142143 = queryNorm
              1.0704229 = fieldWeight in 923, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.850706 = idf(docFreq=122, maxDocs=42740)
                0.078125 = fieldNorm(doc=923)
          0.5498791 = weight(abstract_txt:multiword in 923) [ClassicSimilarity], result of:
            0.5498791 = score(doc=923,freq=2.0), product of:
              0.5774802 = queryWeight, product of:
                4.4251165 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.015142143 = queryNorm
              0.95220417 = fieldWeight in 923, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.078125 = fieldNorm(doc=923)
        0.12 = coord(3/25)
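
The content-based scores combine one contribution per matching query term, each the product of queryWeight (boost * idf * queryNorm) and fieldWeight (tf * idf * fieldNorm), and scale the sum by coord, the fraction of the 25 query terms that matched. The sketch below recomputes the score of the "Wordhoard" entry from the values in its explain tree. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are inferred from the listed numbers; the helper names are hypothetical.

```python
import math

MAX_DOCS = 42740          # maxDocs as reported in every idf line
QUERY_NORM = 0.015142143  # queryNorm as reported in every queryWeight block
QUERY_TERMS = 25          # denominator of the coord factor


def idf(doc_freq: int) -> float:
    """Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Contribution of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, freq) for the three terms matched in doc 923.
matched = [
    (1.3863401, 2023, 1.0),  # abstract_txt:text
    (2.3450048, 122, 4.0),   # abstract_txt:phrases
    (4.4251165, 20, 2.0),    # abstract_txt:multiword
]
field_norm = 0.078125

coord = len(matched) / QUERY_TERMS
score = coord * sum(term_score(b, df, f, field_norm) for b, df, f in matched)
print(f"coord = {coord:.2f}, score = {score:.6f}")
```

Under these assumptions the result agrees with the listed 0.10046024 to within rounding. The coord factor explains why entries matching only a few of the 25 query terms rank low even when the rare term "multiword" contributes a large individual weight.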