Document (#28583)

Author
Lepsky, K.
Vorhauer, J.
Title
Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente
Source
ABI-Technik. 26(2006) H.1, S.18-28
Year
2006
Abstract
Lingo is a freely available (open source) system for the automatic indexing of German-language text. In developing lingo, the emphasis was on high configurability and flexibility for a range of use cases. The article shows the benefit of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo offers for improving retrieval is presented and illustrated with examples: base-form recognition (lemmatization), compound recognition and compound splitting, word relation, lexical and algorithmic recognition of multi-word groups, and OCR error correction. The open system architecture of lingo is described, and possible application scenarios and limits of use are identified.
Theme
Automatisches Indexieren
Object
Lingo

Similar documents (author)

  1. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 5.05
    5.0496073 = sum of:
      5.0496073 = weight(author_txt:lepsky in 5229) [ClassicSimilarity], result of:
        5.0496073 = fieldWeight in 5229, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.079371 = idf(docFreq=35, maxDocs=42740)
          0.625 = fieldNorm(doc=5229)
    
  2. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 5.05
    5.0496073 = sum of:
      5.0496073 = weight(author_txt:lepsky in 7064) [ClassicSimilarity], result of:
        5.0496073 = fieldWeight in 7064, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.079371 = idf(docFreq=35, maxDocs=42740)
          0.625 = fieldNorm(doc=7064)
    
  3. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 5.05
    5.0496073 = sum of:
      5.0496073 = weight(author_txt:lepsky in 841) [ClassicSimilarity], result of:
        5.0496073 = fieldWeight in 841, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.079371 = idf(docFreq=35, maxDocs=42740)
          0.625 = fieldNorm(doc=841)
    
  4. Lepsky, K.: Bild und Wirklichkeit : die Wirklichkeit im Bild (1987) 5.05
    5.0496073 = sum of:
      5.0496073 = weight(author_txt:lepsky in 1415) [ClassicSimilarity], result of:
        5.0496073 = fieldWeight in 1415, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.079371 = idf(docFreq=35, maxDocs=42740)
          0.625 = fieldNorm(doc=1415)
    
  5. Lepsky, K.: Ernst H. Gombrich : Theorie und Methode (1991) 5.05
    5.0496073 = sum of:
      5.0496073 = weight(author_txt:lepsky in 1754) [ClassicSimilarity], result of:
        5.0496073 = fieldWeight in 1754, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.079371 = idf(docFreq=35, maxDocs=42740)
          0.625 = fieldNorm(doc=1754)
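
Note on the author scores: each of the five entries above reports the same value, 5.0496073, as the product tf × idf × fieldNorm. The following is a minimal sketch of that arithmetic. It assumes the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures in the listing; the function names are illustrative and not part of any library, and fieldNorm is simply copied from the listing rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # Assumed classic TF-IDF term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic inverse document frequency; matches idf(docFreq=35, maxDocs=42740).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as laid out in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing: termFreq=1.0, docFreq=35, maxDocs=42740, fieldNorm=0.625
weight = field_weight(1.0, 35, 42740, 0.625)
print(round(weight, 4))  # should be close to the 5.0496073 shown above
```

Because the author query has a single term and every matching record has the same term frequency and field norm, all five records receive an identical score.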
    

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.27
    0.26517615 = sum of:
      0.26517615 = product of:
        0.8286755 = sum of:
          0.023185184 = weight(abstract_txt:mögliche in 3055) [ClassicSimilarity], result of:
            0.023185184 = score(doc=3055,freq=1.0), product of:
              0.08611767 = queryWeight, product of:
                1.003656 = boost
                6.8922057 = idf(docFreq=117, maxDocs=42740)
                0.012449421 = queryNorm
              0.2692268 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8922057 = idf(docFreq=117, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.017273229 = weight(abstract_txt:wird in 3055) [ClassicSimilarity], result of:
            0.017273229 = score(doc=3055,freq=5.0), product of:
              0.05214599 = queryWeight, product of:
                1.1044961 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.012449421 = queryNorm
              0.33124748 = fieldWeight in 3055, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.05807724 = weight(abstract_txt:linguistisch in 3055) [ClassicSimilarity], result of:
            0.05807724 = score(doc=3055,freq=1.0), product of:
              0.15883854 = queryWeight, product of:
                1.3630654 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.012449421 = queryNorm
              0.36563694 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.08213361 = weight(abstract_txt:algorithmische in 3055) [ClassicSimilarity], result of:
            0.08213361 = score(doc=3055,freq=2.0), product of:
              0.15883854 = queryWeight, product of:
                1.3630654 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.012449421 = queryNorm
              0.5170887 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.06497253 = weight(abstract_txt:lexikalische in 3055) [ClassicSimilarity], result of:
            0.06497253 = score(doc=3055,freq=1.0), product of:
              0.17117424 = queryWeight, product of:
                1.415005 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.012449421 = queryNorm
              0.37956953 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.016559295 = weight(abstract_txt:open in 3055) [ClassicSimilarity], result of:
            0.016559295 = score(doc=3055,freq=1.0), product of:
              0.08669415 = queryWeight, product of:
                1.4241267 = boost
                4.88981 = idf(docFreq=873, maxDocs=42740)
                0.012449421 = queryNorm
              0.19100821 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.88981 = idf(docFreq=873, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.018982342 = weight(abstract_txt:source in 3055) [ClassicSimilarity], result of:
            0.018982342 = score(doc=3055,freq=1.0), product of:
              0.094957314 = queryWeight, product of:
                1.4904518 = boost
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.012449421 = queryNorm
              0.19990394 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.547492 = weight(abstract_txt:lingo in 3055) [ClassicSimilarity], result of:
            0.547492 = score(doc=3055,freq=4.0), product of:
              0.7635552 = queryWeight, product of:
                6.682577 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012449421 = queryNorm
              0.71703005 = fieldWeight in 3055, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
        0.32 = coord(8/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.22
    0.22313724 = sum of:
      0.22313724 = product of:
        1.3946078 = sum of:
          0.07559961 = weight(abstract_txt:automatische in 2402) [ClassicSimilarity], result of:
            0.07559961 = score(doc=2402,freq=1.0), product of:
              0.08720304 = queryWeight, product of:
                1.0099609 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.012449421 = queryNorm
              0.8669378 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.14730228 = weight(abstract_txt:automatischen in 2402) [ClassicSimilarity], result of:
            0.14730228 = score(doc=2402,freq=1.0), product of:
              0.17139636 = queryWeight, product of:
                2.002417 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012449421 = queryNorm
              0.8594248 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.29571873 = weight(abstract_txt:indexierung in 2402) [ClassicSimilarity], result of:
            0.29571873 = score(doc=2402,freq=2.0), product of:
              0.24781917 = queryWeight, product of:
                2.948946 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.012449421 = queryNorm
              1.1932843 = fieldWeight in 2402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.87598723 = weight(abstract_txt:lingo in 2402) [ClassicSimilarity], result of:
            0.87598723 = score(doc=2402,freq=1.0), product of:
              0.7635552 = queryWeight, product of:
                6.682577 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012449421 = queryNorm
              1.147248 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
        0.16 = coord(4/25)
    
  3. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.17
    0.17479746 = sum of:
      0.17479746 = product of:
        0.62427664 = sum of:
          0.03709629 = weight(abstract_txt:mögliche in 1284) [ClassicSimilarity], result of:
            0.03709629 = score(doc=1284,freq=1.0), product of:
              0.08611767 = queryWeight, product of:
                1.003656 = boost
                6.8922057 = idf(docFreq=117, maxDocs=42740)
                0.012449421 = queryNorm
              0.43076286 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8922057 = idf(docFreq=117, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.053457 = weight(abstract_txt:automatische in 1284) [ClassicSimilarity], result of:
            0.053457 = score(doc=1284,freq=2.0), product of:
              0.08720304 = queryWeight, product of:
                1.0099609 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.012449421 = queryNorm
              0.6130176 = fieldWeight in 1284, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.04419033 = weight(abstract_txt:vordergrund in 1284) [ClassicSimilarity], result of:
            0.04419033 = score(doc=1284,freq=1.0), product of:
              0.096773565 = queryWeight, product of:
                1.0639399 = boost
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.012449421 = queryNorm
              0.45663637 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.012359717 = weight(abstract_txt:wird in 1284) [ClassicSimilarity], result of:
            0.012359717 = score(doc=1284,freq=1.0), product of:
              0.05214599 = queryWeight, product of:
                1.1044961 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.012449421 = queryNorm
              0.23702142 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.07377105 = weight(abstract_txt:linguistische in 1284) [ClassicSimilarity], result of:
            0.07377105 = score(doc=1284,freq=1.0), product of:
              0.13618499 = queryWeight, product of:
                1.2621279 = boost
                8.667158 = idf(docFreq=19, maxDocs=42740)
                0.012449421 = queryNorm
              0.5416974 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.667158 = idf(docFreq=19, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.14730228 = weight(abstract_txt:automatischen in 1284) [ClassicSimilarity], result of:
            0.14730228 = score(doc=1284,freq=4.0), product of:
              0.17139636 = queryWeight, product of:
                2.002417 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012449421 = queryNorm
              0.8594248 = fieldWeight in 1284, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.25609994 = weight(abstract_txt:indexierung in 1284) [ClassicSimilarity], result of:
            0.25609994 = score(doc=1284,freq=6.0), product of:
              0.24781917 = queryWeight, product of:
                2.948946 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.012449421 = queryNorm
              1.0334146 = fieldWeight in 1284, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
        0.28 = coord(7/25)
    
  4. Jersek, T.: Automatische DDC-Klassifizierung mit Lingo : Vorgehensweise und Ergebnisse (2012) 0.16
    0.1556214 = sum of:
      0.1556214 = product of:
        1.296845 = sum of:
          0.030588739 = weight(abstract_txt:wird in 2123) [ClassicSimilarity], result of:
            0.030588739 = score(doc=2123,freq=2.0), product of:
              0.05214599 = queryWeight, product of:
                1.1044961 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.012449421 = queryNorm
              0.5865981 = fieldWeight in 2123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.109375 = fieldNorm(doc=2123)
          0.18227728 = weight(abstract_txt:automatischen in 2123) [ClassicSimilarity], result of:
            0.18227728 = score(doc=2123,freq=2.0), product of:
              0.17139636 = queryWeight, product of:
                2.002417 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012449421 = queryNorm
              1.063484 = fieldWeight in 2123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.109375 = fieldNorm(doc=2123)
          1.0839789 = weight(abstract_txt:lingo in 2123) [ClassicSimilarity], result of:
            1.0839789 = score(doc=2123,freq=2.0), product of:
              0.7635552 = queryWeight, product of:
                6.682577 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012449421 = queryNorm
              1.419647 = fieldWeight in 2123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=2123)
        0.12 = coord(3/25)
    
  5. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.14
    0.13675076 = sum of:
      0.13675076 = product of:
        0.8546922 = sum of:
          0.037799805 = weight(abstract_txt:automatische in 5955) [ClassicSimilarity], result of:
            0.037799805 = score(doc=5955,freq=1.0), product of:
              0.08720304 = queryWeight, product of:
                1.0099609 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.012449421 = queryNorm
              0.4334689 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.09292358 = weight(abstract_txt:algorithmische in 5955) [ClassicSimilarity], result of:
            0.09292358 = score(doc=5955,freq=1.0), product of:
              0.15883854 = queryWeight, product of:
                1.3630654 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.012449421 = queryNorm
              0.5850191 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.10455236 = weight(abstract_txt:indexierung in 5955) [ClassicSimilarity], result of:
            0.10455236 = score(doc=5955,freq=1.0), product of:
              0.24781917 = queryWeight, product of:
                2.948946 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.012449421 = queryNorm
              0.42188972 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.6194165 = weight(abstract_txt:lingo in 5955) [ClassicSimilarity], result of:
            0.6194165 = score(doc=5955,freq=2.0), product of:
              0.7635552 = queryWeight, product of:
                6.682577 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012449421 = queryNorm
              0.81122684 = fieldWeight in 5955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
        0.16 = coord(4/25)
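
Note on the content scores: each entry above is a sum of per-term scores (queryWeight × fieldWeight) multiplied by a coordination factor, the share of the 25 query terms that the record matches. The sketch below recomputes the score of entry 4 (doc 2123) from the numbers in its tree. It assumes the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); boost, queryNorm and fieldNorm are copied from the listing, and all names in the code are illustrative.

```python
import math

MAX_DOCS = 42740          # maxDocs from the listing
QUERY_NORM = 0.012449421  # queryNorm from the listing
QUERY_TERMS = 25          # denominator of coord(n/25)

def classic_idf(doc_freq: int) -> float:
    # Assumed classic inverse document frequency.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM           # queryWeight in the tree
    field_weight = math.sqrt(freq) * idf * field_norm # fieldWeight in the tree
    return query_weight * field_weight

# (boost, docFreq, freq) for the three query terms matched by doc 2123
terms_2123 = [
    (1.1044961, 2618, 2.0),  # wird
    (2.002417, 119, 2.0),    # automatischen
    (6.682577, 11, 2.0),     # lingo
]
FIELD_NORM_2123 = 0.109375

term_sum = sum(term_score(b, df, f, FIELD_NORM_2123) for b, df, f in terms_2123)
coord = len(terms_2123) / QUERY_TERMS  # coord(3/25) = 0.12
score = term_sum * coord
print(round(score, 4))  # should be close to the 0.1556214 shown for entry 4
```

The coordination factor explains the ranking order: entry 1 matches 8 of 25 query terms, so its smaller per-term sum still yields the highest final score.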