Document (#28582)

Author
Lepsky, K.
Vorhauer, J.
Title
Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente [Lingo: an open source system for the automatic indexing of German-language documents]
Source
ABI-Technik. 26(2006) no.1, pp.18-28
Year
2006
Abstract
Lingo is a freely available (open source) system for the automatic indexing of German-language text. Its development prioritized high configurability and flexibility for a range of deployment scenarios. The article shows the benefit of linguistically based automatic indexing for information retrieval. It presents the linguistic functionality that lingo offers for improving retrieval and illustrates it with examples: base form recognition, compound recognition and compound decomposition, word relation, lexical and algorithmic multi-word group recognition, and OCR error correction. It describes lingo's open system architecture and names possible deployment scenarios and limits of application.
Theme
Automatic indexing
Object
Lingo

Similar documents (author)

  1. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 5229) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 5229, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=5229)
    
  2. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 7064) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 7064, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=7064)
    
  3. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 772) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 772, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=772)
    
  4. Lepsky, K.: Bild und Wirklichkeit : die Wirklichkeit im Bild (1987) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 1346) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 1346, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=1346)
    
  5. Lepsky, K.: Ernst H. Gombrich : Theorie und Methode (1991) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 1685) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 1685, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=1685)
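
The five author scores above all follow the same pattern: tf times idf times fieldNorm. As a minimal sketch, assuming the usual Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the value 5.0370636 can be recomputed from the numbers in the listing. The function names are illustrative and not part of any Lucene API; fieldNorm is copied from the listing rather than derived from the field length.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm taken as given)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values from the author_txt:lepsky entries above.
score = field_weight(freq=1.0, doc_freq=37, max_docs=44218, field_norm=0.625)
print(round(score, 4))  # 5.0371
```

Because the author query has a single term and no query-side weighting in this listing, the fieldWeight is the whole score, which is why all five hits share the same value.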
    

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.26
    0.2643192 = sum of:
      0.2643192 = product of:
        0.8259976 = sum of:
          0.02304247 = weight(abstract_txt:mögliche in 1054) [ClassicSimilarity], result of:
            0.02304247 = score(doc=1054,freq=1.0), product of:
              0.08568086 = queryWeight, product of:
                1.00467 = boost
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.012387258 = queryNorm
              0.2689337 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.016963286 = weight(abstract_txt:wird in 1054) [ClassicSimilarity], result of:
            0.016963286 = score(doc=1054,freq=5.0), product of:
              0.051470425 = queryWeight, product of:
                1.1012233 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.012387258 = queryNorm
              0.32957345 = fieldWeight in 1054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.07806342 = weight(abstract_txt:algorithmische in 1054) [ClassicSimilarity], result of:
            0.07806342 = score(doc=1054,freq=2.0), product of:
              0.15339793 = queryWeight, product of:
                1.3442848 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              0.50889486 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.05854194 = weight(abstract_txt:linguistisch in 1054) [ClassicSimilarity], result of:
            0.05854194 = score(doc=1054,freq=1.0), product of:
              0.15953006 = queryWeight, product of:
                1.3708906 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012387258 = queryNorm
              0.36696494 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.06281346 = weight(abstract_txt:lexikalische in 1054) [ClassicSimilarity], result of:
            0.06281346 = score(doc=1054,freq=1.0), product of:
              0.16719872 = queryWeight, product of:
                1.4034535 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.012387258 = queryNorm
              0.3756815 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.01585078 = weight(abstract_txt:open in 1054) [ClassicSimilarity], result of:
            0.01585078 = score(doc=1054,freq=1.0), product of:
              0.08412174 = queryWeight, product of:
                1.4078314 = boost
                4.8237233 = idf(docFreq=965, maxDocs=44218)
                0.012387258 = queryNorm
              0.18842669 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8237233 = idf(docFreq=965, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.01873044 = weight(abstract_txt:source in 1054) [ClassicSimilarity], result of:
            0.01873044 = score(doc=1054,freq=1.0), product of:
              0.094024226 = queryWeight, product of:
                1.4883889 = boost
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.012387258 = queryNorm
              0.19920865 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.55199176 = weight(abstract_txt:lingo in 1054) [ClassicSimilarity], result of:
            0.55199176 = score(doc=1054,freq=4.0), product of:
              0.76698965 = queryWeight, product of:
                6.721424 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              0.71968603 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
        0.32 = coord(8/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.22
    0.22407351 = sum of:
      0.22407351 = product of:
        1.4004595 = sum of:
          0.07480328 = weight(abstract_txt:automatische in 401) [ClassicSimilarity], result of:
            0.07480328 = score(doc=401,freq=1.0), product of:
              0.08650573 = queryWeight, product of:
                1.0094945 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012387258 = queryNorm
              0.86472046 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.1469521 = weight(abstract_txt:automatischen in 401) [ClassicSimilarity], result of:
            0.1469521 = score(doc=401,freq=1.0), product of:
              0.17095888 = queryWeight, product of:
                2.0069768 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.012387258 = queryNorm
              0.8595757 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.29551733 = weight(abstract_txt:indexierung in 401) [ClassicSimilarity], result of:
            0.29551733 = score(doc=401,freq=2.0), product of:
              0.24746676 = queryWeight, product of:
                2.9573355 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.012387258 = queryNorm
              1.1941698 = fieldWeight in 401, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.88318676 = weight(abstract_txt:lingo in 401) [ClassicSimilarity], result of:
            0.88318676 = score(doc=401,freq=1.0), product of:
              0.76698965 = queryWeight, product of:
                6.721424 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              1.1514976 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
        0.16 = coord(4/25)
    
  3. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.17
    0.17443337 = sum of:
      0.17443337 = product of:
        0.6229763 = sum of:
          0.03686795 = weight(abstract_txt:mögliche in 4283) [ClassicSimilarity], result of:
            0.03686795 = score(doc=4283,freq=1.0), product of:
              0.08568086 = queryWeight, product of:
                1.00467 = boost
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.012387258 = queryNorm
              0.43029392 = fieldWeight in 4283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.052893907 = weight(abstract_txt:automatische in 4283) [ClassicSimilarity], result of:
            0.052893907 = score(doc=4283,freq=2.0), product of:
              0.08650573 = queryWeight, product of:
                1.0094945 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012387258 = queryNorm
              0.6114497 = fieldWeight in 4283, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.043772973 = weight(abstract_txt:vordergrund in 4283) [ClassicSimilarity], result of:
            0.043772973 = score(doc=4283,freq=1.0), product of:
              0.096070156 = queryWeight, product of:
                1.0638387 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.012387258 = queryNorm
              0.4556355 = fieldWeight in 4283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.012137938 = weight(abstract_txt:wird in 4283) [ClassicSimilarity], result of:
            0.012137938 = score(doc=4283,freq=1.0), product of:
              0.051470425 = queryWeight, product of:
                1.1012233 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.012387258 = queryNorm
              0.23582356 = fieldWeight in 4283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.074425906 = weight(abstract_txt:linguistische in 4283) [ClassicSimilarity], result of:
            0.074425906 = score(doc=4283,freq=1.0), product of:
              0.13685706 = queryWeight, product of:
                1.269741 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012387258 = queryNorm
              0.54382217 = fieldWeight in 4283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.1469521 = weight(abstract_txt:automatischen in 4283) [ClassicSimilarity], result of:
            0.1469521 = score(doc=4283,freq=4.0), product of:
              0.17095888 = queryWeight, product of:
                2.0069768 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.012387258 = queryNorm
              0.8595757 = fieldWeight in 4283, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
          0.25592554 = weight(abstract_txt:indexierung in 4283) [ClassicSimilarity], result of:
            0.25592554 = score(doc=4283,freq=6.0), product of:
              0.24746676 = queryWeight, product of:
                2.9573355 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.012387258 = queryNorm
              1.0341815 = fieldWeight in 4283, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.0625 = fieldNorm(doc=4283)
        0.28 = coord(7/25)
    
  4. Jersek, T.: Automatische DDC-Klassifizierung mit Lingo : Vorgehensweise und Ergebnisse (2012) 0.16
    0.15657258 = sum of:
      0.15657258 = product of:
        1.3047715 = sum of:
          0.030039864 = weight(abstract_txt:wird in 122) [ClassicSimilarity], result of:
            0.030039864 = score(doc=122,freq=2.0), product of:
              0.051470425 = queryWeight, product of:
                1.1012233 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.012387258 = queryNorm
              0.5836335 = fieldWeight in 122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.109375 = fieldNorm(doc=122)
          0.18184394 = weight(abstract_txt:automatischen in 122) [ClassicSimilarity], result of:
            0.18184394 = score(doc=122,freq=2.0), product of:
              0.17095888 = queryWeight, product of:
                2.0069768 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.012387258 = queryNorm
              1.0636706 = fieldWeight in 122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.109375 = fieldNorm(doc=122)
          1.0928878 = weight(abstract_txt:lingo in 122) [ClassicSimilarity], result of:
            1.0928878 = score(doc=122,freq=2.0), product of:
              0.76698965 = queryWeight, product of:
                6.721424 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              1.4249055 = fieldWeight in 122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=122)
        0.12 = coord(3/25)
    
  5. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.14
    0.13675341 = sum of:
      0.13675341 = product of:
        0.8547088 = sum of:
          0.03740164 = weight(abstract_txt:automatische in 3954) [ClassicSimilarity], result of:
            0.03740164 = score(doc=3954,freq=1.0), product of:
              0.08650573 = queryWeight, product of:
                1.0094945 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012387258 = queryNorm
              0.43236023 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.088318676 = weight(abstract_txt:algorithmische in 3954) [ClassicSimilarity], result of:
            0.088318676 = score(doc=3954,freq=1.0), product of:
              0.15339793 = queryWeight, product of:
                1.3442848 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              0.5757488 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.10448116 = weight(abstract_txt:indexierung in 3954) [ClassicSimilarity], result of:
            0.10448116 = score(doc=3954,freq=1.0), product of:
              0.24746676 = queryWeight, product of:
                2.9573355 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.012387258 = queryNorm
              0.4222028 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.6245073 = weight(abstract_txt:lingo in 3954) [ClassicSimilarity], result of:
            0.6245073 = score(doc=3954,freq=2.0), product of:
              0.76698965 = queryWeight, product of:
                6.721424 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012387258 = queryNorm
              0.81423175 = fieldWeight in 3954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
        0.16 = coord(4/25)
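
The content scores combine more factors than the author scores. A minimal sketch, again assuming Lucene ClassicSimilarity conventions, recomputes the score of hit 5 (doc 3954): each matching abstract_txt term contributes queryWeight times fieldWeight, and the sum is scaled by coord, the share of query terms that matched. Boost, docFreq, freq, fieldNorm and queryNorm are copied from the listing; the helper names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.012387258  # queryNorm from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def hit_score(terms, field_norm: float, matched: int, total: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total."""
    raw = sum(term_score(boost, df, freq, field_norm) for boost, df, freq in terms)
    return raw * (matched / total)


# (boost, docFreq, freq) for the four terms matched in doc 3954.
doc_3954_terms = [
    (1.0094945, 118, 1.0),  # automatische
    (1.3442848, 11, 1.0),   # algorithmische
    (2.9573355, 139, 1.0),  # indexierung
    (6.721424, 11, 2.0),    # lingo
]

score = hit_score(doc_3954_terms, field_norm=0.0625, matched=4, total=25)
print(score)  # should agree with the listed 0.13675341 up to float rounding
```

The same functions should reproduce the other hits when given their term lists, fieldNorm and coord values, for example coord(8/25) for hit 1. Note how coord penalizes hits that match few of the 25 query terms: hit 4 has the largest raw sum apart from hit 2, but its coord of 3/25 pulls it down to fourth place.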