Document (#28583)

Author
Lepsky, K.
Vorhauer, J.
Title
Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente
Source
ABI-Technik. 26(2006) no.1, pp.18-28
Year
2006
Abstract
Lingo is a freely available (open source) system for the automatic indexing of German-language texts. Its development prioritized high configurability and flexibility for a range of application scenarios. The article demonstrates the benefit of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo offers for improving retrieval is presented and illustrated with examples: base form recognition (lemmatization), compound recognition and compound decomposition, relating words to one another, lexical and algorithmic multi-word group recognition, and OCR error correction. The open system architecture of lingo is described, and possible application scenarios and limits of use are identified.
Theme
Automatic indexing
Object
Lingo

Similar documents (author)

  1. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 5.06
    5.0570784 = sum of:
      5.0570784 = weight(author_txt:lepsky in 5229) [ClassicSimilarity], result of:
        5.0570784 = fieldWeight in 5229, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.091326 = idf(docFreq=35, maxDocs=43254)
          0.625 = fieldNorm(doc=5229)
    
  2. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 5.06
    5.0570784 = sum of:
      5.0570784 = weight(author_txt:lepsky in 64) [ClassicSimilarity], result of:
        5.0570784 = fieldWeight in 64, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.091326 = idf(docFreq=35, maxDocs=43254)
          0.625 = fieldNorm(doc=64)
    
  3. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 5.06
    5.0570784 = sum of:
      5.0570784 = weight(author_txt:lepsky in 1841) [ClassicSimilarity], result of:
        5.0570784 = fieldWeight in 1841, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.091326 = idf(docFreq=35, maxDocs=43254)
          0.625 = fieldNorm(doc=1841)
    
  4. Lepsky, K.: Bild und Wirklichkeit : die Wirklichkeit im Bild (1987) 5.06
    5.0570784 = sum of:
      5.0570784 = weight(author_txt:lepsky in 2415) [ClassicSimilarity], result of:
        5.0570784 = fieldWeight in 2415, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.091326 = idf(docFreq=35, maxDocs=43254)
          0.625 = fieldNorm(doc=2415)
    
  5. Lepsky, K.: Ernst H. Gombrich : Theorie und Methode (1991) 5.06
    5.0570784 = sum of:
      5.0570784 = weight(author_txt:lepsky in 2754) [ClassicSimilarity], result of:
        5.0570784 = fieldWeight in 2754, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.091326 = idf(docFreq=35, maxDocs=43254)
          0.625 = fieldNorm(doc=2754)
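
The author scores above all follow the same arithmetic: fieldWeight = tf * idf * fieldNorm. The sketch below recomputes one of them in Python. It is not the search engine's own code; the function names are hypothetical, and the idf formula (1 + ln(maxDocs / (docFreq + 1))) is an assumption chosen because it reproduces the idf values printed in this listing. Other Lucene versions use a slightly different formula.

```python
import math

def classic_tf(freq):
    # ClassicSimilarity-style term frequency: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed idf variant: 1 + ln(maxDocs / (docFreq + 1)).
    # Chosen because it matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the author_txt:lepsky entries above.
score = field_weight(freq=1.0, doc_freq=35, max_docs=43254, field_norm=0.625)
print(score)  # compare with 5.0570784 in the listing (up to float precision)
```

All five author matches share freq, docFreq, and fieldNorm, which is why they receive identical scores.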
    

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.26
    0.2648272 = sum of:
      0.2648272 = product of:
        0.827585 = sum of:
          0.023271283 = weight(abstract_txt:mögliche in 2519) [ClassicSimilarity], result of:
            0.023271283 = score(doc=2519,freq=1.0), product of:
              0.086287804 = queryWeight, product of:
                1.0060472 = boost
                6.9041605 = idf(docFreq=117, maxDocs=43254)
                0.012422819 = queryNorm
              0.26969376 = fieldWeight in 2519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9041605 = idf(docFreq=117, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.017219182 = weight(abstract_txt:wird in 2519) [ClassicSimilarity], result of:
            0.017219182 = score(doc=2519,freq=5.0), product of:
              0.05201128 = queryWeight, product of:
                1.1046062 = boost
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.012422819 = queryNorm
              0.3310663 = fieldWeight in 2519, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.07983959 = weight(abstract_txt:algorithmische in 2519) [ClassicSimilarity], result of:
            0.07983959 = score(doc=2519,freq=2.0), product of:
              0.15578945 = queryWeight, product of:
                1.3518008 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.012422819 = queryNorm
              0.5124839 = fieldWeight in 2519, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.058213096 = weight(abstract_txt:linguistisch in 2519) [ClassicSimilarity], result of:
            0.058213096 = score(doc=2519,freq=1.0), product of:
              0.15900703 = queryWeight, product of:
                1.365689 = boost
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.012422819 = queryNorm
              0.36610392 = fieldWeight in 2519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.016215093 = weight(abstract_txt:open in 2519) [ClassicSimilarity], result of:
            0.016215093 = score(doc=2519,freq=1.0), product of:
              0.08544608 = queryWeight, product of:
                1.4158093 = boost
                4.858109 = idf(docFreq=912, maxDocs=43254)
                0.012422819 = queryNorm
              0.18976988 = fieldWeight in 2519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.858109 = idf(docFreq=912, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.065115385 = weight(abstract_txt:lexikalische in 2519) [ClassicSimilarity], result of:
            0.065115385 = score(doc=2519,freq=1.0), product of:
              0.1713398 = queryWeight, product of:
                1.4176624 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.012422819 = queryNorm
              0.38003653 = fieldWeight in 2519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.018896978 = weight(abstract_txt:source in 2519) [ClassicSimilarity], result of:
            0.018896978 = score(doc=2519,freq=1.0), product of:
              0.09462534 = queryWeight, product of:
                1.4899181 = boost
                5.112401 = idf(docFreq=707, maxDocs=43254)
                0.012422819 = queryNorm
              0.19970316 = fieldWeight in 2519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.112401 = idf(docFreq=707, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
          0.54881436 = weight(abstract_txt:lingo in 2519) [ClassicSimilarity], result of:
            0.54881436 = score(doc=2519,freq=4.0), product of:
              0.76440376 = queryWeight, product of:
                6.695609 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.012422819 = queryNorm
              0.71796393 = fieldWeight in 2519, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.0390625 = fieldNorm(doc=2519)
        0.32 = coord(8/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.22
    0.22363481 = sum of:
      0.22363481 = product of:
        1.3977176 = sum of:
          0.075877905 = weight(abstract_txt:automatische in 1866) [ClassicSimilarity], result of:
            0.075877905 = score(doc=1866,freq=1.0), product of:
              0.087373435 = queryWeight, product of:
                1.0123563 = boost
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.012422819 = queryNorm
              0.86843216 = fieldWeight in 1866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.125 = fieldNorm(doc=1866)
          0.14785118 = weight(abstract_txt:automatischen in 1866) [ClassicSimilarity], result of:
            0.14785118 = score(doc=1866,freq=1.0), product of:
              0.17173642 = queryWeight, product of:
                2.0071964 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.012422819 = queryNorm
              0.8609192 = fieldWeight in 1866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.125 = fieldNorm(doc=1866)
          0.29588553 = weight(abstract_txt:indexierung in 1866) [ClassicSimilarity], result of:
            0.29588553 = score(doc=1866,freq=2.0), product of:
              0.24778906 = queryWeight, product of:
                2.952877 = boost
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.012422819 = queryNorm
              1.1941025 = fieldWeight in 1866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.125 = fieldNorm(doc=1866)
          0.87810296 = weight(abstract_txt:lingo in 1866) [ClassicSimilarity], result of:
            0.87810296 = score(doc=1866,freq=1.0), product of:
              0.76440376 = queryWeight, product of:
                6.695609 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.012422819 = queryNorm
              1.1487423 = fieldWeight in 1866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.125 = fieldNorm(doc=1866)
        0.16 = coord(4/25)
    
  3. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.18
    0.17510666 = sum of:
      0.17510666 = product of:
        0.62538093 = sum of:
          0.037234053 = weight(abstract_txt:mögliche in 748) [ClassicSimilarity], result of:
            0.037234053 = score(doc=748,freq=1.0), product of:
              0.086287804 = queryWeight, product of:
                1.0060472 = boost
                6.9041605 = idf(docFreq=117, maxDocs=43254)
                0.012422819 = queryNorm
              0.43151003 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9041605 = idf(docFreq=117, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.05365378 = weight(abstract_txt:automatische in 748) [ClassicSimilarity], result of:
            0.05365378 = score(doc=748,freq=2.0), product of:
              0.087373435 = queryWeight, product of:
                1.0123563 = boost
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.012422819 = queryNorm
              0.6140743 = fieldWeight in 748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.044110212 = weight(abstract_txt:vordergrund in 748) [ClassicSimilarity], result of:
            0.044110212 = score(doc=748,freq=1.0), product of:
              0.09660849 = queryWeight, product of:
                1.0645138 = boost
                7.305397 = idf(docFreq=78, maxDocs=43254)
                0.012422819 = queryNorm
              0.4565873 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.305397 = idf(docFreq=78, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.012321045 = weight(abstract_txt:wird in 748) [ClassicSimilarity], result of:
            0.012321045 = score(doc=748,freq=1.0), product of:
              0.05201128 = queryWeight, product of:
                1.1046062 = boost
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.012422819 = queryNorm
              0.23689178 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.07396624 = weight(abstract_txt:linguistische in 748) [ClassicSimilarity], result of:
            0.07396624 = score(doc=748,freq=1.0), product of:
              0.13635725 = queryWeight, product of:
                1.2646862 = boost
                8.679112 = idf(docFreq=19, maxDocs=43254)
                0.012422819 = queryNorm
              0.5424445 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.679112 = idf(docFreq=19, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.14785118 = weight(abstract_txt:automatischen in 748) [ClassicSimilarity], result of:
            0.14785118 = score(doc=748,freq=4.0), product of:
              0.17173642 = queryWeight, product of:
                2.0071964 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.012422819 = queryNorm
              0.8609192 = fieldWeight in 748, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
          0.25624442 = weight(abstract_txt:indexierung in 748) [ClassicSimilarity], result of:
            0.25624442 = score(doc=748,freq=6.0), product of:
              0.24778906 = queryWeight, product of:
                2.952877 = boost
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.012422819 = queryNorm
              1.0341232 = fieldWeight in 748, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.0625 = fieldNorm(doc=748)
        0.28 = coord(7/25)
    
  4. Jersek, T.: Automatische DDC-Klassifizierung mit Lingo : Vorgehensweise und Ergebnisse (2012) 0.16
    0.15600558 = sum of:
      0.15600558 = product of:
        1.3000464 = sum of:
          0.03049303 = weight(abstract_txt:wird in 1587) [ClassicSimilarity], result of:
            0.03049303 = score(doc=1587,freq=2.0), product of:
              0.05201128 = queryWeight, product of:
                1.1046062 = boost
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.012422819 = queryNorm
              0.58627725 = fieldWeight in 1587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7902684 = idf(docFreq=2655, maxDocs=43254)
                0.109375 = fieldNorm(doc=1587)
          0.1829565 = weight(abstract_txt:automatischen in 1587) [ClassicSimilarity], result of:
            0.1829565 = score(doc=1587,freq=2.0), product of:
              0.17173642 = queryWeight, product of:
                2.0071964 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.012422819 = queryNorm
              1.0653331 = fieldWeight in 1587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.109375 = fieldNorm(doc=1587)
          1.086597 = weight(abstract_txt:lingo in 1587) [ClassicSimilarity], result of:
            1.086597 = score(doc=1587,freq=2.0), product of:
              0.76440376 = queryWeight, product of:
                6.695609 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.012422819 = queryNorm
              1.421496 = fieldWeight in 1587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.109375 = fieldNorm(doc=1587)
        0.12 = coord(3/25)
    
  5. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.14
    0.13660656 = sum of:
      0.13660656 = product of:
        0.853791 = sum of:
          0.037938952 = weight(abstract_txt:automatische in 5419) [ClassicSimilarity], result of:
            0.037938952 = score(doc=5419,freq=1.0), product of:
              0.087373435 = queryWeight, product of:
                1.0123563 = boost
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.012422819 = queryNorm
              0.43421608 = fieldWeight in 5419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.0625 = fieldNorm(doc=5419)
          0.09032818 = weight(abstract_txt:algorithmische in 5419) [ClassicSimilarity], result of:
            0.09032818 = score(doc=5419,freq=1.0), product of:
              0.15578945 = queryWeight, product of:
                1.3518008 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.012422819 = queryNorm
              0.57980937 = fieldWeight in 5419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=5419)
          0.10461134 = weight(abstract_txt:indexierung in 5419) [ClassicSimilarity], result of:
            0.10461134 = score(doc=5419,freq=1.0), product of:
              0.24778906 = queryWeight, product of:
                2.952877 = boost
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.012422819 = queryNorm
              0.422179 = fieldWeight in 5419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.754864 = idf(docFreq=136, maxDocs=43254)
                0.0625 = fieldNorm(doc=5419)
          0.62091255 = weight(abstract_txt:lingo in 5419) [ClassicSimilarity], result of:
            0.62091255 = score(doc=5419,freq=2.0), product of:
              0.76440376 = queryWeight, product of:
                6.695609 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.012422819 = queryNorm
              0.81228346 = fieldWeight in 5419, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.0625 = fieldNorm(doc=5419)
        0.16 = coord(4/25)
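
The content scores combine several matching terms: each term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched terms / query terms). The following Python sketch recomputes entry 5 (Grün, doc=5419) from the numbers in its tree. The function names are hypothetical, and the idf formula is again an assumption that happens to reproduce the values printed here, so treat this as an illustration rather than the engine's implementation.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.012422819  # queryNorm as printed in the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf variant that matches the listing's idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM           # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight

# (freq, docFreq, boost) for the four terms matched in doc 5419.
terms = {
    "automatische":   (1.0, 112, 1.0123563),
    "algorithmische": (1.0, 10, 1.3518008),
    "indexierung":    (1.0, 136, 2.952877),
    "lingo":          (2.0, 11, 6.695609),
}
field_norm = 0.0625

raw_sum = sum(term_score(f, df, b, field_norm) for f, df, b in terms.values())
coord = len(terms) / 25  # 4 of 25 query terms matched
total = raw_sum * coord
print(total)  # compare with 0.13660656 in the listing
```

Because coord scales with the number of matched query terms, a document matching few terms strongly (such as entry 4) can still rank below one matching more terms with lower individual weights.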