Document (#26917)

Author
Gaese, V.
Title
"Automatische Klassifikation von Presseartikeln in der Gruner + Jahr Dokumentation"
Source
Bibliotheken und Informationseinrichtungen - Aufgaben, Strukturen, Ziele: 29. Arbeits- und Fortbildungstagung der ASpB / Sektion 5 im DBV in Zusammenarbeit mit der BDB, BIB, DBV, DGI und VDB, zugleich DBV-Jahrestagung, 8.-11.4.2003 in Stuttgart. Red.: Margit Bauer
Imprint
Jülich : ASpB / Sektion 5 im DBV
Year
2003
Pages
pp. 401-413
Abstract
Classifying texts, also called indexing, subject cataloguing, or keyword assignment, has always been one of the necessary but very labor-intensive tasks of archives and documentation departments. Their differing purposes and requirements are surely one reason why there are almost as many indexing schemes, thesauri, and subject heading lists as there are documentation departments. In what follows, classification, indexing, subject cataloguing, and keyword assignment are used synonymously. At G+J Dokumentation, about 20 documentalists currently select and index roughly 1,100 articles per day from a total of about 210 titles. The G+J Pressedatenbank (press database) currently holds about 7 million articles, a good 2 million of them as digital full text (OCR/typesetting data). Only articles for which G+J Dokumentation holds the corresponding rights are archived.
Theme
Automatic indexing (Automatisches Indexieren)
Object
Gruner + Jahr
Location
D
Hamburg
Area
Press archives (Pressearchive)

Similar documents (content)

  1. Schek, M.: Automatische Klassifizierung in Erschließung und Recherche eines Pressearchivs (2006) 0.17
    0.1733958 = sum of:
      0.1733958 = product of:
        0.6192707 = sum of:
          0.0622047 = weight(abstract_txt:täglich in 1044) [ClassicSimilarity], result of:
            0.0622047 = score(doc=1044,freq=1.0), product of:
              0.14963652 = queryWeight, product of:
                1.0217265 = boost
                7.6014686 = idf(docFreq=56, maxDocs=41962)
                0.019266617 = queryNorm
              0.41570532 = fieldWeight in 1044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6014686 = idf(docFreq=56, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.1108561 = weight(abstract_txt:klassifizierung in 1044) [ClassicSimilarity], result of:
            0.1108561 = score(doc=1044,freq=2.0), product of:
              0.17457628 = queryWeight, product of:
                1.1035918 = boost
                8.210532 = idf(docFreq=30, maxDocs=41962)
                0.019266617 = queryNorm
              0.63500094 = fieldWeight in 1044, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.210532 = idf(docFreq=30, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.022345591 = weight(abstract_txt:oder in 1044) [ClassicSimilarity], result of:
            0.022345591 = score(doc=1044,freq=1.0), product of:
              0.09527086 = queryWeight, product of:
                1.1529511 = boost
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.019266617 = queryNorm
              0.23454803 = fieldWeight in 1044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.09007944 = weight(abstract_txt:dokumentare in 1044) [ClassicSimilarity], result of:
            0.09007944 = score(doc=1044,freq=1.0), product of:
              0.19153109 = queryWeight, product of:
                1.1559405 = boost
                8.5999975 = idf(docFreq=20, maxDocs=41962)
                0.019266617 = queryNorm
              0.47031236 = fieldWeight in 1044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.5999975 = idf(docFreq=20, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.19998698 = weight(abstract_txt:pressedatenbank in 1044) [ClassicSimilarity], result of:
            0.19998698 = score(doc=1044,freq=3.0), product of:
              0.22600405 = queryWeight, product of:
                1.2556655 = boost
                9.341934 = idf(docFreq=9, maxDocs=41962)
                0.019266617 = queryNorm
              0.8848823 = fieldWeight in 1044, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.341934 = idf(docFreq=9, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.08839362 = weight(abstract_txt:artikel in 1044) [ClassicSimilarity], result of:
            0.08839362 = score(doc=1044,freq=3.0), product of:
              0.1652237 = queryWeight, product of:
                1.5183331 = boost
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.019266617 = queryNorm
              0.5349936 = fieldWeight in 1044, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
          0.045404255 = weight(abstract_txt:sind in 1044) [ClassicSimilarity], result of:
            0.045404255 = score(doc=1044,freq=3.0), product of:
              0.12130719 = queryWeight, product of:
                1.5933814 = boost
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.019266617 = queryNorm
              0.37429154 = fieldWeight in 1044, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1044)
        0.28 = coord(7/25)
    
  2. Portal "Bibliothek Information Dokumentation" eingestellt (2004) 0.15
    0.146439 = sum of:
      0.146439 = product of:
        1.220325 = sum of:
          0.174974 = weight(abstract_txt:artikel in 4294) [ClassicSimilarity], result of:
            0.174974 = score(doc=4294,freq=1.0), product of:
              0.1652237 = queryWeight, product of:
                1.5183331 = boost
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.019266617 = queryNorm
              1.0590127 = fieldWeight in 4294, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.1875 = fieldNorm(doc=4294)
          0.08987711 = weight(abstract_txt:sind in 4294) [ClassicSimilarity], result of:
            0.08987711 = score(doc=4294,freq=1.0), product of:
              0.12130719 = queryWeight, product of:
                1.5933814 = boost
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.019266617 = queryNorm
              0.7409051 = fieldWeight in 4294, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.1875 = fieldNorm(doc=4294)
          0.95547396 = weight(title_txt:dokumentation in 4294) [ClassicSimilarity], result of:
            0.95547396 = score(doc=4294,freq=1.0), product of:
              0.33338687 = queryWeight, product of:
                2.6415007 = boost
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.019266617 = queryNorm
              2.8659616 = fieldWeight in 4294, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.4375 = fieldNorm(doc=4294)
        0.12 = coord(3/25)
    
  3. Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005) 0.13
    0.12717721 = sum of:
      0.12717721 = product of:
        0.45420432 = sum of:
          0.04297374 = weight(abstract_txt:archiven in 885) [ClassicSimilarity], result of:
            0.04297374 = score(doc=885,freq=1.0), product of:
              0.14634445 = queryWeight, product of:
                1.0104247 = boost
                7.5173855 = idf(docFreq=61, maxDocs=41962)
                0.019266617 = queryNorm
              0.2936479 = fieldWeight in 885, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5173855 = idf(docFreq=61, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.04443193 = weight(abstract_txt:täglich in 885) [ClassicSimilarity], result of:
            0.04443193 = score(doc=885,freq=1.0), product of:
              0.14963652 = queryWeight, product of:
                1.0217265 = boost
                7.6014686 = idf(docFreq=56, maxDocs=41962)
                0.019266617 = queryNorm
              0.29693237 = fieldWeight in 885, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6014686 = idf(docFreq=56, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.12519921 = weight(abstract_txt:klassifizierung in 885) [ClassicSimilarity], result of:
            0.12519921 = score(doc=885,freq=5.0), product of:
              0.17457628 = queryWeight, product of:
                1.1035918 = boost
                8.210532 = idf(docFreq=30, maxDocs=41962)
                0.019266617 = queryNorm
              0.71716046 = fieldWeight in 885, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.210532 = idf(docFreq=30, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.015961139 = weight(abstract_txt:oder in 885) [ClassicSimilarity], result of:
            0.015961139 = score(doc=885,freq=1.0), product of:
              0.09527086 = queryWeight, product of:
                1.1529511 = boost
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.019266617 = queryNorm
              0.16753432 = fieldWeight in 885, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.11663477 = weight(abstract_txt:pressedatenbank in 885) [ClassicSimilarity], result of:
            0.11663477 = score(doc=885,freq=2.0), product of:
              0.22600405 = queryWeight, product of:
                1.2556655 = boost
                9.341934 = idf(docFreq=9, maxDocs=41962)
                0.019266617 = queryNorm
              0.5160738 = fieldWeight in 885, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.341934 = idf(docFreq=9, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.0631383 = weight(abstract_txt:artikel in 885) [ClassicSimilarity], result of:
            0.0631383 = score(doc=885,freq=3.0), product of:
              0.1652237 = queryWeight, product of:
                1.5183331 = boost
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.019266617 = queryNorm
              0.38213825 = fieldWeight in 885, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6480675 = idf(docFreq=401, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
          0.045865227 = weight(abstract_txt:sind in 885) [ClassicSimilarity], result of:
            0.045865227 = score(doc=885,freq=6.0), product of:
              0.12130719 = queryWeight, product of:
                1.5933814 = boost
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.019266617 = queryNorm
              0.37809157 = fieldWeight in 885, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.0390625 = fieldNorm(doc=885)
        0.28 = coord(7/25)
    
  4. Rahmstorf, G.: ¬Der eigene Kern der Dokumentation im Wandel der Technik (1997) 0.11
    0.11482982 = sum of:
      0.11482982 = product of:
        0.9569152 = sum of:
          0.12001194 = weight(abstract_txt:indexieren in 3996) [ClassicSimilarity], result of:
            0.12001194 = score(doc=3996,freq=1.0), product of:
              0.16190106 = queryWeight, product of:
                1.0627735 = boost
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.019266617 = queryNorm
              0.7412672 = fieldWeight in 3996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.09375 = fieldNorm(doc=3996)
          0.1544219 = weight(abstract_txt:dokumentare in 3996) [ClassicSimilarity], result of:
            0.1544219 = score(doc=3996,freq=1.0), product of:
              0.19153109 = queryWeight, product of:
                1.1559405 = boost
                8.5999975 = idf(docFreq=20, maxDocs=41962)
                0.019266617 = queryNorm
              0.80624974 = fieldWeight in 3996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.5999975 = idf(docFreq=20, maxDocs=41962)
                0.09375 = fieldNorm(doc=3996)
          0.68248135 = weight(title_txt:dokumentation in 3996) [ClassicSimilarity], result of:
            0.68248135 = score(doc=3996,freq=1.0), product of:
              0.33338687 = queryWeight, product of:
                2.6415007 = boost
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.019266617 = queryNorm
              2.0471153 = fieldWeight in 3996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.3125 = fieldNorm(doc=3996)
        0.12 = coord(3/25)
    
  5. Gaus, W.: Information und Dokumentation in der Medizin (2004) 0.11
    0.10853203 = sum of:
      0.10853203 = product of:
        0.90443355 = sum of:
          0.02553782 = weight(abstract_txt:oder in 3953) [ClassicSimilarity], result of:
            0.02553782 = score(doc=3953,freq=1.0), product of:
              0.09527086 = queryWeight, product of:
                1.1529511 = boost
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.019266617 = queryNorm
              0.2680549 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2888784 = idf(docFreq=1564, maxDocs=41962)
                0.0625 = fieldNorm(doc=3953)
          0.059918076 = weight(abstract_txt:sind in 3953) [ClassicSimilarity], result of:
            0.059918076 = score(doc=3953,freq=4.0), product of:
              0.12130719 = queryWeight, product of:
                1.5933814 = boost
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.019266617 = queryNorm
              0.49393675 = fieldWeight in 3953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.951494 = idf(docFreq=2192, maxDocs=41962)
                0.0625 = fieldNorm(doc=3953)
          0.81897765 = weight(title_txt:dokumentation in 3953) [ClassicSimilarity], result of:
            0.81897765 = score(doc=3953,freq=1.0), product of:
              0.33338687 = queryWeight, product of:
                2.6415007 = boost
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.019266617 = queryNorm
              2.4565384 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.375 = fieldNorm(doc=3953)
        0.12 = coord(3/25)
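
The score trees above follow the pattern of Lucene's ClassicSimilarity (TF-IDF) explain output: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor. The following Python sketch recomputes the score of the first similar document (doc=1044) from the values printed in its tree. The function names and the tuple layout are illustrative assumptions, not part of the database or of Lucene's API; the idf formula 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq) are the ClassicSimilarity defaults and agree with the printed numbers.

```python
import math

# Constants copied from the explain tree for doc=1044.
MAX_DOCS = 41962
QUERY_NORM = 0.019266617
FIELD_NORM = 0.0546875      # fieldNorm(doc=1044) for abstract_txt
COORD = 7 / 25              # 7 of 25 query terms matched


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm=FIELD_NORM):
    """One 'weight(...)' node: queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight


# (term, freq in doc 1044, docFreq, boost), all read off the listing.
TERMS_1044 = [
    ("täglich",         1.0,   56, 1.0217265),
    ("klassifizierung", 2.0,   30, 1.1035918),
    ("oder",            1.0, 1564, 1.1529511),
    ("dokumentare",     1.0,   20, 1.1559405),
    ("pressedatenbank", 3.0,    9, 1.2556655),
    ("artikel",         3.0,  401, 1.5183331),
    ("sind",            3.0, 2192, 1.5933814),
]


def document_score(terms, coord=COORD):
    """Sum of the term scores, scaled by the coord factor."""
    return coord * sum(term_score(f, df, b) for _, f, df, b in terms)


if __name__ == "__main__":
    # Compare with the 0.1733958 reported for doc=1044 in the listing.
    print(round(document_score(TERMS_1044), 7))
```

Small deviations in the last digits are to be expected, since Lucene computes these values in 32-bit floats while Python uses 64-bit floats. The same functions apply to the other entries if the per-document fieldNorm and coord values are swapped in; matches on title_txt use their own fieldNorm.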