Document (#37906)

Kempf, A.O.
Automatische Inhaltserschließung in der Fachinformation
Information - Wissenschaft und Praxis. 64(2013) no. 2/3, pp. 96-106
The article is based on a master's thesis entitled "Automatische Indexierung in der sozialwissenschaftlichen Fachinformation. Eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS" (Kempf 2012), written in the postgraduate programme in Library and Information Science at the Chair of Information Retrieval, Humboldt-Universität zu Berlin. Building on the shell model (Schalenmodell) of subject indexing in specialized information, the article presents evaluation results for an automatic indexing procedure intended for use in social science information services. The starting point is the application scenario described by Krause, in which the less relevant holdings of SOLIS (Sozialwissenschaftliches Literaturinformationssystem, the social science literature information system) should be indexed automatically. On this document base, two test series were run with the indexing software MindServer from Recommind. The first test series examined the effects of general system settings; the second compared the software's indexing performance in the peripheral and the core areas of the literature database. For the second test series, a separate version of the indexing software was built for each area and trained on a document corpus drawn from that area. The evaluation, which is based on intellectually (manually) generated reference data, shows differences in indexing performance between the peripheral and core areas. On the one hand, these differences argue against using automatic indexing in the peripheral areas. On the other hand, the results suggest that indexing quality can be improved by building training sets specific to individual subfields.
Automatic indexing
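The abstract does not state which measures were used to compare the automatic indexing with the intellectually generated reference data. Precision and recall of the assigned descriptors are a common choice for this kind of evaluation, so the Python sketch below assumes them. The function names and the descriptor sets in the example are hypothetical and are not taken from the study or from MindServer.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def evaluate(auto: set[str], manual: set[str]) -> Scores:
    """Compare one document's automatically assigned descriptors with the
    intellectually assigned reference descriptors (assumed gold standard)."""
    hits = len(auto & manual)  # descriptors assigned by both
    precision = hits / len(auto) if auto else 0.0
    recall = hits / len(manual) if manual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


def macro_average(pairs: list[tuple[set[str], set[str]]]) -> Scores:
    """Average per-document scores over a set of (auto, manual) pairs,
    e.g. one call for core documents and one for peripheral documents."""
    if not pairs:
        raise ValueError("need at least one document")
    results = [evaluate(auto, manual) for auto, manual in pairs]
    n = len(results)
    return Scores(
        sum(r.precision for r in results) / n,
        sum(r.recall for r in results) / n,
        sum(r.f1 for r in results) / n,
    )


# Hypothetical descriptor sets for two documents: (automatic, intellectual)
core_docs = [
    ({"labour market", "migration", "youth"}, {"labour market", "migration"}),
    ({"education"}, {"education", "school"}),
]
print(macro_average(core_docs))
```

Running `macro_average` separately on documents from the core and the peripheral areas would yield the kind of per-area comparison the abstract describes.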

Similar documents (author)

  1. Kempf, G.: Klassifikationsprobleme der Rechtswissenschaft (1972) 5.23
  2. Kempf, A.: Thematischer Zugang zu Fachinformationen im Internet (1994) 5.23
  3. Kempf, A.: Forstliche Klassifikation und Meta-Information zum Wald im Internet (1995) 5.23
  4. Kempf, A.: Advocating global forest issues on the Internet (1996) 5.23
  5. Kempf, K.: Dalla Germania un esempio avanzato di sistema integrato (1997) 5.23

Similar documents (content)

  1. Kempf, A.O.: Automatische Indexierung in der sozialwissenschaftlichen Fachinformation : eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS (2012) 0.50
  2. Seeger, T.: Entwicklung der Fachinformation und -kommunikation (2004) 0.18
  3. Capurro, R.: Hermeneutik der Fachinformation (1986) 0.13
  4. Groß, T.: Automatische Indexierung von Dokumenten in einer wissenschaftlichen Bibliothek : Implementierung und Evaluierung am Beispiel der Deutschen Zentralbibliothek für Wirtschaftswissenschaften (2011) 0.11
  5. Herb, U.: Wege zur psychologischen Fachinformation : Eine Bilanz aus der Virtuellen Fachbibliothek Psychologie (2002) 0.11