Document (#32783)

Author
Kumpe, D.
Title
Methoden zur automatischen Indexierung von Dokumenten
Imprint
Berlin : Technische Universität Berlin / Institut für Softwaretechnik und Theoretische Informatik, Computergestützte Informationssysteme
Year
2006
Pages
VII, 147 pp.
Abstract
This diploma thesis deals with the indexing of unstructured, natural-language documents. The growing flood of information and the rising number of published scientific reports and books make machine-based subject indexing necessary. To better understand the requirements for this, problems of written natural-language communication are examined. Manual indexing techniques and documentation languages are presented. Indexing is placed thematically within the fields of subject indexing and information retrieval. In addition, the advantages and disadvantages of selected algorithms are examined, and software products in the field of information retrieval are evaluated with regard to how they work. The results of individual methods are presented using sample documents. Using the European Migration Network project as a case, problems and basic requirements for carrying out subject indexing are identified and possible solutions are proposed.
Content
Diploma thesis
Theme
Automatic indexing

Similar documents (content)

  1. El Jerroudi, F.: Inhaltliche Erschließung in Dokumenten-Management-Systemen, dargestellt am Beispiel der KRAFTWERKSSCHULE e.V (2007) 0.26
    0.26231152 = sum of:
      0.26231152 = product of:
        1.0929646 = sum of:
          0.08913681 = weight(abstract_txt:nachteile in 527) [ClassicSimilarity], result of:
            0.08913681 = score(doc=527,freq=1.0), product of:
              0.1473023 = queryWeight, product of:
                1.0072942 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.018879727 = queryNorm
              0.6051284 = fieldWeight in 527, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
          0.23652193 = weight(abstract_txt:diplomarbeit in 527) [ClassicSimilarity], result of:
            0.23652193 = score(doc=527,freq=5.0), product of:
              0.16510583 = queryWeight, product of:
                1.066431 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.018879727 = queryNorm
              1.4325473 = fieldWeight in 527, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
          0.16516127 = weight(abstract_txt:inhaltlichen in 527) [ClassicSimilarity], result of:
            0.16516127 = score(doc=527,freq=2.0), product of:
              0.22221684 = queryWeight, product of:
                1.7496656 = boost
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.018879727 = queryNorm
              0.7432437 = fieldWeight in 527, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
          0.09244221 = weight(abstract_txt:werden in 527) [ClassicSimilarity], result of:
            0.09244221 = score(doc=527,freq=5.0), product of:
              0.1509217 = queryWeight, product of:
                2.2798822 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.018879727 = queryNorm
              0.61251765 = fieldWeight in 527, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
          0.29187548 = weight(abstract_txt:erschließung in 527) [ClassicSimilarity], result of:
            0.29187548 = score(doc=527,freq=6.0), product of:
              0.25780612 = queryWeight, product of:
                2.3081224 = boost
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.018879727 = queryNorm
              1.1321511 = fieldWeight in 527, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
          0.21782692 = weight(abstract_txt:dokumenten in 527) [ClassicSimilarity], result of:
            0.21782692 = score(doc=527,freq=2.0), product of:
              0.30592123 = queryWeight, product of:
                2.5142994 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.018879727 = queryNorm
              0.71203595 = fieldWeight in 527, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.078125 = fieldNorm(doc=527)
        0.24 = coord(6/25)
    
  2. Halip, I.: Automatische Extrahierung von Schlagworten aus unstrukturierten Texten (2005) 0.22
    0.21507195 = sum of:
      0.21507195 = product of:
        0.7681141 = sum of:
          0.062395763 = weight(abstract_txt:nachteile in 861) [ClassicSimilarity], result of:
            0.062395763 = score(doc=861,freq=1.0), product of:
              0.1473023 = queryWeight, product of:
                1.0072942 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.018879727 = queryNorm
              0.4235899 = fieldWeight in 861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.083008565 = weight(abstract_txt:eingeordnet in 861) [ClassicSimilarity], result of:
            0.083008565 = score(doc=861,freq=1.0), product of:
              0.17817827 = queryWeight, product of:
                1.1078448 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.018879727 = queryNorm
              0.4658737 = fieldWeight in 861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.086973526 = weight(abstract_txt:unstrukturierten in 861) [ClassicSimilarity], result of:
            0.086973526 = score(doc=861,freq=1.0), product of:
              0.18380791 = queryWeight, product of:
                1.1252102 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018879727 = queryNorm
              0.47317618 = fieldWeight in 861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.043960046 = weight(abstract_txt:bereich in 861) [ClassicSimilarity], result of:
            0.043960046 = score(doc=861,freq=1.0), product of:
              0.14694503 = queryWeight, product of:
                1.4228005 = boost
                5.4703507 = idf(docFreq=505, maxDocs=44218)
                0.018879727 = queryNorm
              0.2991598 = fieldWeight in 861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4703507 = idf(docFreq=505, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.070885755 = weight(abstract_txt:werden in 861) [ClassicSimilarity], result of:
            0.070885755 = score(doc=861,freq=6.0), product of:
              0.1509217 = queryWeight, product of:
                2.2798822 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.018879727 = queryNorm
              0.4696856 = fieldWeight in 861, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.18674767 = weight(abstract_txt:dokumenten in 861) [ClassicSimilarity], result of:
            0.18674767 = score(doc=861,freq=3.0), product of:
              0.30592123 = queryWeight, product of:
                2.5142994 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.018879727 = queryNorm
              0.61044365 = fieldWeight in 861, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
          0.2341428 = weight(abstract_txt:indexierung in 861) [ClassicSimilarity], result of:
            0.2341428 = score(doc=861,freq=2.0), product of:
              0.44816372 = queryWeight, product of:
                3.5139852 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.018879727 = queryNorm
              0.52244925 = fieldWeight in 861, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.0546875 = fieldNorm(doc=861)
        0.28 = coord(7/25)
    
  3. Simon, D.: Anreicherung bibliothekarischer Titeldaten durch Tagging : Möglichkeiten und Probleme (2007) 0.21
    0.21395381 = sum of:
      0.21395381 = product of:
        0.89147425 = sum of:
          0.07577123 = weight(abstract_txt:vorgestellt in 530) [ClassicSimilarity], result of:
            0.07577123 = score(doc=530,freq=1.0), product of:
              0.14747901 = queryWeight, product of:
                1.4253833 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.018879727 = queryNorm
              0.51377636 = fieldWeight in 530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
          0.13990809 = weight(abstract_txt:untersucht in 530) [ClassicSimilarity], result of:
            0.13990809 = score(doc=530,freq=2.0), product of:
              0.17617565 = queryWeight, product of:
                1.5578997 = boost
                5.989777 = idf(docFreq=300, maxDocs=44218)
                0.018879727 = queryNorm
              0.79413974 = fieldWeight in 530, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.989777 = idf(docFreq=300, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
          0.14014399 = weight(abstract_txt:inhaltlichen in 530) [ClassicSimilarity], result of:
            0.14014399 = score(doc=530,freq=1.0), product of:
              0.22221684 = queryWeight, product of:
                1.7496656 = boost
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.018879727 = queryNorm
              0.6306632 = fieldWeight in 530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
          0.04960969 = weight(abstract_txt:werden in 530) [ClassicSimilarity], result of:
            0.04960969 = score(doc=530,freq=1.0), product of:
              0.1509217 = queryWeight, product of:
                2.2798822 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.018879727 = queryNorm
              0.32871145 = fieldWeight in 530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
          0.20221725 = weight(abstract_txt:erschließung in 530) [ClassicSimilarity], result of:
            0.20221725 = score(doc=530,freq=2.0), product of:
              0.25780612 = queryWeight, product of:
                2.3081224 = boost
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.018879727 = queryNorm
              0.7843772 = fieldWeight in 530, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
          0.28382397 = weight(abstract_txt:indexierung in 530) [ClassicSimilarity], result of:
            0.28382397 = score(doc=530,freq=1.0), product of:
              0.44816372 = queryWeight, product of:
                3.5139852 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.018879727 = queryNorm
              0.6333042 = fieldWeight in 530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.09375 = fieldNorm(doc=530)
        0.24 = coord(6/25)
    
  4. Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.17
    0.17249528 = sum of:
      0.17249528 = product of:
        0.86247635 = sum of:
          0.124791525 = weight(abstract_txt:nachteile in 1755) [ClassicSimilarity], result of:
            0.124791525 = score(doc=1755,freq=1.0), product of:
              0.1473023 = queryWeight, product of:
                1.0072942 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.018879727 = queryNorm
              0.8471798 = fieldWeight in 1755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.109375 = fieldNorm(doc=1755)
          0.13304126 = weight(abstract_txt:maschinelle in 1755) [ClassicSimilarity], result of:
            0.13304126 = score(doc=1755,freq=1.0), product of:
              0.15372472 = queryWeight, product of:
                1.029019 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.018879727 = queryNorm
              0.86545134 = fieldWeight in 1755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.109375 = fieldNorm(doc=1755)
          0.057877976 = weight(abstract_txt:werden in 1755) [ClassicSimilarity], result of:
            0.057877976 = score(doc=1755,freq=1.0), product of:
              0.1509217 = queryWeight, product of:
                2.2798822 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.018879727 = queryNorm
              0.3834967 = fieldWeight in 1755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.109375 = fieldNorm(doc=1755)
          0.21563764 = weight(abstract_txt:dokumenten in 1755) [ClassicSimilarity], result of:
            0.21563764 = score(doc=1755,freq=1.0), product of:
              0.30592123 = queryWeight, product of:
                2.5142994 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.018879727 = queryNorm
              0.70487964 = fieldWeight in 1755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.109375 = fieldNorm(doc=1755)
          0.33112794 = weight(abstract_txt:indexierung in 1755) [ClassicSimilarity], result of:
            0.33112794 = score(doc=1755,freq=1.0), product of:
              0.44816372 = queryWeight, product of:
                3.5139852 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.018879727 = queryNorm
              0.7388549 = fieldWeight in 1755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.109375 = fieldNorm(doc=1755)
        0.2 = coord(5/25)
    
  5. Schwarzendorfer, H.: Inhaltliche Erschließung von Altbeständen in allgemeinen Bibliothekskatalogen : Bestandsaufnahme und Entwicklungsmöglichkeiten (2009) 0.17
    0.17207217 = sum of:
      0.17207217 = product of:
        0.86036086 = sum of:
          0.10102831 = weight(abstract_txt:vorgestellt in 4585) [ClassicSimilarity], result of:
            0.10102831 = score(doc=4585,freq=1.0), product of:
              0.14747901 = queryWeight, product of:
                1.4253833 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.018879727 = queryNorm
              0.68503517 = fieldWeight in 4585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.125 = fieldNorm(doc=4585)
          0.13190661 = weight(abstract_txt:untersucht in 4585) [ClassicSimilarity], result of:
            0.13190661 = score(doc=4585,freq=1.0), product of:
              0.17617565 = queryWeight, product of:
                1.5578997 = boost
                5.989777 = idf(docFreq=300, maxDocs=44218)
                0.018879727 = queryNorm
              0.74872214 = fieldWeight in 4585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.989777 = idf(docFreq=300, maxDocs=44218)
                0.125 = fieldNorm(doc=4585)
          0.26425803 = weight(abstract_txt:inhaltlichen in 4585) [ClassicSimilarity], result of:
            0.26425803 = score(doc=4585,freq=2.0), product of:
              0.22221684 = queryWeight, product of:
                1.7496656 = boost
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.018879727 = queryNorm
              1.1891899 = fieldWeight in 4585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.727074 = idf(docFreq=143, maxDocs=44218)
                0.125 = fieldNorm(doc=4585)
          0.09354494 = weight(abstract_txt:werden in 4585) [ClassicSimilarity], result of:
            0.09354494 = score(doc=4585,freq=2.0), product of:
              0.1509217 = queryWeight, product of:
                2.2798822 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.018879727 = queryNorm
              0.6198243 = fieldWeight in 4585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.125 = fieldNorm(doc=4585)
          0.269623 = weight(abstract_txt:erschließung in 4585) [ClassicSimilarity], result of:
            0.269623 = score(doc=4585,freq=2.0), product of:
              0.25780612 = queryWeight, product of:
                2.3081224 = boost
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.018879727 = queryNorm
              1.0458363 = fieldWeight in 4585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.916144 = idf(docFreq=323, maxDocs=44218)
                0.125 = fieldNorm(doc=4585)
        0.2 = coord(5/25)
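
The score trees above can be recomputed by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas that the printed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord (matched query terms / total query terms). All numbers are copied from the trees for documents 527 and 4585; the function and variable names are hypothetical and are not part of Lucene.

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the explain trees
QUERY_NORM = 0.018879727  # queryNorm as printed in the explain trees


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form, matches the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms=25):
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    total = sum(term_score(f, df, b, field_norm) for _, f, df, b in terms)
    return total * (len(terms) / total_query_terms)


# (term, freq, docFreq, boost), copied from the tree for doc 527
TERMS_527 = [
    ("nachteile", 1.0, 51, 1.0072942),
    ("diplomarbeit", 5.0, 32, 1.066431),
    ("inhaltlichen", 2.0, 143, 1.7496656),
    ("werden", 5.0, 3606, 2.2798822),
    ("erschließung", 6.0, 323, 2.3081224),
    ("dokumenten", 2.0, 190, 2.5142994),
]

# (term, freq, docFreq, boost), copied from the tree for doc 4585
TERMS_4585 = [
    ("vorgestellt", 1.0, 500, 1.4253833),
    ("untersucht", 1.0, 300, 1.5578997),
    ("inhaltlichen", 2.0, 143, 1.7496656),
    ("werden", 2.0, 3606, 2.2798822),
    ("erschließung", 2.0, 323, 2.3081224),
]

score_527 = document_score(TERMS_527, field_norm=0.078125)
score_4585 = document_score(TERMS_4585, field_norm=0.125)

print(f"doc 527:  {score_527:.4f}")
print(f"doc 4585: {score_4585:.4f}")
```

Under these assumptions the sketch lands on the listed totals (0.26231152 and 0.17207217) up to floating-point rounding. It also shows why the same term can score differently across entries: queryWeight and idf are fixed per term, so the difference comes from freq and from the per-document fieldNorm.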