Document (#32784)

Author
Kumpe, D.
Title
Methoden zur automatischen Indexierung von Dokumenten
Imprint
Berlin : Technische Universität Berlin / Institut für Softwaretechnik und Theoretische Informatik, Computergestützte Informationssysteme
Year
2006
Pages
VII, 147 pp.
Abstract
This diploma thesis deals with the indexing of unstructured, natural-language documents. The growing flood of information and the number of published scientific reports and books make automated subject indexing necessary. To better understand the requirements involved, problems of written natural-language communication are examined. Manual indexing techniques and documentation languages are presented. Indexing is placed thematically within the fields of subject indexing and information retrieval. In addition, the advantages and disadvantages of selected algorithms are examined, and software products in the field of information retrieval are evaluated with regard to how they work. The results of individual methods are presented using sample documents. Using the European Migration Network project as a case, problems and basic requirements for carrying out subject indexing are identified and possible solutions are proposed.
Content
Diploma thesis
Theme
Automatic indexing

Similar documents (content)

  1. Simon, D.: Anreicherung bibliothekarischer Titeldaten durch Tagging : Möglichkeiten und Probleme (2007) 0.22
    0.21566448 = sum of:
      0.21566448 = product of:
        0.898602 = sum of:
          0.07620984 = weight(abstract_txt:vorgestellt in 2531) [ClassicSimilarity], result of:
            0.07620984 = score(doc=2531,freq=1.0), product of:
              0.14779192 = queryWeight, product of:
                1.4263068 = boost
                5.5003343 = idf(docFreq=465, maxDocs=41962)
                0.018838601 = queryNorm
              0.51565635 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5003343 = idf(docFreq=465, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
          0.14133736 = weight(abstract_txt:untersucht in 2531) [ClassicSimilarity], result of:
            0.14133736 = score(doc=2531,freq=2.0), product of:
              0.17706715 = queryWeight, product of:
                1.561193 = boost
                6.020502 = idf(docFreq=276, maxDocs=41962)
                0.018838601 = queryNorm
              0.79821336 = fieldWeight in 2531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.020502 = idf(docFreq=276, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
          0.14158425 = weight(abstract_txt:inhaltlichen in 2531) [ClassicSimilarity], result of:
            0.14158425 = score(doc=2531,freq=1.0), product of:
              0.22335035 = queryWeight, product of:
                1.7533997 = boost
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.018838601 = queryNorm
              0.633911 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
          0.050409377 = weight(abstract_txt:werden in 2531) [ClassicSimilarity], result of:
            0.050409377 = score(doc=2531,freq=1.0), product of:
              0.1522758 = queryWeight, product of:
                2.2891438 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.018838601 = queryNorm
              0.33103997 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
          0.20683983 = weight(abstract_txt:erschließung in 2531) [ClassicSimilarity], result of:
            0.20683983 = score(doc=2531,freq=2.0), product of:
              0.2612682 = queryWeight, product of:
                2.322611 = boost
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.018838601 = queryNorm
              0.7916763 = fieldWeight in 2531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
          0.28222135 = weight(abstract_txt:indexierung in 2531) [ClassicSimilarity], result of:
            0.28222135 = score(doc=2531,freq=1.0), product of:
              0.445704 = queryWeight, product of:
                3.502885 = boost
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.018838601 = queryNorm
              0.6332035 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.09375 = fieldNorm(doc=2531)
        0.24 = coord(6/25)
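
The arithmetic in the explain tree above can be retraced by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is what the listed values are consistent with, and it copies boost, queryNorm, and fieldNorm directly from the tree for doc 2531. The function names are hypothetical.

```python
import math

MAX_DOCS = 41962          # maxDocs, as shown in the tree
QUERY_NORM = 0.018838601  # queryNorm, as shown in the tree
FIELD_NORM = 0.09375      # fieldNorm(doc=2531)


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; it reproduces e.g. 5.5003343 for docFreq=465.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost,
               field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm             # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm   # fieldWeight
    return query_weight * field_weight


# (term, freq, docFreq, boost) copied from the tree for doc 2531
TERMS = [
    ("vorgestellt", 1.0, 465, 1.4263068),
    ("untersucht", 2.0, 276, 1.561193),
    ("inhaltlichen", 1.0, 131, 1.7533997),
    ("werden", 1.0, 3338, 2.2891438),
    ("erschließung", 2.0, 290, 2.322611),
    ("indexierung", 1.0, 132, 3.502885),
]


def document_score(terms, matched=6, query_terms=25):
    # Sum of the per-term scores, scaled by coord(matched/query_terms).
    raw = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
    return raw * (matched / query_terms)


if __name__ == "__main__":
    for name, freq, df, boost in TERMS:
        print(f"{name}: {term_score(freq, df, boost):.8f}")
    print(f"total: {document_score(TERMS):.8f}")
```

The results agree with the tree only up to rounding, since the listed values appear to be single-precision floats while Python computes in double precision.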
    
  2. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.21
    0.20793472 = sum of:
      0.20793472 = product of:
        0.8663947 = sum of:
          0.07003915 = weight(abstract_txt:informationsflut in 1284) [ClassicSimilarity], result of:
            0.07003915 = score(doc=1284,freq=1.0), product of:
              0.14529637 = queryWeight, product of:
                7.712694 = idf(docFreq=50, maxDocs=41962)
                0.018838601 = queryNorm
              0.4820434 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.712694 = idf(docFreq=50, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
          0.07170378 = weight(abstract_txt:evaluiert in 1284) [ClassicSimilarity], result of:
            0.07170378 = score(doc=1284,freq=1.0), product of:
              0.14758952 = queryWeight, product of:
                1.0078604 = boost
                7.773319 = idf(docFreq=47, maxDocs=41962)
                0.018838601 = queryNorm
              0.48583242 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.773319 = idf(docFreq=47, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
          0.050747175 = weight(abstract_txt:bereich in 1284) [ClassicSimilarity], result of:
            0.050747175 = score(doc=1284,freq=1.0), product of:
              0.14767674 = queryWeight, product of:
                1.425751 = boost
                5.4981904 = idf(docFreq=466, maxDocs=41962)
                0.018838601 = queryNorm
              0.3436369 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4981904 = idf(docFreq=466, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
          0.07514586 = weight(abstract_txt:werden in 1284) [ClassicSimilarity], result of:
            0.07514586 = score(doc=1284,freq=5.0), product of:
              0.1522758 = queryWeight, product of:
                2.2891438 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.018838601 = queryNorm
              0.49348527 = fieldWeight in 1284, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
          0.13789321 = weight(abstract_txt:erschließung in 1284) [ClassicSimilarity], result of:
            0.13789321 = score(doc=1284,freq=2.0), product of:
              0.2612682 = queryWeight, product of:
                2.322611 = boost
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.018838601 = queryNorm
              0.52778417 = fieldWeight in 1284, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
          0.4608655 = weight(abstract_txt:indexierung in 1284) [ClassicSimilarity], result of:
            0.4608655 = score(doc=1284,freq=6.0), product of:
              0.445704 = queryWeight, product of:
                3.502885 = boost
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.018838601 = queryNorm
              1.034017 = fieldWeight in 1284, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.0625 = fieldNorm(doc=1284)
        0.24 = coord(6/25)
    
  3. El Jerroudi, F.: Inhaltliche Erschließung in Dokumenten-Management-Systemen, dargestellt am Beispiel der KRAFTWERKSSCHULE e.V. (2007) 0.20
    0.20162478 = sum of:
      0.20162478 = product of:
        1.0081239 = sum of:
          0.23081957 = weight(abstract_txt:diplomarbeit in 2528) [ClassicSimilarity], result of:
            0.23081957 = score(doc=2528,freq=5.0), product of:
              0.16216081 = queryWeight, product of:
                1.0564418 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.018838601 = queryNorm
              1.4233992 = fieldWeight in 2528, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.078125 = fieldNorm(doc=2528)
          0.16685863 = weight(abstract_txt:inhaltlichen in 2528) [ClassicSimilarity], result of:
            0.16685863 = score(doc=2528,freq=2.0), product of:
              0.22335035 = queryWeight, product of:
                1.7533997 = boost
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.018838601 = queryNorm
              0.74707127 = fieldWeight in 2528, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.078125 = fieldNorm(doc=2528)
          0.09393233 = weight(abstract_txt:werden in 2528) [ClassicSimilarity], result of:
            0.09393233 = score(doc=2528,freq=5.0), product of:
              0.1522758 = queryWeight, product of:
                2.2891438 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.018838601 = queryNorm
              0.6168566 = fieldWeight in 2528, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.078125 = fieldNorm(doc=2528)
          0.2985476 = weight(abstract_txt:erschließung in 2528) [ClassicSimilarity], result of:
            0.2985476 = score(doc=2528,freq=6.0), product of:
              0.2612682 = queryWeight, product of:
                2.322611 = boost
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.018838601 = queryNorm
              1.1426864 = fieldWeight in 2528, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.078125 = fieldNorm(doc=2528)
          0.21796572 = weight(abstract_txt:dokumenten in 2528) [ClassicSimilarity], result of:
            0.21796572 = score(doc=2528,freq=2.0), product of:
              0.30552262 = queryWeight, product of:
                2.5116258 = boost
                6.457134 = idf(docFreq=178, maxDocs=41962)
                0.018838601 = queryNorm
              0.71341926 = fieldWeight in 2528, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.457134 = idf(docFreq=178, maxDocs=41962)
                0.078125 = fieldNorm(doc=2528)
        0.2 = coord(5/25)
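
The coord factor explains why this hit ranks below hit 1 even though its raw term sum is larger. A small sketch using only the sums and coord fractions listed in the trees (the helper name `final_score` is hypothetical):

```python
def final_score(raw_sum, matched, query_terms=25):
    # coord(matched/query_terms) scales the raw sum of term scores.
    return raw_sum * (matched / query_terms)


hit1 = final_score(0.898602, matched=6)   # Simon (2007), doc 2531
hit3 = final_score(1.0081239, matched=5)  # El Jerroudi (2007), doc 2528

if __name__ == "__main__":
    print(f"hit 1: {hit1:.8f}")
    print(f"hit 3: {hit3:.8f}")
```

Matching six of the 25 query terms instead of five is enough to outweigh the higher raw sum of doc 2528.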
    
  4. Schwarzendorfer, H.: Inhaltliche Erschließung von Altbeständen in allgemeinen Bibliothekskatalogen : Bestandsaufnahme und Entwicklungsmöglichkeiten (2009) 0.17
    0.17453606 = sum of:
      0.17453606 = product of:
        0.8726803 = sum of:
          0.10161312 = weight(abstract_txt:vorgestellt in 1586) [ClassicSimilarity], result of:
            0.10161312 = score(doc=1586,freq=1.0), product of:
              0.14779192 = queryWeight, product of:
                1.4263068 = boost
                5.5003343 = idf(docFreq=465, maxDocs=41962)
                0.018838601 = queryNorm
              0.6875418 = fieldWeight in 1586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5003343 = idf(docFreq=465, maxDocs=41962)
                0.125 = fieldNorm(doc=1586)
          0.13325414 = weight(abstract_txt:untersucht in 1586) [ClassicSimilarity], result of:
            0.13325414 = score(doc=1586,freq=1.0), product of:
              0.17706715 = queryWeight, product of:
                1.561193 = boost
                6.020502 = idf(docFreq=276, maxDocs=41962)
                0.018838601 = queryNorm
              0.75256276 = fieldWeight in 1586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.020502 = idf(docFreq=276, maxDocs=41962)
                0.125 = fieldNorm(doc=1586)
          0.2669738 = weight(abstract_txt:inhaltlichen in 1586) [ClassicSimilarity], result of:
            0.2669738 = score(doc=1586,freq=2.0), product of:
              0.22335035 = queryWeight, product of:
                1.7533997 = boost
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.018838601 = queryNorm
              1.195314 = fieldWeight in 1586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.761718 = idf(docFreq=131, maxDocs=41962)
                0.125 = fieldNorm(doc=1586)
          0.09505283 = weight(abstract_txt:werden in 1586) [ClassicSimilarity], result of:
            0.09505283 = score(doc=1586,freq=2.0), product of:
              0.1522758 = queryWeight, product of:
                2.2891438 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.018838601 = queryNorm
              0.62421495 = fieldWeight in 1586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.125 = fieldNorm(doc=1586)
          0.27578643 = weight(abstract_txt:erschließung in 1586) [ClassicSimilarity], result of:
            0.27578643 = score(doc=1586,freq=2.0), product of:
              0.2612682 = queryWeight, product of:
                2.322611 = boost
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.018838601 = queryNorm
              1.0555683 = fieldWeight in 1586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9711967 = idf(docFreq=290, maxDocs=41962)
                0.125 = fieldNorm(doc=1586)
        0.2 = coord(5/25)
    
  5. Halip, I.: Automatische Extrahierung von Schlagworten aus unstrukturierten Texten (2005) 0.17
    0.16953048 = sum of:
      0.16953048 = product of:
        0.706377 = sum of:
          0.08229437 = weight(abstract_txt:eingeordnet in 1987) [ClassicSimilarity], result of:
            0.08229437 = score(doc=1987,freq=1.0), product of:
              0.17684884 = queryWeight, product of:
                1.1032494 = boost
                8.509026 = idf(docFreq=22, maxDocs=41962)
                0.018838601 = queryNorm
              0.46533734 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.509026 = idf(docFreq=22, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
          0.0879631 = weight(abstract_txt:unstrukturierten in 1987) [ClassicSimilarity], result of:
            0.0879631 = score(doc=1987,freq=1.0), product of:
              0.18487966 = queryWeight, product of:
                1.128021 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.018838601 = queryNorm
              0.47578567 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
          0.044403777 = weight(abstract_txt:bereich in 1987) [ClassicSimilarity], result of:
            0.044403777 = score(doc=1987,freq=1.0), product of:
              0.14767674 = queryWeight, product of:
                1.425751 = boost
                5.4981904 = idf(docFreq=466, maxDocs=41962)
                0.018838601 = queryNorm
              0.30068228 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4981904 = idf(docFreq=466, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
          0.072028406 = weight(abstract_txt:werden in 1987) [ClassicSimilarity], result of:
            0.072028406 = score(doc=1987,freq=6.0), product of:
              0.1522758 = queryWeight, product of:
                2.2891438 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.018838601 = queryNorm
              0.4730128 = fieldWeight in 1987, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
          0.18686669 = weight(abstract_txt:dokumenten in 1987) [ClassicSimilarity], result of:
            0.18686669 = score(doc=1987,freq=3.0), product of:
              0.30552262 = queryWeight, product of:
                2.5116258 = boost
                6.457134 = idf(docFreq=178, maxDocs=41962)
                0.018838601 = queryNorm
              0.6116296 = fieldWeight in 1987, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.457134 = idf(docFreq=178, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
          0.23282069 = weight(abstract_txt:indexierung in 1987) [ClassicSimilarity], result of:
            0.23282069 = score(doc=1987,freq=2.0), product of:
              0.445704 = queryWeight, product of:
                3.502885 = boost
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.018838601 = queryNorm
              0.52236617 = fieldWeight in 1987, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7541704 = idf(docFreq=132, maxDocs=41962)
                0.0546875 = fieldNorm(doc=1987)
        0.24 = coord(6/25)