Document (#32784)

Author
Kumpe, D.
Title
Methoden zur automatischen Indexierung von Dokumenten
Imprint
Berlin : Technische Universität Berlin / Institut für Softwaretechnik und Theoretische Informatik, Computergestützte Informationssysteme
Year
2006
Pages
VII, 147 pp.
Abstract
This diploma thesis deals with the indexing of unstructured, natural-language documents. The growing flood of information and the number of published scientific reports and books make machine-based subject indexing necessary. To better understand the requirements for this, problems of written natural-language communication are examined. Manual indexing techniques and documentation languages are presented. Indexing is placed thematically within the fields of subject indexing and information retrieval. In addition, the advantages and disadvantages of selected algorithms are examined, and software products in the field of information retrieval are evaluated with respect to how they work. The results of individual methods are presented using sample documents. Drawing on the European Migration Network project, problems and basic requirements for carrying out subject indexing are identified and possible solutions are proposed.
Content
Diploma thesis
Theme
Automatisches Indexieren

Similar documents (content)

  1. El Jerroudi, F.: Inhaltliche Erschließung in Dokumenten-Management-Systemen, dargestellt am Beispiel der KRAFTWERKSSCHULE e.V (2007) 0.26
    0.26286227 = sum of:
      0.26286227 = product of:
        1.0952594 = sum of:
          0.08820366 = weight(abstract_txt:nachteile in 2528) [ClassicSimilarity], result of:
            0.08820366 = score(doc=2528,freq=1.0), product of:
              0.1460351 = queryWeight, product of:
                1.0025179 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.018841948 = queryNorm
              0.6039894 = fieldWeight in 2528, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
          0.23245716 = weight(abstract_txt:diplomarbeit in 2528) [ClassicSimilarity], result of:
            0.23245716 = score(doc=2528,freq=5.0), product of:
              0.16294391 = queryWeight, product of:
                1.0589674 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.018841948 = queryNorm
              1.4266084 = fieldWeight in 2528, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
          0.16660734 = weight(abstract_txt:inhaltlichen in 2528) [ClassicSimilarity], result of:
            0.16660734 = score(doc=2528,freq=2.0), product of:
              0.22314936 = queryWeight, product of:
                1.7525738 = boost
                6.7576156 = idf(docFreq=134, maxDocs=42740)
                0.018841948 = queryNorm
              0.74661803 = fieldWeight in 2528, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7576156 = idf(docFreq=134, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
          0.09344581 = weight(abstract_txt:werden in 2528) [ClassicSimilarity], result of:
            0.09344581 = score(doc=2528,freq=5.0), product of:
              0.15176539 = queryWeight, product of:
                2.2852561 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.018841948 = queryNorm
              0.6157254 = fieldWeight in 2528, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
          0.29633406 = weight(abstract_txt:erschließung in 2528) [ClassicSimilarity], result of:
            0.29633406 = score(doc=2528,freq=6.0), product of:
              0.26000232 = queryWeight, product of:
                2.316929 = boost
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.018841948 = queryNorm
              1.1397362 = fieldWeight in 2528, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
          0.21821135 = weight(abstract_txt:dokumenten in 2528) [ClassicSimilarity], result of:
            0.21821135 = score(doc=2528,freq=2.0), product of:
              0.30578408 = queryWeight, product of:
                2.5126476 = boost
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.018841948 = queryNorm
              0.71361256 = fieldWeight in 2528, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.078125 = fieldNorm(doc=2528)
        0.24 = coord(6/25)
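
    Note: the breakdown above can be recomputed from its own leaf values. The following is a minimal sketch, assuming the usual ClassicSimilarity structure that the explanation itself displays (queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, tf = sqrt(freq), total = sum of term scores * coord). The function and variable names are illustrative and not part of the catalog software; all numbers are copied from the entry for doc 2528.

```python
import math

def term_score(freq, idf, boost, query_norm, field_norm):
    """Score of one matching term, following the structure shown in the explanation."""
    query_weight = boost * idf * query_norm            # "queryWeight, product of"
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight ..., product of"
    return query_weight * field_weight

# Values copied from the explanation for doc 2528 (hypothetical variable names).
QUERY_NORM = 0.018841948
FIELD_NORM = 0.078125

# (term, freq, idf, boost)
TERMS = [
    ("nachteile",    1.0, 7.731065,  1.0025179),
    ("diplomarbeit", 5.0, 8.166383,  1.0589674),
    ("inhaltlichen", 2.0, 6.7576156, 1.7525738),
    ("werden",       5.0, 3.524618,  2.2852561),
    ("erschließung", 6.0, 5.95578,   2.316929),
    ("dokumenten",   2.0, 6.458884,  2.5126476),
]

scores = {
    term: term_score(freq, idf, boost, QUERY_NORM, FIELD_NORM)
    for term, freq, idf, boost in TERMS
}

# coord(6/25): 6 of the 25 query terms matched this document.
coord = 6 / 25
total = sum(scores.values()) * coord

for term, value in scores.items():
    print(f"{term:14s} {value:.8f}")
print(f"total          {total:.8f}")
```

    Within floating-point rounding, the per-term values and the total agree with the figures listed for this entry (for example 0.08820366 for "nachteile" and 0.26286227 overall).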
    
  2. Simon, D.: Anreicherung bibliothekarischer Titeldaten durch Tagging : Möglichkeiten und Probleme (2007) 0.21
    0.21492375 = sum of:
      0.21492375 = product of:
        0.8955157 = sum of:
          0.076202184 = weight(abstract_txt:vorgestellt in 2531) [ClassicSimilarity], result of:
            0.076202184 = score(doc=2531,freq=1.0), product of:
              0.14779747 = queryWeight, product of:
                1.4263037 = boost
                5.4995756 = idf(docFreq=474, maxDocs=42740)
                0.018841948 = queryNorm
              0.5155852 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4995756 = idf(docFreq=474, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
          0.14067122 = weight(abstract_txt:untersucht in 2531) [ClassicSimilarity], result of:
            0.14067122 = score(doc=2531,freq=2.0), product of:
              0.1765288 = queryWeight, product of:
                1.5587853 = boost
                6.0104012 = idf(docFreq=284, maxDocs=42740)
                0.018841948 = queryNorm
              0.7968741 = fieldWeight in 2531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0104012 = idf(docFreq=284, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
          0.14137103 = weight(abstract_txt:inhaltlichen in 2531) [ClassicSimilarity], result of:
            0.14137103 = score(doc=2531,freq=1.0), product of:
              0.22314936 = queryWeight, product of:
                1.7525738 = boost
                6.7576156 = idf(docFreq=134, maxDocs=42740)
                0.018841948 = queryNorm
              0.63352644 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7576156 = idf(docFreq=134, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
          0.050148282 = weight(abstract_txt:werden in 2531) [ClassicSimilarity], result of:
            0.050148282 = score(doc=2531,freq=1.0), product of:
              0.15176539 = queryWeight, product of:
                2.2852561 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.018841948 = queryNorm
              0.33043292 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
          0.20530623 = weight(abstract_txt:erschließung in 2531) [ClassicSimilarity], result of:
            0.20530623 = score(doc=2531,freq=2.0), product of:
              0.26000232 = queryWeight, product of:
                2.316929 = boost
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.018841948 = queryNorm
              0.7896323 = fieldWeight in 2531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
          0.2818167 = weight(abstract_txt:indexierung in 2531) [ClassicSimilarity], result of:
            0.2818167 = score(doc=2531,freq=1.0), product of:
              0.44532442 = queryWeight, product of:
                3.50132 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.018841948 = queryNorm
              0.63283455 = fieldWeight in 2531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.09375 = fieldNorm(doc=2531)
        0.24 = coord(6/25)
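
    Note: the idf and tf leaves in these explanations are consistent with the classic Lucene TF-IDF definitions, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The sketch below checks this against a few values copied from the entries above; the helper names are illustrative, and the comparison uses double precision, so the last digits may differ slightly from the single-precision values printed in the listing.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as used by classic Lucene TF-IDF scoring."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)

MAX_DOCS = 42740  # maxDocs as shown in every idf(...) line

# (term, docFreq, idf printed in the explanation)
IDF_SAMPLES = [
    ("nachteile",   50,   7.731065),
    ("werden",      3422, 3.524618),
    ("indexierung", 135,  6.7502356),
]

for term, doc_freq, printed in IDF_SAMPLES:
    computed = classic_idf(doc_freq, MAX_DOCS)
    print(f"{term:12s} computed={computed:.6f} printed={printed}")

# tf values printed for freq=2.0, 5.0, and 6.0 respectively.
for freq, printed in [(2.0, 1.4142135), (5.0, 2.236068), (6.0, 2.4494898)]:
    print(f"freq={freq} computed={classic_tf(freq):.7f} printed={printed}")
```

    Rare terms such as "nachteile" (docFreq=50) receive a much larger idf than frequent ones such as "werden" (docFreq=3422), which is why a single occurrence of a rare term can outweigh several occurrences of a common one.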
    
  3. Halip, I.: Automatische Extrahierung von Schlagworten aus unstrukturierten Texten (2005) 0.21
    0.21489465 = sum of:
      0.21489465 = product of:
        0.7674809 = sum of:
          0.061742563 = weight(abstract_txt:nachteile in 1987) [ClassicSimilarity], result of:
            0.061742563 = score(doc=1987,freq=1.0), product of:
              0.1460351 = queryWeight, product of:
                1.0025179 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.018841948 = queryNorm
              0.4227926 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.081620105 = weight(abstract_txt:eingeordnet in 1987) [ClassicSimilarity], result of:
            0.081620105 = score(doc=1987,freq=1.0), product of:
              0.17589991 = queryWeight, product of:
                1.1002625 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.018841948 = queryNorm
              0.4640145 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.088549234 = weight(abstract_txt:unstrukturierten in 1987) [ClassicSimilarity], result of:
            0.088549234 = score(doc=1987,freq=1.0), product of:
              0.18571945 = queryWeight, product of:
                1.1305563 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018841948 = queryNorm
              0.4767903 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.044349484 = weight(abstract_txt:bereich in 1987) [ClassicSimilarity], result of:
            0.044349484 = score(doc=1987,freq=1.0), product of:
              0.14757174 = queryWeight, product of:
                1.4252142 = boost
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.018841948 = queryNorm
              0.3005283 = fieldWeight in 1987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.071655326 = weight(abstract_txt:werden in 1987) [ClassicSimilarity], result of:
            0.071655326 = score(doc=1987,freq=6.0), product of:
              0.15176539 = queryWeight, product of:
                2.2852561 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.018841948 = queryNorm
              0.47214538 = fieldWeight in 1987, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.18707727 = weight(abstract_txt:dokumenten in 1987) [ClassicSimilarity], result of:
            0.18707727 = score(doc=1987,freq=3.0), product of:
              0.30578408 = queryWeight, product of:
                2.5126476 = boost
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.018841948 = queryNorm
              0.6117953 = fieldWeight in 1987, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
          0.2324869 = weight(abstract_txt:indexierung in 1987) [ClassicSimilarity], result of:
            0.2324869 = score(doc=1987,freq=2.0), product of:
              0.44532442 = queryWeight, product of:
                3.50132 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.018841948 = queryNorm
              0.5220619 = fieldWeight in 1987, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1987)
        0.28 = coord(7/25)
    
  4. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.21
    0.20741135 = sum of:
      0.20741135 = product of:
        0.86421394 = sum of:
          0.07003258 = weight(abstract_txt:informationsflut in 1284) [ClassicSimilarity], result of:
            0.07003258 = score(doc=1284,freq=1.0), product of:
              0.14530246 = queryWeight, product of:
                7.711647 = idf(docFreq=51, maxDocs=42740)
                0.018841948 = queryNorm
              0.48197794 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.711647 = idf(docFreq=51, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.07166402 = weight(abstract_txt:evaluiert in 1284) [ClassicSimilarity], result of:
            0.07166402 = score(doc=1284,freq=1.0), product of:
              0.14755037 = queryWeight, product of:
                1.0077056 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.018841948 = queryNorm
              0.48569188 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.050685123 = weight(abstract_txt:bereich in 1284) [ClassicSimilarity], result of:
            0.050685123 = score(doc=1284,freq=1.0), product of:
              0.14757174 = queryWeight, product of:
                1.4252142 = boost
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.018841948 = queryNorm
              0.3434609 = fieldWeight in 1284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.074756645 = weight(abstract_txt:werden in 1284) [ClassicSimilarity], result of:
            0.074756645 = score(doc=1284,freq=5.0), product of:
              0.15176539 = queryWeight, product of:
                2.2852561 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.018841948 = queryNorm
              0.49258032 = fieldWeight in 1284, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.13687082 = weight(abstract_txt:erschließung in 1284) [ClassicSimilarity], result of:
            0.13687082 = score(doc=1284,freq=2.0), product of:
              0.26000232 = queryWeight, product of:
                2.316929 = boost
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.018841948 = queryNorm
              0.52642155 = fieldWeight in 1284, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.95578 = idf(docFreq=300, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
          0.46020475 = weight(abstract_txt:indexierung in 1284) [ClassicSimilarity], result of:
            0.46020475 = score(doc=1284,freq=6.0), product of:
              0.44532442 = queryWeight, product of:
                3.50132 = boost
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.018841948 = queryNorm
              1.0334146 = fieldWeight in 1284, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.7502356 = idf(docFreq=135, maxDocs=42740)
                0.0625 = fieldNorm(doc=1284)
        0.24 = coord(6/25)
    
  5. Artemenko, O.; Shramko, M.: Entwicklung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2005) 0.18
    0.17663307 = sum of:
      0.17663307 = product of:
        0.4906474 = sum of:
          0.04377036 = weight(abstract_txt:informationsflut in 2573) [ClassicSimilarity], result of:
            0.04377036 = score(doc=2573,freq=1.0), product of:
              0.14530246 = queryWeight, product of:
                7.711647 = idf(docFreq=51, maxDocs=42740)
                0.018841948 = queryNorm
              0.3012362 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.711647 = idf(docFreq=51, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.04410183 = weight(abstract_txt:nachteile in 2573) [ClassicSimilarity], result of:
            0.04410183 = score(doc=2573,freq=1.0), product of:
              0.1460351 = queryWeight, product of:
                1.0025179 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.018841948 = queryNorm
              0.3019947 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.046676952 = weight(abstract_txt:identifiziert in 2573) [ClassicSimilarity], result of:
            0.046676952 = score(doc=2573,freq=1.0), product of:
              0.15166587 = queryWeight, product of:
                1.0216625 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.018841948 = queryNorm
              0.30776176 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.063249454 = weight(abstract_txt:unstrukturierten in 2573) [ClassicSimilarity], result of:
            0.063249454 = score(doc=2573,freq=1.0), product of:
              0.18571945 = queryWeight, product of:
                1.1305563 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018841948 = queryNorm
              0.34056452 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.0316782 = weight(abstract_txt:bereich in 2573) [ClassicSimilarity], result of:
            0.0316782 = score(doc=2573,freq=1.0), product of:
              0.14757174 = queryWeight, product of:
                1.4252142 = boost
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.018841948 = queryNorm
              0.21466306 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.495374 = idf(docFreq=476, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.044902567 = weight(abstract_txt:vorgestellt in 2573) [ClassicSimilarity], result of:
            0.044902567 = score(doc=2573,freq=2.0), product of:
              0.14779747 = queryWeight, product of:
                1.4263037 = boost
                5.4995756 = idf(docFreq=474, maxDocs=42740)
                0.018841948 = queryNorm
              0.3038115 = fieldWeight in 2573, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4995756 = idf(docFreq=474, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.044477023 = weight(abstract_txt:anforderungen in 2573) [ClassicSimilarity], result of:
            0.044477023 = score(doc=2573,freq=1.0), product of:
              0.18503477 = queryWeight, product of:
                1.5958983 = boost
                6.153502 = idf(docFreq=246, maxDocs=42740)
                0.018841948 = queryNorm
              0.24037117 = fieldWeight in 2573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.153502 = idf(docFreq=246, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.06268535 = weight(abstract_txt:werden in 2573) [ClassicSimilarity], result of:
            0.06268535 = score(doc=2573,freq=9.0), product of:
              0.15176539 = queryWeight, product of:
                2.2852561 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.018841948 = queryNorm
              0.41304114 = fieldWeight in 2573, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
          0.10910568 = weight(abstract_txt:dokumenten in 2573) [ClassicSimilarity], result of:
            0.10910568 = score(doc=2573,freq=2.0), product of:
              0.30578408 = queryWeight, product of:
                2.5126476 = boost
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.018841948 = queryNorm
              0.35680628 = fieldWeight in 2573, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.458884 = idf(docFreq=181, maxDocs=42740)
                0.0390625 = fieldNorm(doc=2573)
        0.36 = coord(9/25)