Search (7 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × type_ss:"el"
  1. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    0.022510704 = product of:
      0.045021407 = sum of:
        0.045021407 = product of:
          0.090042815 = sum of:
            0.090042815 = weight(_text_:subject in 1717) [ClassicSimilarity], result of:
              0.090042815 = score(doc=1717,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5532265 = fieldWeight in 1717, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1717)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents of all subjects. The SWD has traditionally been used for intellectual subject cataloguing, primarily of books; the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. The paper sketches this project, its results, and its problems.
    Content
    Paper for the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://www.nlib.ee/index.php?id=17763.
  2. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.01
    0.0139248865 = product of:
      0.027849773 = sum of:
        0.027849773 = product of:
          0.055699546 = sum of:
            0.055699546 = weight(_text_:subject in 658) [ClassicSimilarity], result of:
              0.055699546 = score(doc=658,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.34222013 = fieldWeight in 658, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=658)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open source Annif toolkit for automated subject indexing and classification, but should also generalize to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification, particularly for Finnish and Swedish text, but not English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
  3. Wolfe, EW.: a case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.01
    0.011255352 = product of:
      0.022510704 = sum of:
        0.022510704 = product of:
          0.045021407 = sum of:
            0.045021407 = weight(_text_:subject in 5236) [ClassicSimilarity], result of:
              0.045021407 = score(doc=5236,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27661324 = fieldWeight in 5236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5236)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
  4. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.01
    0.009647444 = product of:
      0.019294888 = sum of:
        0.019294888 = product of:
          0.038589776 = sum of:
            0.038589776 = weight(_text_:subject in 1167) [ClassicSimilarity], result of:
              0.038589776 = score(doc=1167,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.23709705 = fieldWeight in 1167, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1167)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  5. Schöneberg, U.; Gödert, W.: Erschließung mathematischer Publikationen mittels linguistischer Verfahren (2012) 0.01
    0.009647444 = product of:
      0.019294888 = sum of:
        0.019294888 = product of:
          0.038589776 = sum of:
            0.038589776 = weight(_text_:subject in 1055) [ClassicSimilarity], result of:
              0.038589776 = score(doc=1055,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.23709705 = fieldWeight in 1055, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The number of mathematics-related publications increases from year to year. Abstracting services such as Zentralblatt MATH and Mathematical Reviews record the bibliographic data, index the content of the works, and make them searchable for users, today through databases and formerly in print. Keywords are an essential part of the subject indexing of these publications. Keywords are usually not single words but multi-word phrases, which suggests applying linguistic methods and procedures. The software 'Lingo', developed at FH Köln (Cologne University of Applied Sciences), was adapted to the special requirements of mathematical texts and used both to build a controlled vocabulary and to extract keywords from mathematical publications. The plan is to link the controlled vocabulary with the Mathematical Subject Classification in order to develop and test methods of automatic classification for the abstracting service Zentralblatt MATH.
  6. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.01
    0.008039537 = product of:
      0.016079074 = sum of:
        0.016079074 = product of:
          0.032158148 = sum of:
            0.032158148 = weight(_text_:subject in 4309) [ClassicSimilarity], result of:
              0.032158148 = score(doc=4309,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.19758089 = fieldWeight in 4309, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4309)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  7. Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.01
    0.0077069276 = product of:
      0.015413855 = sum of:
        0.015413855 = product of:
          0.03082771 = sum of:
            0.03082771 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
              0.03082771 = score(doc=3780,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.19345059 = fieldWeight in 3780, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3780)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    19. 8.2017 9:24:22
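
Note on the score breakdowns: the figures listed under each result appear to follow the usual Lucene ClassicSimilarity (TF-IDF) arithmetic, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The sketch below recomputes a final score from the values shown in a breakdown. The function name and parameter names are hypothetical, chosen only for illustration, and the two coord(1/2) factors are passed in as given rather than derived from the query.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm,
                             coords=(0.5, 0.5)):
    """Recompute a ClassicSimilarity-style score from the listed components.

    Hypothetical helper: the names are illustrative, not part of any library.
    """
    tf = math.sqrt(freq)                              # tf(freq) = sqrt(termFreq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm              # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight               # weight of the matching term
    for coord in coords:                              # apply each coord(1/2) factor
        score *= coord
    return score


# Values taken from the breakdown of result 1 (doc=1717, term "subject").
result_1 = classic_similarity_score(
    freq=8.0, doc_freq=3361, max_docs=44218,
    query_norm=0.04550679, field_norm=0.0546875,
)
print(f"{result_1:.6f}")
```

With the values from result 1 this comes out at roughly 0.0225, in line with the listed score; small differences in the last digits are expected because Lucene computes in single precision.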