Search (31 results, page 1 of 2)

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.06

0.056143478 = product of:
  0.16843043 = sum of:
    0.043577533 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
      0.043577533 = score(doc=2721,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.37471575 = fieldWeight in 2721, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.034899916 = weight(_text_:world in 2721) [ClassicSimilarity], result of:
      0.034899916 = score(doc=2721,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.25480178 = fieldWeight in 2721, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.046375446 = weight(_text_:wide in 2721) [ClassicSimilarity], result of:
      0.046375446 = score(doc=2721,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.29372054 = fieldWeight in 2721, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.043577533 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
      0.043577533 = score(doc=2721,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.37471575 = fieldWeight in 2721, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
  0.33333334 = coord(4/12)
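The score tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); it recomputes the score of doc 2721 from the numbers printed in the tree. The function names are illustrative and not part of Lucene, and `queryNorm` and `fieldNorm` are copied from the output rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the explain output
QUERY_NORM = 0.035634913  # queryNorm, copied from the explain output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM                 # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


FIELD_NORM_2721 = 0.046875  # fieldNorm(doc=2721), copied from the output

web = term_score(freq=6.0, doc_freq=4597, field_norm=FIELD_NORM_2721)
world = term_score(freq=2.0, doc_freq=2573, field_norm=FIELD_NORM_2721)
wide = term_score(freq=2.0, doc_freq=1430, field_norm=FIELD_NORM_2721)

# The "web" clause appears twice in the tree, so it is summed twice.
clause_sum = web + world + wide + web

# coord(4/12): four of the twelve query clauses matched this document.
total = clause_sum * (4 / 12)

print(f"web={web:.9f} world={world:.9f} wide={wide:.9f}")
print(f"sum={clause_sum:.8f} total={total:.9f}")
```

Lucene computes these values in 32-bit floats, so a double-precision recomputation should agree with the listing only up to small rounding differences.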

Abstract: In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Daudaravicius, V.: ¬A framework for keyphrase extraction from scientific journals (2016) 0.05

0.051175587 = product of:
  0.15352675 = sum of:
    0.02935275 = weight(_text_:web in 2930) [ClassicSimilarity], result of:
      0.02935275 = score(doc=2930,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.040716566 = weight(_text_:world in 2930) [ClassicSimilarity], result of:
      0.040716566 = score(doc=2930,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.29726875 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.05410469 = weight(_text_:wide in 2930) [ClassicSimilarity], result of:
      0.05410469 = score(doc=2930,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.342674 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.02935275 = weight(_text_:web in 2930) [ClassicSimilarity], result of:
      0.02935275 = score(doc=2930,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
  0.33333334 = coord(4/12)

Content: Presentation, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.

Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: ¬A typology of semantic relations dedicated to scientific literature analysis (2016) 0.05

0.051175587 = product of:
  0.15352675 = sum of:
    0.02935275 = weight(_text_:web in 2933) [ClassicSimilarity], result of:
      0.02935275 = score(doc=2933,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.040716566 = weight(_text_:world in 2933) [ClassicSimilarity], result of:
      0.040716566 = score(doc=2933,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.29726875 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.05410469 = weight(_text_:wide in 2933) [ClassicSimilarity], result of:
      0.05410469 = score(doc=2933,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.342674 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.02935275 = weight(_text_:web in 2933) [ClassicSimilarity], result of:
      0.02935275 = score(doc=2933,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
  0.33333334 = coord(4/12)

Content: Presentation, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.

Groß, T.; Faden, M.: Automatische Indexierung elektronischer Dokumente an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften : Bericht über die Jahrestagung der Internationalen Buchwissenschaftlichen Gesellschaft (2010) 0.03
```
0.029243192 = product of:
  0.08772957 = sum of:
    0.016773 = weight(_text_:web in 4051) [ClassicSimilarity], result of:
      0.016773 = score(doc=4051,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
    0.02326661 = weight(_text_:world in 4051) [ClassicSimilarity], result of:
      0.02326661 = score(doc=4051,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.16986786 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
    0.030916965 = weight(_text_:wide in 4051) [ClassicSimilarity], result of:
      0.030916965 = score(doc=4051,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.1958137 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
    0.016773 = weight(_text_:web in 4051) [ClassicSimilarity], result of:
      0.016773 = score(doc=4051,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
  0.33333334 = coord(4/12)
```
Abstract

The increasing availability of digital information in recent years, together with the prospect of a further rise in the so-called data deluge, culminates in a fundamental and steadily worsening problem of structuring information. The steady growth of digital information resources on the World Wide Web ensures access to a wide range of information at any time and from any place; what remains unresolved is structured access, particularly to scholarly resources. Given the growing volume of electronic content, and against the background of stagnating or shrinking staff resources in subject indexing, no library or library network can manage, now or in the future, to capture all digital data, structure it, and relate the items to one another. In the information society of the 21st century, however, it is becoming increasingly important to structure the scholarly information lost in the flood promptly, appropriately, and completely, and thus to make it usable again as a basis for generating knowledge. Standardized subject indexing of digital information resources is therefore a decisive and success-critical factor for the Deutsche Zentralbibliothek für Wirtschaftswissenschaften (ZBW), as an important information infrastructure institution in this field, in its competition with other information service providers. Traditional intellectual subject indexing does not scale arbitrarily: as the number of online documents rises, the need for subject specialists rises proportionally if a given quality standard is to be maintained. Other subject indexing methods will therefore be needed in future. Automated keyword assignment methods are regarded as the only way to make library subject indexing future-proof in the digital age.
In addition, machine-based approaches can help level out the heterogeneity (indexing inconsistencies) between individual indexers and thus contribute to a more homogeneous indexing of the library's holdings.

Carevic, Z.: Semi-automatische Verschlagwortung zur Integration externer semantischer Inhalte innerhalb einer medizinischen Kooperationsplattform (2012) 0.02
```
0.02210968 = product of:
  0.08843872 = sum of:
    0.054892723 = weight(_text_:tagging in 897) [ClassicSimilarity], result of:
      0.054892723 = score(doc=897,freq=2.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.2609168 = fieldWeight in 897, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.03125 = fieldNorm(doc=897)
    0.016773 = weight(_text_:web in 897) [ClassicSimilarity], result of:
      0.016773 = score(doc=897,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 897, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=897)
    0.016773 = weight(_text_:web in 897) [ClassicSimilarity], result of:
      0.016773 = score(doc=897,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 897, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=897)
  0.25 = coord(3/12)
```
Abstract

This thesis deals with the integration of external semantic content on the basis of a medical concept system. The underlying assumption is that using a uniform terminology in both the query system and the knowledge base leads to high-quality results. To achieve this, the query system must ensure that natural language is mapped onto the terminology in use. This is done through (semi-)automatic keyword assignment for text-based content. Essentially, the following research questions can be stated. Automatic keyword assignment for text-based content: Can automatic keyword assignment for text-based content be optimized on the basis of a concept system? The central aspect of this thesis is the (semi-)automatic keyword assignment for text-based content on the basis of a medical concept system. To this end, the current state of research is reviewed. A number of tokenizers are compared to find out which algorithms are suitable for determining word boundaries. In particular, the thesis examines how word boundary detection can be applied in a domain-specific environment. Based on the tokens identified in a text, the effects of stemming and POS tagging on the total amount of content to be analyzed are observed. Finally, the thesis evaluates how a controlled vocabulary can increase the precision of keyword assignment. This rests on the assumption that domain-specific content is also defined within a domain-specific concept system. For this purpose, a general process model is developed, according to which keyword assignment is carried out.
Integration of external content: To what extent can the use of a uniform terminology between query system and knowledge base support the process of gathering information? To this end, a first phase determines which knowledge bases from the medical domain are available in the Linked Data Cloud. Building on the results, information from various decentralized knowledge bases is integrated by way of example. The focus is on the terminology used and on the use of Semantic Web technologies. In addition to information from the Linked Data Cloud, medical literature is searched in PubMed. As with the Linked Data Cloud, the integration uses a uniform terminology. A further question is how information from a total of 21 million article citations in PubMed can be integrated in a meaningful way. This involves determining which mechanisms can be used to optimize the precision of the results. Suitability of medical concept systems: Which medical concept systems exist, and how suitable are they as an underlying vocabulary for automatic keyword assignment and the integration of semantic content? The focus here is specifically on assessing the richness of concept systems, with the level of detail being of particular interest. Is a given concept system specific or general, and is it also suitable for describing particular subfields of medicine, such as surgery or anesthesia, in sufficient depth?

Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.02
```
0.015128385 = product of:
  0.18154062 = sum of:
    0.18154062 = weight(_text_:tagging in 2918) [ClassicSimilarity], result of:
      0.18154062 = score(doc=2918,freq=14.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.8629013 = fieldWeight in 2918, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2918)
  0.083333336 = coord(1/12)
```
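The low rank of this record, despite 14 occurrences of "tagging", comes from the coordination factor. As a hedged illustration (again assuming the standard ClassicSimilarity formulas, with illustrative function names that are not Lucene APIs), the sketch below recomputes the single term score and then applies coord(1/12):

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the explain output
QUERY_NORM = 0.035634913  # queryNorm, copied from the explain output


def classic_term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Single-term ClassicSimilarity score: (idf * queryNorm) * (tf * idf * fieldNorm)."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of query clauses that matched the document."""
    return matched / total


# Values for doc 2918: freq=14.0, docFreq=327, fieldNorm=0.0390625.
tagging = classic_term_score(freq=14.0, doc_freq=327, field_norm=0.0390625)

# Only one of the twelve query clauses matched, so the score is divided by 12.
final_score = tagging * coord(1, 12)

print(f"term score={tagging:.8f} final={final_score:.9f}")
```

The term score here is the highest single clause weight on the page, but because only one clause matched, the final score falls below records that match several clauses weakly.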
Abstract

Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also included mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.

Theme

Social tagging

Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
```
0.010758134 = product of:
  0.0645488 = sum of:
    0.054892723 = weight(_text_:tagging in 1441) [ClassicSimilarity], result of:
      0.054892723 = score(doc=1441,freq=2.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.2609168 = fieldWeight in 1441, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.03125 = fieldNorm(doc=1441)
    0.009656077 = product of:
      0.019312155 = sum of:
        0.019312155 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
          0.019312155 = score(doc=1441,freq=2.0), product of:
            0.12478739 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.035634913 = queryNorm
            0.15476047 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
  0.16666667 = coord(2/12)
```
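This tree nests one boolean clause inside another: the "22" term sits in an inner sum with its own coord(1/2), and the result is added to the "tagging" score before the outer coord(2/12) is applied. A small sketch of how such a tree of sums and products evaluates is given below; the `Node` type and `evaluate` function are hypothetical helpers for illustration, and the leaf values are copied from the listing rather than recomputed.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One line of an explain tree: a leaf value, or a sum/product of children."""
    op: str                      # "leaf", "sum" or "product"
    value: float = 0.0           # only used when op == "leaf"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> float:
    """Recursively evaluate an explain tree."""
    if node.op == "leaf":
        return node.value
    values = [evaluate(child) for child in node.children]
    if node.op == "sum":
        return sum(values)
    if node.op == "product":
        return math.prod(values)
    raise ValueError(f"unknown op: {node.op}")


def leaf(value: float) -> Node:
    return Node("leaf", value)


# Inner clause: sum of the "22" term score, scaled by coord(1/2).
inner = Node("product", children=[
    Node("sum", children=[leaf(0.019312155)]),
    leaf(0.5),
])

# Outer clause: "tagging" score plus the inner clause, scaled by coord(2/12).
doc_1441 = Node("product", children=[
    Node("sum", children=[leaf(0.054892723), inner]),
    leaf(2 / 12),
])

print(f"inner={evaluate(inner):.9f} total={evaluate(doc_1441):.9f}")
```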
Abstract

This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then Perl scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting them with common hypernyms; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated its effectiveness. We compared the structural similarity of the documents before and after all the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.01

0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 1717) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1717,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1717, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1717)
    0.02935275 = weight(_text_:web in 1717) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1717,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1717, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1717)
  0.16666667 = coord(2/12)

Content: Contribution to the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://http://www.nlib.ee/index.php?id=17763.

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.01

0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 1969) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1969,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1969, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1969)
    0.02935275 = weight(_text_:web in 1969) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1969,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1969, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1969)
  0.16666667 = coord(2/12)

Footnote: Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - contains contributions from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.

Lichtenstein, A.; Plank, M.; Neumann, J.: TIB's portal for audiovisual media : combining manual and automatic indexing (2014) 0.01

0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 1981) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1981,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1981, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1981)
    0.02935275 = weight(_text_:web in 1981) [ClassicSimilarity], result of:
      0.02935275 = score(doc=1981,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 1981, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1981)
  0.16666667 = coord(2/12)

Abstract: The German National Library of Science and Technology (TIB) developed a Web-based platform for audiovisual media. The audiovisual portal optimizes access to scientific videos such as computer animations and lecture and conference recordings. TIB's AV-Portal combines traditional cataloging and automatic indexing of audiovisual media. The article describes metadata standards for audiovisual media and introduces the TIB's metadata schema in comparison to other metadata standards for non-textual materials. Additionally, we give an overview of multimedia retrieval technologies used for the Portal and present the AV-Portal in detail as well as the additional value for libraries and their users.

Schulz, K.U.; Brunner, L.: Vollautomatische thematische Verschlagwortung großer Textkollektionen mittels semantischer Netze (2017) 0.01
```
0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 3493) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3493,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3493, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3493)
    0.02935275 = weight(_text_:web in 3493) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3493,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3493, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3493)
  0.16666667 = coord(2/12)
```
Source

Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
Böhm, A.; Seifert, C.; Schlötterer, J.; Granitzer, M.: Identifying tweets from the economic domain (2017) 0.01
```
0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 3495) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3495,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3495, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3495)
    0.02935275 = weight(_text_:web in 3495) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3495,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3495, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3495)
  0.16666667 = coord(2/12)
```
Source

Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
Kempf, A.O.: Neue Verfahrenswege der Wissensorganisation : eine Evaluation automatischer Indexierung in der sozialwissenschaftlichen Fachinformation (2017) 0.01
```
0.0097842505 = product of:
  0.0587055 = sum of:
    0.02935275 = weight(_text_:web in 3497) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3497,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3497)
    0.02935275 = weight(_text_:web in 3497) [ClassicSimilarity], result of:
      0.02935275 = score(doc=3497,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3497)
  0.16666667 = coord(2/12)
```
Source

Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
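The score trees in this listing follow the arithmetic of Lucene's ClassicSimilarity (TF-IDF). As a minimal sketch, the snippet below recomputes the 0.0097842505 shown for doc 3497 from the leaf values in its tree. The function names (`classic_idf`, `term_score`) are illustrative and are not part of Lucene's API; `queryNorm` and `fieldNorm` are copied from the explain output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as printed in the tree: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """One weight(...) node: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm         # 0.11629491 in the tree
    field_weight = tf * idf * field_norm    # 0.25239927 in the tree
    return query_weight * field_weight


# Leaf values copied from the doc=3497 tree (term "web").
web = term_score(freq=2.0, doc_freq=4597, max_docs=44218,
                 query_norm=0.035634913, field_norm=0.0546875)

# The tree sums two identical "web" clauses, then applies coord(2/12).
total = (web + web) * (2 / 12)
print(round(web, 8), round(total, 10))
```

Lucene works in single precision, so the last printed digits of a double-precision recomputation may differ slightly from the tree.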
Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.01
```
0.0083865 = product of:
  0.050318997 = sum of:
    0.025159499 = weight(_text_:web in 3320) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3320,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3320, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3320)
    0.025159499 = weight(_text_:web in 3320) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3320,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3320, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3320)
  0.16666667 = coord(2/12)
```
Abstract

The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs.
Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
```
0.0069887503 = product of:
  0.0419325 = sum of:
    0.02096625 = weight(_text_:web in 3627) [ClassicSimilarity], result of:
      0.02096625 = score(doc=3627,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.02096625 = weight(_text_:web in 3627) [ClassicSimilarity], result of:
      0.02096625 = score(doc=3627,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
  0.16666667 = coord(2/12)
```
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).

Ma, N.; Zheng, H.T.; Xiao, X.: An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.01
```
0.0069887503 = product of:
  0.0419325 = sum of:
    0.02096625 = weight(_text_:web in 3810) [ClassicSimilarity], result of:
      0.02096625 = score(doc=3810,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 3810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3810)
    0.02096625 = weight(_text_:web in 3810) [ClassicSimilarity], result of:
      0.02096625 = score(doc=3810,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 3810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3810)
  0.16666667 = coord(2/12)
```
Source

Web and Big Data: First International Joint Conference, APWeb-WAIM 2017, Beijing, China, July 7-9, 2017, Proceedings, Part I. Eds.: L. Chen et al

Donath, A.: Flickr sorgt mit Automatik-Tags für Aufregung (2015) 0.01
```
0.0057179923 = product of:
  0.068615906 = sum of:
    0.068615906 = weight(_text_:tagging in 1876) [ClassicSimilarity], result of:
      0.068615906 = score(doc=1876,freq=2.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.326146 = fieldWeight in 1876, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1876)
  0.083333336 = coord(1/12)
```
Content

"Flickr has introduced tagging of uploaded photos that, in addition to the users' own image descriptions, attempts to assign keywords describing the image content. According to a report in the British Guardian, the system makes errors that range from inappropriate descriptions to racist or politically incorrect remarks. Dark-skinned people, for example, were described as "monochrome", "animal" and "ape". The face of a light-skinned woman was also classified as "animal". Images of a concentration camp were even tagged with "sport" and "jungle gym". The automatic tags cannot be switched off and, according to Yahoo, are still in beta. In Golem.de's assessment they are of little use, since they are quite general and not very informative. Often the algorithm can only add "Indoor" or "Outdoor", which is almost always assigned correctly but is still of little use. Behind the scenes Flickr already appears to be working on an improvement and, in response to an inquiry, has assured the Guardian that the problems with incorrect tags are known. Some erroneous keywords have since been removed." See also: https://news.ycombinator.com/item?id=8621658.
Bredack, J.: Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich (2016) 0.01
```
0.0057179923 = product of:
  0.068615906 = sum of:
    0.068615906 = weight(_text_:tagging in 3194) [ClassicSimilarity], result of:
      0.068615906 = score(doc=3194,freq=2.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.326146 = fieldWeight in 3194, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3194)
  0.083333336 = coord(1/12)
```
Abstract

The TreeTagger and the indexing software Lingo were used as extraction systems. The TreeTagger is based on a statistical tagging and chunking algorithm with which NPs are automatically identified and extracted. It can be used in various natural language processing scenarios, primarily as a POS tagger for different languages. In contrast to the TreeTagger, the indexing system Lingo works with electronic dictionaries and pattern-based matching. Lingo is a system geared toward automatic indexing that ships with a large number of modules, which can be individually adapted to a particular task and coordinated with one another. The different processing approaches were clearly reflected in the result sets of the two systems. The low overlap between the result sets illustrates the differing modes of operation and could be described by example in a qualitative analysis. This work cannot conclusively determine which of the two systems should be preferred for generating index terms.

Souza, R.R.; Gil-Leiva, I.: Automatic indexing of scientific texts : a methodological comparison (2016) 0.00
```
0.0038777683 = product of:
  0.04653322 = sum of:
    0.04653322 = weight(_text_:world in 4913) [ClassicSimilarity], result of:
      0.04653322 = score(doc=4913,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.33973572 = fieldWeight in 4913, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=4913)
  0.083333336 = coord(1/12)
```
Source

Knowledge organization for a sustainable world: challenges and perspectives for cultural, scientific, and technological sharing in a connected society : proceedings of the Fourteenth International ISKO Conference 27-29 September 2016, Rio de Janeiro, Brazil / organized by International Society for Knowledge Organization (ISKO), ISKO-Brazil, São Paulo State University ; edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, Vera Dodebei

Golub, K.: Automatic subject indexing of text (2019) 0.00
```
0.0032205172 = product of:
  0.038646206 = sum of:
    0.038646206 = weight(_text_:wide in 5268) [ClassicSimilarity], result of:
      0.038646206 = score(doc=5268,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.24476713 = fieldWeight in 5268, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5268)
  0.083333336 = coord(1/12)
```
Abstract

Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
