Search (5 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × year_i:[2020 TO 2030}
  1. Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.01
    0.010183054 = product of:
      0.020366108 = sum of:
        0.020366108 = product of:
          0.040732216 = sum of:
            0.040732216 = weight(_text_:data in 5816) [ClassicSimilarity], result of:
              0.040732216 = score(doc=5816,freq=4.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.24703519 = fieldWeight in 5816, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5816)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Millions of messages are produced on microblog platforms every day, leading to the pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. On the contrary, we propose to enrich contexts via exploiting conversations initialized by target posts and formed by their replies, which are generally centered around relevant topics to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representation from their conversation contexts and feeds the representation into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture the word-level indicative representation and message-level indicative representation hierarchically. In both of the modules, we apply character-level representations, which enables the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
  2. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.01
    0.008640608 = product of:
      0.017281216 = sum of:
        0.017281216 = product of:
          0.03456243 = sum of:
            0.03456243 = weight(_text_:data in 720) [ClassicSimilarity], result of:
              0.03456243 = score(doc=720,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.2096163 = fieldWeight in 720, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=720)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
  3. Villaespesa, E.; Crider, S.: ¬A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.01
    0.007200507 = product of:
      0.014401014 = sum of:
        0.014401014 = product of:
          0.028802028 = sum of:
            0.028802028 = weight(_text_:data in 341) [ClassicSimilarity], result of:
              0.028802028 = score(doc=341,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.17468026 = fieldWeight in 341, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=341)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keywords tags assigned by the museum and those produced by three computer vision systems. Design/methodology/approach This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags. Findings This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art historical one. However, the study also surfaces challenges about the accuracy and lack of context within the computer vision results. Practical implications This finding has important implications on how these machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. In consequence, the museum needs to consider the selection process for choosing which computer vision system to apply to their collection. Furthermore, they also need to think critically about the kind of tags they wish to use, such as colors, materials or objects. Originality/value The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations of aspects to consider before selecting and implementing these technologies.
  4. Pintscher, L.; Bourgonje, P.; Moreno Schneider, J.; Ostendorff, M.; Rehm, G.: Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata : die Inhaltserschließungspolitik der Deutschen Nationalbibliothek (2021) 0.01
    0.007200507 = product of:
      0.014401014 = sum of:
        0.014401014 = product of:
          0.028802028 = sum of:
            0.028802028 = weight(_text_:data in 366) [ClassicSimilarity], result of:
              0.028802028 = score(doc=366,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.17468026 = fieldWeight in 366, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=366)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Wikidata ist eine freie Wissensbasis, die allgemeine Daten über die Welt zur Verfügung stellt. Sie wird von Wikimedia entwickelt und betrieben, wie auch das Schwesterprojekt Wikipedia. Die Daten in Wikidata werden von einer großen Community von Freiwilligen gesammelt und gepflegt, wobei die Daten sowie die zugrundeliegende Ontologie von vielen Projekten, Institutionen und Firmen als Basis für Applikationen und Visualisierungen, aber auch für das Training von maschinellen Lernverfahren genutzt werden. Wikidata nutzt MediaWiki und die Erweiterung Wikibase als technische Grundlage der kollaborativen Arbeit an einer Wissensbasis, die verlinkte offene Daten für Menschen und Maschinen zugänglich macht. Ende 2020 beschreibt Wikidata über 90 Millionen Entitäten unter Verwendung von über 8 000 Eigenschaften, womit insgesamt mehr als 1,15 Milliarden Aussagen über die beschriebenen Entitäten getroffen werden. Die Datenobjekte dieser Entitäten sind mit äquivalenten Einträgen in mehr als 5 500 externen Datenbanken, Katalogen und Webseiten verknüpft, was Wikidata zu einem der zentralen Knotenpunkte des Linked Data Web macht. Mehr als 11 500 aktiv Editierende tragen neue Daten in die Wissensbasis ein und pflegen sie. Diese sind in Wiki-Projekten organisiert, die jeweils bestimmte Themenbereiche oder Aufgabengebiete adressieren. Die Daten werden in mehr als der Hälfte der Inhaltsseiten in den Wikimedia-Projekten genutzt und unter anderem mehr als 6,5 Millionen Mal am Tag über den SPARQL-Endpoint abgefragt, um sie in externe Applikationen und Visualisierungen einzubinden.
  5. Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.01
    0.007200507 = product of:
      0.014401014 = sum of:
        0.014401014 = product of:
          0.028802028 = sum of:
            0.028802028 = weight(_text_:data in 1024) [ClassicSimilarity], result of:
              0.028802028 = score(doc=1024,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.17468026 = fieldWeight in 1024, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1024)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    There are several ways to employ machine learning for automating subject indexing. One popular strategy is to utilize a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject matter using a standard vocabulary. The resulting model can then predict the subject of new and previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., from Agrovoc). Next, the dataset (obtained from Agris) can be divided into - i) a training dataset, and ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif. org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).