Search (16 results, page 1 of 1)

  • theme_ss:"Automatisches Indexieren"
  • year_i:[2010 TO 2020}
  1. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.03
    0.030189436 = product of:
      0.06037887 = sum of:
        0.06037887 = sum of:
          0.035710584 = weight(_text_:b in 1441) [ClassicSimilarity], result of:
            0.035710584 = score(doc=1441,freq=4.0), product of:
              0.16126883 = queryWeight, product of:
                3.542962 = idf(docFreq=3476, maxDocs=44218)
                0.045518078 = queryNorm
              0.22143513 = fieldWeight in 1441, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.542962 = idf(docFreq=3476, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
          0.024668286 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
            0.024668286 = score(doc=1441,freq=2.0), product of:
              0.15939656 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045518078 = queryNorm
              0.15476047 = fieldWeight in 1441, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first phase, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then Perl scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hyperonyms for them; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated the effectiveness of this preprocessing. We compared the structural similarity of the documents before and after the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  2. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.02
    0.02495974 = product of:
      0.04991948 = sum of:
        0.04991948 = sum of:
          0.025251195 = weight(_text_:b in 5499) [ClassicSimilarity], result of:
            0.025251195 = score(doc=5499,freq=2.0), product of:
              0.16126883 = queryWeight, product of:
                3.542962 = idf(docFreq=3476, maxDocs=44218)
                0.045518078 = queryNorm
              0.15657827 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.542962 = idf(docFreq=3476, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
          0.024668286 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
            0.024668286 = score(doc=5499,freq=2.0), product of:
              0.15939656 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045518078 = queryNorm
              0.15476047 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22
  3. Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02
    0.018501213 = product of:
      0.037002426 = sum of:
        0.037002426 = product of:
          0.07400485 = sum of:
            0.07400485 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
              0.07400485 = score(doc=5629,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.46428138 = fieldWeight in 5629, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5629)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    B.I.T.online. 22(2019) H.2, S.163-166
  4. Wiesenmüller, H.: DNB-Sacherschließung : Neues für die Reihen A und B (2019) 0.02
    0.016401134 = product of:
      0.03280227 = sum of:
        0.03280227 = product of:
          0.06560454 = sum of:
            0.06560454 = weight(_text_:b in 5212) [ClassicSimilarity], result of:
              0.06560454 = score(doc=5212,freq=6.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.40680233 = fieldWeight in 5212, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5212)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    "Every few years the library community is confronted with changes in subject indexing by the Deutsche Nationalbibliothek. Many will surely still remember the cuts made in 2014 for series A: since then, advice books, language dictionaries, travel guides, and cookbooks, among others, are no longer indexed with subject headings (cf. the DNB concept of 2014). The year 2017 brought the introduction of machine indexing for series B and H, together with the loss of in-depth DDC classification (cf. DNB information from 2017). Since then, the pressing question has been what would happen to series A. For the past few days, the answer has been available on the DNB website. (Incidentally: it is to be feared that many links in this blog post will stop working in the foreseeable future, because a relaunch of the DNB website has been announced. As last time, there will presumably be no redirects from the old URLs to the new ones.)"
    Source
    https://www.basiswissen-rda.de/dnb-sacherschliessung-reihen-a-und-b/
  5. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.015417679 = product of:
      0.030835358 = sum of:
        0.030835358 = product of:
          0.061670717 = sum of:
            0.061670717 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.061670717 = score(doc=2759,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  6. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.01
    0.012334143 = product of:
      0.024668286 = sum of:
        0.024668286 = product of:
          0.04933657 = sum of:
            0.04933657 = weight(_text_:22 in 401) [ClassicSimilarity], result of:
              0.04933657 = score(doc=401,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.30952093 = fieldWeight in 401, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=401)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    11. 9.2012 19:43:22
  7. Siebenkäs, A.; Markscheffel, B.: Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry (2015) 0.01
    0.011047398 = product of:
      0.022094795 = sum of:
        0.022094795 = product of:
          0.04418959 = sum of:
            0.04418959 = weight(_text_:b in 2091) [ClassicSimilarity], result of:
              0.04418959 = score(doc=2091,freq=2.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.27401197 = fieldWeight in 2091, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2091)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Wiesenmüller, H.: Maschinelle Indexierung am Beispiel der DNB : Analyse und Entwicklungsmöglichkeiten (2018) 0.01
    0.011047398 = product of:
      0.022094795 = sum of:
        0.022094795 = product of:
          0.04418959 = sum of:
            0.04418959 = weight(_text_:b in 5209) [ClassicSimilarity], result of:
              0.04418959 = score(doc=5209,freq=2.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.27401197 = fieldWeight in 5209, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5209)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article examines the results of the procedure used at the Deutsche Nationalbibliothek (DNB) for the automatic assignment of subject headings. Since 2017 it has also been applied to print editions in series B and H of the Deutsche Nationalbibliografie. The central problem areas are presented and illustrated with examples, for instance that not all words occurring in a table of contents actually express thematic aspects, and that the software very often fails to recognize corporate bodies and other "named entities". The machine-generated results are currently very unsatisfactory. The article considers possible improvements and sensible strategies.
  9. Kasprzik, A.: Voraussetzungen und Anwendungspotentiale einer präzisen Sacherschließung aus Sicht der Wissenschaft (2018) 0.01
    0.010792375 = product of:
      0.02158475 = sum of:
        0.02158475 = product of:
          0.0431695 = sum of:
            0.0431695 = weight(_text_:22 in 5195) [ClassicSimilarity], result of:
              0.0431695 = score(doc=5195,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.2708308 = fieldWeight in 5195, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5195)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Considerable attention is currently focused on the potential of automated methods in subject indexing and on how they can interact with intellectual methods. In this context, the present article addresses the following questions: What are the requirements for library metadata from the perspective of researchers? What is needed to serve the information needs of the subject communities? And what does this mean for the automation of metadata creation and maintenance? This article summarizes the position the author took in a keynote talk and in the panel discussion at the workshop of the FAG "Erschließung und Informationsvermittlung" of the GBV. The workshop took place as part of the 22nd Verbundkonferenz of the GBV.
  10. Franke-Maier, M.: Anforderungen an die Qualität der Inhaltserschließung im Spannungsfeld von intellektuell und automatisch erzeugten Metadaten (2018) 0.01
    0.010792375 = product of:
      0.02158475 = sum of:
        0.02158475 = product of:
          0.0431695 = sum of:
            0.0431695 = weight(_text_:22 in 5344) [ClassicSimilarity], result of:
              0.0431695 = score(doc=5344,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.2708308 = fieldWeight in 5344, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5344)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Since the Deutscher Bibliothekartag 2018 at the latest, the discussion about the automatic subject indexing procedures of the Deutsche Nationalbibliothek has shifted from a politically driven debate to a debate about quality. This article deals with questions of subject indexing quality in digital times, when heterogeneous products of different procedures meet, and attempts to define important quality requirements. This conference paper summarizes the ideas the author presented as discussion prompts at the workshop of the FAG "Erschließung und Informationsvermittlung" of the GBV on 29 August 2018 in Kiel. The workshop took place as part of the 22nd Verbundkonferenz of the GBV.
  11. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.01
    0.009469198 = product of:
      0.018938396 = sum of:
        0.018938396 = product of:
          0.037876792 = sum of:
            0.037876792 = weight(_text_:b in 3422) [ClassicSimilarity], result of:
              0.037876792 = score(doc=3422,freq=2.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.23486741 = fieldWeight in 3422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3422)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) reduces/eliminates manual labor required to compile dictionaries and prepare source documents; (b) improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) insensitive to collection size; and (e) runs in linear time with respect to the number of clauses to be annotated.
  12. Schöneberg, U.; Gödert, W.: Erschließung mathematischer Publikationen mittels linguistischer Verfahren (2012) 0.01
    0.009469198 = product of:
      0.018938396 = sum of:
        0.018938396 = product of:
          0.037876792 = sum of:
            0.037876792 = weight(_text_:b in 1055) [ClassicSimilarity], result of:
              0.037876792 = score(doc=1055,freq=2.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.23486741 = fieldWeight in 1055, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    http://at.yorku.ca/c/b/f/j/99.htm
  13. Kiros, R.; Salakhutdinov, R.; Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models (2014) 0.01
    0.009469198 = product of:
      0.018938396 = sum of:
        0.018938396 = product of:
          0.037876792 = sum of:
            0.037876792 = weight(_text_:b in 1871) [ClassicSimilarity], result of:
              0.037876792 = score(doc=1871,freq=2.0), product of:
                0.16126883 = queryWeight, product of:
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.045518078 = queryNorm
                0.23486741 = fieldWeight in 1871, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.542962 = idf(docFreq=3476, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1871)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence to its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic e.g. *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
  14. Busch, D.: Domänenspezifische hybride automatische Indexierung von bibliographischen Metadaten (2019) 0.01
    0.009250606 = product of:
      0.018501213 = sum of:
        0.018501213 = product of:
          0.037002426 = sum of:
            0.037002426 = weight(_text_:22 in 5628) [ClassicSimilarity], result of:
              0.037002426 = score(doc=5628,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.23214069 = fieldWeight in 5628, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5628)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    B.I.T.online. 22(2019) H.6, S.465-469
  15. Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.01
    0.0077088396 = product of:
      0.015417679 = sum of:
        0.015417679 = product of:
          0.030835358 = sum of:
            0.030835358 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
              0.030835358 = score(doc=3780,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.19345059 = fieldWeight in 3780, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3780)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    19. 8.2017 9:24:22
  16. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
    0.0061670714 = product of:
      0.012334143 = sum of:
        0.012334143 = product of:
          0.024668286 = sum of:
            0.024668286 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
              0.024668286 = score(doc=1442,freq=2.0), product of:
                0.15939656 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045518078 = queryNorm
                0.15476047 = fieldWeight in 1442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1442)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
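
Note on the score breakdowns: the explain trees above follow the usual Lucene ClassicSimilarity (TF-IDF) arithmetic, which can be re-derived by hand. The sketch below is a minimal illustration, not the search engine's own code. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and it takes maxDocs, queryNorm, docFreq, fieldNorm, and the coord factors directly from the listing. The function names `idf` and `term_score` are hypothetical helpers introduced only for this example.

```python
import math

MAX_DOCS = 44218          # maxDocs, as reported in the explain output
QUERY_NORM = 0.045518078  # queryNorm, as reported in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 1441): both terms match, then the outer coord(1/2) is applied.
score_b = term_score(freq=4.0, doc_freq=3476, field_norm=0.03125)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.03125)
doc_1441 = (score_b + score_22) * 0.5

# Result 3 (doc 5629): only the term "22" matches, so coord(1/2) appears twice.
doc_5629 = term_score(freq=2.0, doc_freq=3622, field_norm=0.09375) * 0.5 * 0.5

print(f"doc 1441: {doc_1441:.6f}")
print(f"doc 5629: {doc_5629:.6f}")
```

Under these assumptions the two values should agree with the listed scores for results 1 and 3 up to floating-point rounding, since the listing uses single-precision arithmetic and this sketch uses double precision. The same helper applies to the other results by substituting their freq, docFreq, and fieldNorm values.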