Search (44 results, page 1 of 3)

  • × theme_ss:"Automatisches Indexieren"
  • × year_i:[2010 TO 2020}
  1. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.16
    0.16492793 = product of:
      0.2473919 = sum of:
        0.09675461 = weight(_text_:semantic in 2759) [ClassicSimilarity], result of:
          0.09675461 = score(doc=2759,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.45938298 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.15063728 = sum of:
          0.08200603 = weight(_text_:indexing in 2759) [ClassicSimilarity], result of:
            0.08200603 = score(doc=2759,freq=2.0), product of:
              0.19390269 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.050655533 = queryNorm
              0.42292362 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
          0.068631254 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
            0.068631254 = score(doc=2759,freq=2.0), product of:
              0.17738704 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050655533 = queryNorm
              0.38690117 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
      0.6666667 = coord(2/3)
    
    Date
    1.2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al.
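    The indented blocks beneath each result are Lucene "explain" output for ClassicSimilarity (TF-IDF) scoring. As a minimal sketch, assuming Lucene's standard definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs/(docFreq+1)) (the function and variable names below are illustrative, not part of the portal), the _text_:semantic clause of result 1 can be reproduced like this:

    import math

    def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        # ClassicSimilarity building blocks, mirroring the explain tree:
        tf = math.sqrt(freq)                             # 1.4142135 for freq=2.0
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 4.1578603 for docFreq=1879
        query_weight = idf * query_norm                  # 0.21061863
        field_weight = tf * idf * field_norm             # 0.45938298
        return query_weight * field_weight               # queryWeight x fieldWeight

    # Values copied from the _text_:semantic clause of result 1 (doc 2759):
    print(term_score(2.0, 1879, 44218, 0.050655533, 0.078125))  # ~0.09675461

    Summing the clause scores (0.09675461 + 0.15063728 = 0.2473919) and multiplying by the coordination factor coord(2/3) = 0.6666667 yields the displayed 0.16492793.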
  2. Ma, N.; Zheng, H.T.; Xiao, X.: An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.11
    0.106335156 = product of:
      0.15950273 = sum of:
        0.11849972 = weight(_text_:semantic in 3810) [ClassicSimilarity], result of:
          0.11849972 = score(doc=3810,freq=12.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.56262696 = fieldWeight in 3810, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
        0.041003015 = product of:
          0.08200603 = sum of:
            0.08200603 = weight(_text_:indexing in 3810) [ClassicSimilarity], result of:
              0.08200603 = score(doc=3810,freq=8.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.42292362 = fieldWeight in 3810, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3810)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Online data is growing at an astonishing rate, and semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers place increasing emphasis on the internal relations of ontologies while neglecting the latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.
    Object
    Latent Semantic Indexing
  3. Vlachidis, A.; Tudhope, D.: A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.10
    0.10465857 = product of:
      0.15698785 = sum of:
        0.12799433 = weight(_text_:semantic in 2895) [ClassicSimilarity], result of:
          0.12799433 = score(doc=2895,freq=14.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.6077066 = fieldWeight in 2895, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
        0.02899351 = product of:
          0.05798702 = sum of:
            0.05798702 = weight(_text_:indexing in 2895) [ClassicSimilarity], result of:
              0.05798702 = score(doc=2895,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.29905218 = fieldWeight in 2895, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2895)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntax-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
  4. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.09
    0.0875351 = product of:
      0.13130264 = sum of:
        0.08209902 = weight(_text_:semantic in 2721) [ClassicSimilarity], result of:
          0.08209902 = score(doc=2721,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.38979942 = fieldWeight in 2721, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.04920362 = product of:
          0.09840724 = sum of:
            0.09840724 = weight(_text_:indexing in 2721) [ClassicSimilarity], result of:
              0.09840724 = score(doc=2721,freq=8.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5075084 = fieldWeight in 2721, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information, classifies it into five broad semantic concept facets (signal, object, abstract, scene, and relational), and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution is that the classification is performed automatically on the raw image contextual information extracted from any general webpage and, unlike state-of-the-art solutions, is not solely based on image tags. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
  5. Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.09
    0.08617924 = product of:
      0.12926885 = sum of:
        0.077403694 = weight(_text_:semantic in 3434) [ClassicSimilarity], result of:
          0.077403694 = score(doc=3434,freq=8.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.36750638 = fieldWeight in 3434, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
        0.051865168 = product of:
          0.103730336 = sum of:
            0.103730336 = weight(_text_:indexing in 3434) [ClassicSimilarity], result of:
              0.103730336 = score(doc=3434,freq=20.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5349608 = fieldWeight in 3434, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The purpose of this study is to examine whether the understandings of subject-indexing processes conducted by human indexers have a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text, while keywords were found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. The content-oriented and the document-oriented indexing approaches especially were found more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more desirable for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the view that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
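    For reference, the F-measure used in this and several of the following studies is the harmonic mean of precision and recall; a one-line sketch (a hypothetical helper, not code from the paper):

    def f_measure(precision, recall):
        # Harmonic mean of precision and recall (balanced F1 score).
        return 2 * precision * recall / (precision + recall)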
  6. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.08
    0.07519031 = product of:
      0.112785466 = sum of:
        0.08379196 = weight(_text_:semantic in 3954) [ClassicSimilarity], result of:
          0.08379196 = score(doc=3954,freq=6.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.39783734 = fieldWeight in 3954, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3954)
        0.02899351 = product of:
          0.05798702 = sum of:
            0.05798702 = weight(_text_:indexing in 3954) [ClassicSimilarity], result of:
              0.05798702 = score(doc=3954,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.29905218 = fieldWeight in 3954, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3954)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This study investigates the potential of multi-word terms for information retrieval. Its aim is to use Latent Semantic Analysis (LSA) to weight candidates that were intellectually rated as positive higher than candidates rated as negative, so that the positive candidates are preferred in an information retrieval ranking. A version of the social-science GIRT database (German Indexing and Retrieval Testdatabase) served as the collection. The automatic indexing system Lingo was used to identify candidates for multi-word terms. The required core functionalities were lemmatization, identification of compounds, algorithmic multi-word recognition, and weighting of index terms by the LSA model. The multi-word candidates recognized by Lingo and weighted by LSA were then evaluated. First, an intellectual selection of positive and negative multi-word candidates was made. In the second evaluation step, the yield was calculated to obtain the share of positive multi-word candidates. In the final evaluation step, R-precision was used to compute how many positively rated multi-word candidates reached position k of the ranking. The yield of positive multi-word candidates averaged about 39%, while R-precision reached an average value of 54%. The LSA model thus achieves an ambivalent result with a positive tendency.
    Object
    Latent Semantic Indexing
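    The LSA weighting of multi-word candidates described in the abstract above can be approximated with off-the-shelf tools. The following is an illustrative reconstruction, not the Lingo/GIRT pipeline itself; the corpus, the candidate list, and the dimensionality k are assumptions:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    def lsa_weights(docs, candidates, k=100):
        # Term-document matrix including n-grams, so multi-word candidates
        # appear as vocabulary entries.
        vec = CountVectorizer(ngram_range=(1, 3))
        x = vec.fit_transform(docs)
        # Project terms into a k-dimensional latent semantic space (LSA).
        svd = TruncatedSVD(n_components=min(k, x.shape[1] - 1))
        svd.fit(x)
        comps = svd.components_  # shape: (k, n_terms)
        vocab = vec.vocabulary_  # candidates are assumed lowercase
        # Weight each candidate by the norm of its latent representation.
        return {c: float(np.linalg.norm(comps[:, vocab[c]]))
                for c in candidates if c in vocab}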
  7. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.07
    0.07221276 = product of:
      0.10831914 = sum of:
        0.06772823 = weight(_text_:semantic in 1717) [ClassicSimilarity], result of:
          0.06772823 = score(doc=1717,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 1717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.040590912 = product of:
          0.081181824 = sum of:
            0.081181824 = weight(_text_:indexing in 1717) [ClassicSimilarity], result of:
              0.081181824 = score(doc=1717,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.41867304 = fieldWeight in 1717, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1717)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents of all subjects. It has traditionally been used for intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings for online publications. This project, its results, and its problems are sketched in the paper.
    Content
    Contribution to the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. See: http://www.nlib.ee/index.php?id=17763.
  8. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.07
    0.07221276 = product of:
      0.10831914 = sum of:
        0.06772823 = weight(_text_:semantic in 1969) [ClassicSimilarity], result of:
          0.06772823 = score(doc=1969,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 1969, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.040590912 = product of:
          0.081181824 = sum of:
            0.081181824 = weight(_text_:indexing in 1969) [ClassicSimilarity], result of:
              0.081181824 = score(doc=1969,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.41867304 = fieldWeight in 1969, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1969)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
    Footnote
    Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - contains papers from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
  9. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.06
    0.059278235 = product of:
      0.08891735 = sum of:
        0.06841584 = weight(_text_:semantic in 5400) [ClassicSimilarity], result of:
          0.06841584 = score(doc=5400,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32483283 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.020501507 = product of:
          0.041003015 = sum of:
            0.041003015 = weight(_text_:indexing in 5400) [ClassicSimilarity], result of:
              0.041003015 = score(doc=5400,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.21146181 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
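    A hedged sketch of the "embed first, then predict" idea (not the authors' implementation; the document and subject vectors are assumed to be pre-computed embeddings living in the same semantic space):

    import numpy as np

    def predict_subjects(doc_vec, subject_vecs, subject_labels, k=5):
        # Rank candidate subjects by cosine similarity to the document
        # embedding; prediction reduces to nearest neighbours in that space.
        sims = subject_vecs @ doc_vec / (
            np.linalg.norm(subject_vecs, axis=1) * np.linalg.norm(doc_vec) + 1e-12)
        top = np.argsort(-sims)[:k]
        return [(subject_labels[i], float(sims[i])) for i in top]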
  10. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.05
    0.051580545 = product of:
      0.077370815 = sum of:
        0.048377305 = weight(_text_:semantic in 3627) [ClassicSimilarity], result of:
          0.048377305 = score(doc=3627,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.22969149 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
        0.02899351 = product of:
          0.05798702 = sum of:
            0.05798702 = weight(_text_:indexing in 3627) [ClassicSimilarity], result of:
              0.05798702 = score(doc=3627,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.29905218 = fieldWeight in 3627, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3627)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  11. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.05
    0.045639288 = product of:
      0.06845893 = sum of:
        0.054732677 = weight(_text_:semantic in 1441) [ClassicSimilarity], result of:
          0.054732677 = score(doc=1441,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.25986627 = fieldWeight in 1441, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.01372625 = product of:
          0.0274525 = sum of:
            0.0274525 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
              0.0274525 = score(doc=1441,freq=2.0), product of:
                0.17738704 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050655533 = queryNorm
                0.15476047 = fieldWeight in 1441, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1441)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then PERL scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hyperonyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking with other NPs. The first tests with the resulting documents demonstrated the approach's effectiveness. We compared the structural similarity of the documents before and after the preprocessing steps of phase one; the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  12. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.03
    0.034952067 = product of:
      0.052428097 = sum of:
        0.038701847 = weight(_text_:semantic in 5499) [ClassicSimilarity], result of:
          0.038701847 = score(doc=5499,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.18375319 = fieldWeight in 5499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.01372625 = product of:
          0.0274525 = sum of:
            0.0274525 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
              0.0274525 = score(doc=5499,freq=2.0), product of:
                0.17738704 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050655533 = queryNorm
                0.15476047 = fieldWeight in 5499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5499)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    20.1.2015 18:30:22
  13. Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: A typology of semantic relations dedicated to scientific literature analysis (2016) 0.03
    0.031927396 = product of:
      0.09578218 = sum of:
        0.09578218 = weight(_text_:semantic in 2933) [ClassicSimilarity], result of:
          0.09578218 = score(doc=2933,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.45476598 = fieldWeight in 2933, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
      0.33333334 = coord(1/3)
    
    Abstract
    We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.
  14. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.03
    0.028089315 = product of:
      0.084267944 = sum of:
        0.084267944 = sum of:
          0.05681544 = weight(_text_:indexing in 1442) [ClassicSimilarity], result of:
            0.05681544 = score(doc=1442,freq=6.0), product of:
              0.19390269 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.050655533 = queryNorm
              0.2930101 = fieldWeight in 1442, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.03125 = fieldNorm(doc=1442)
          0.0274525 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
            0.0274525 = score(doc=1442,freq=2.0), product of:
              0.17738704 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050655533 = queryNorm
              0.15476047 = fieldWeight in 1442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1442)
      0.33333334 = coord(1/3)
    
    Abstract
    The main objective of this research was to analyze whether relevant terms show a characteristic distribution behavior over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only the full noun phrases contained in the texts themselves. The texts considered were a total of 98 doctoral theses from the eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value of 0-6 (not relevant to highly relevant, respectively) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the terms' positions in the text, with each full noun phrase found in the text counted as a valid linear position. The results showed the values resulting from this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development, and conclusion). A finding of considerable importance is that all areas of knowledge related to the Natural Sciences showed one characteristic distribution behavior for relevant terms, while all areas of knowledge related to the Social Sciences showed a shared characteristic behavior of their own, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized as polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
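    The linear-position analysis described in the abstract of result 14 can be sketched as follows, under the assumption that each noun phrase carries an author-assigned relevance value of 0-6 in text order (names are hypothetical):

    def linear_distribution(relevance_by_position, parts=10):
        # Consolidate per-noun-phrase relevance values into ten equal
        # consecutive parts of the text, as in the paper's linear analysis.
        n = len(relevance_by_position)
        totals = [0.0] * parts
        for i, value in enumerate(relevance_by_position):
            totals[min(i * parts // n, parts - 1)] += value
        return totals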
  15. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.03
    0.02736634 = product of:
      0.08209902 = sum of:
        0.08209902 = weight(_text_:semantic in 3422) [ClassicSimilarity], result of:
          0.08209902 = score(doc=3422,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.38979942 = fieldWeight in 3422, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
      0.33333334 = coord(1/3)
    
    Abstract
    This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) it reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) it improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) it learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) it is insensitive to collection size; and (e) it runs in linear time with respect to the number of clauses to be annotated.
  16. Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.02
    0.02460181 = product of:
      0.07380543 = sum of:
        0.07380543 = product of:
          0.14761086 = sum of:
            0.14761086 = weight(_text_:indexing in 3622) [ClassicSimilarity], result of:
              0.14761086 = score(doc=3622,freq=18.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.76126254 = fieldWeight in 3622, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3622)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Indexing is contextualized and a brief description is provided of some of the most widely used automatic indexing systems. We describe SISA, a system which uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the indexing value of certain parts of the articles, such as titles, abstracts, keywords, headings, first paragraphs, conclusions, and references, and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision, and f-measure. Automatic indexing of the articles with location heuristics rules provided the best results on the evaluation measures.
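    As an illustration of the TF-IDF rules mentioned in the abstract above (a toy sketch, not SISA's implementation; tokenization is assumed to have happened already):

    import math
    from collections import Counter

    def tfidf_rank(doc_tokens, corpus_tokens, top_n=10):
        # Rank a document's terms by TF-IDF against a reference corpus;
        # the highest-ranked terms become candidate indexing terms.
        n_docs = len(corpus_tokens)
        df = Counter()
        for doc in corpus_tokens:
            df.update(set(doc))
        tf = Counter(doc_tokens)
        scores = {t: (f / len(doc_tokens)) * math.log(n_docs / (1 + df[t]))
                  for t, f in tf.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_n]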
  17. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.02
    0.022805281 = product of:
      0.06841584 = sum of:
        0.06841584 = weight(_text_:semantic in 3667) [ClassicSimilarity], result of:
          0.06841584 = score(doc=3667,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32483283 = fieldWeight in 3667, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.33333334 = coord(1/3)
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences within them. In this paper, the TIB / AV-Portal is presented as a use case where methods for the automatic generation of metadata, a semantic search, and cross-lingual retrieval (German/English) have already been applied. These methods result in better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the videos are automatically indexed with specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.
  18. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.02
    0.02266527 = product of:
      0.06799581 = sum of:
        0.06799581 = product of:
          0.13599162 = sum of:
            0.13599162 = weight(_text_:indexing in 4005) [ClassicSimilarity], result of:
              0.13599162 = score(doc=4005,freq=22.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.70133954 = fieldWeight in 4005, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
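    One plausible reading of the weighting idea, as a hedged sketch (the mapping from MeSH descriptors to associated text terms is an assumption; names are hypothetical):

    def descriptor_weights(doc_tokens, descriptor_terms):
        # Weight each manually assigned MeSH descriptor by how strongly its
        # associated terms occur in the document text (normalized counts).
        counts = {d: sum(doc_tokens.count(t) for t in terms)
                  for d, terms in descriptor_terms.items()}
        total = sum(counts.values()) or 1
        return {d: c / total for d, c in counts.items()}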
  19. Schulz, K.U.; Brunner, L.: Vollautomatische thematische Verschlagwortung großer Textkollektionen mittels semantischer Netze (2017) 0.02
    0.022576077 = product of:
      0.06772823 = sum of:
        0.06772823 = weight(_text_:semantic in 3493) [ClassicSimilarity], result of:
          0.06772823 = score(doc=3493,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 3493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3493)
      0.33333334 = coord(1/3)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  20. Böhm, A.; Seifert, C.; Schlötterer, J.; Granitzer, M.: Identifying tweets from the economic domain (2017) 0.02
    0.022576077 = product of:
      0.06772823 = sum of:
        0.06772823 = weight(_text_:semantic in 3495) [ClassicSimilarity], result of:
          0.06772823 = score(doc=3495,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 3495, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3495)
      0.33333334 = coord(1/3)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber

Languages

  • e 32
  • d 12

Types

  • a 41
  • el 7
  • x 3