Search (204 results, page 1 of 11)

Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.09

0.08717725 = product of:
  0.14529541 = sum of:
    0.023397226 = weight(_text_:retrieval in 3810) [ClassicSimilarity], result of:
      0.023397226 = score(doc=3810,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.16710453 = fieldWeight in 3810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3810)
    0.10828129 = weight(_text_:semantic in 3810) [ClassicSimilarity], result of:
      0.10828129 = score(doc=3810,freq=12.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.56262696 = fieldWeight in 3810, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3810)
    0.013616893 = product of:
      0.027233787 = sum of:
        0.027233787 = weight(_text_:web in 3810) [ClassicSimilarity], result of:
          0.027233787 = score(doc=3810,freq=2.0), product of:
            0.15105948 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04628742 = queryNorm
            0.18028519 = fieldWeight in 3810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.
Object: Latent Semantic Indexing
Source: Web and Big Data: First International Joint Conference, APWeb-WAIM 2017, Beijing, China, July 7-9, 2017, Proceedings, Part I. Eds.: L. Chen et al
Theme: Semantisches Umfeld in Indexierung u. Retrieval
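
The score breakdown for this first result (doc 3810) can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of the listing, while queryNorm, fieldNorm and the coord factors are copied from the explain tree rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula)
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 3810
QUERY_NORM = 0.04628742
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

retrieval = term_score(2.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)
semantic = term_score(12.0, 1879, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "web" clause sits in a nested query where 1 of 2 clauses matched
web = term_score(2.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

# 3 of 5 top-level clauses matched, hence coord(3/5) = 0.6
total = (retrieval + semantic + web) * 0.6
print(round(total, 6))
```

Lucene computes these values in 32-bit floats, so the result agrees with the listed 0.08717725 only up to small rounding differences.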

Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: ¬A typology of semantic relations dedicated to scientific literature analysis (2016) 0.08

0.083605506 = product of:
  0.1393425 = sum of:
    0.032756116 = weight(_text_:retrieval in 2933) [ClassicSimilarity], result of:
      0.032756116 = score(doc=2933,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.23394634 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.08752273 = weight(_text_:semantic in 2933) [ClassicSimilarity], result of:
      0.08752273 = score(doc=2933,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.45476598 = fieldWeight in 2933, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.019063652 = product of:
      0.038127303 = sum of:
        0.038127303 = weight(_text_:web in 2933) [ClassicSimilarity], result of:
          0.038127303 = score(doc=2933,freq=2.0), product of:
            0.15105948 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04628742 = queryNorm
            0.25239927 = fieldWeight in 2933, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.
Content: Presentation, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.08

0.07809698 = product of:
  0.13016163 = sum of:
    0.04632414 = weight(_text_:retrieval in 530) [ClassicSimilarity], result of:
      0.04632414 = score(doc=530,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.33085006 = fieldWeight in 530, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=530)
    0.06188791 = weight(_text_:semantic in 530) [ClassicSimilarity], result of:
      0.06188791 = score(doc=530,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32156807 = fieldWeight in 530, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=530)
    0.021949572 = product of:
      0.043899145 = sum of:
        0.043899145 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.043899145 = score(doc=530,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
Source: International forum on information and documentation. 22(1997) no.1, S.17-28

Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.07
```
0.06689858 = product of:
  0.16724643 = sum of:
    0.048630223 = weight(_text_:retrieval in 3458) [ClassicSimilarity], result of:
      0.048630223 = score(doc=3458,freq=6.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.34732026 = fieldWeight in 3458, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3458)
    0.118616216 = weight(_text_:semantic in 3458) [ClassicSimilarity], result of:
      0.118616216 = score(doc=3458,freq=10.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.616327 = fieldWeight in 3458, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=3458)
  0.4 = coord(2/5)
```
Abstract

A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in these documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system's retrieval accuracy. In particular, we studied the effects of term weighting functions in the semantic space construction and in query construction, suffix stripping, and using lexical units larger than a single word.

Object

Latent Semantic Indexing

Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.06

0.061835457 = product of:
  0.15458864 = sum of:
    0.066177346 = weight(_text_:retrieval in 3313) [ClassicSimilarity], result of:
      0.066177346 = score(doc=3313,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.47264296 = fieldWeight in 3313, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=3313)
    0.0884113 = weight(_text_:semantic in 3313) [ClassicSimilarity], result of:
      0.0884113 = score(doc=3313,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.45938298 = fieldWeight in 3313, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.078125 = fieldNorm(doc=3313)
  0.4 = coord(2/5)

Abstract: Reviews the basic issues and procedures involved in natural language processing of textual material for final use in information retrieval. Covers: natural language processing; natural language understanding; syntactic and semantic analysis; parsing; knowledge bases and knowledge representation

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.05

0.05001663 = product of:
  0.12504157 = sum of:
    0.07487112 = weight(_text_:retrieval in 402) [ClassicSimilarity], result of:
      0.07487112 = score(doc=402,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.5347345 = fieldWeight in 402, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.125 = fieldNorm(doc=402)
    0.05017045 = product of:
      0.1003409 = sum of:
        0.1003409 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.1003409 = score(doc=402,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Source: Information processing and management. 22(1986) no.6, S.465-476

Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.05
```
0.04876403 = product of:
  0.081273384 = sum of:
    0.01871778 = weight(_text_:retrieval in 1441) [ClassicSimilarity], result of:
      0.01871778 = score(doc=1441,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.13368362 = fieldWeight in 1441, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=1441)
    0.050012987 = weight(_text_:semantic in 1441) [ClassicSimilarity], result of:
      0.050012987 = score(doc=1441,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.25986627 = fieldWeight in 1441, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.03125 = fieldNorm(doc=1441)
    0.012542613 = product of:
      0.025085226 = sum of:
        0.025085226 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
          0.025085226 = score(doc=1441,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.15476047 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
  0.6 = coord(3/5)
```
Abstract

This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as a semantic aggregator, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then Perl scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting them with common hyperonyms; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated its effectiveness. We compared the structural similarity of the documents before and after all the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.05

0.04873186 = product of:
  0.08121976 = sum of:
    0.023397226 = weight(_text_:retrieval in 3627) [ClassicSimilarity], result of:
      0.023397226 = score(doc=3627,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.16710453 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.04420565 = weight(_text_:semantic in 3627) [ClassicSimilarity], result of:
      0.04420565 = score(doc=3627,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.22969149 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.013616893 = product of:
      0.027233787 = sum of:
        0.027233787 = weight(_text_:web in 3627) [ClassicSimilarity], result of:
          0.027233787 = score(doc=3627,freq=2.0), product of:
            0.15105948 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04628742 = queryNorm
            0.18028519 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.05

0.047907133 = product of:
  0.11976783 = sum of:
    0.0884113 = weight(_text_:semantic in 2759) [ClassicSimilarity], result of:
      0.0884113 = score(doc=2759,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.45938298 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.031356532 = product of:
      0.062713064 = sum of:
        0.062713064 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.062713064 = score(doc=2759,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Date: 1. 2.2016 18:25:22
Source: Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al

Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.05
```
0.04683665 = product of:
  0.117091626 = sum of:
    0.040525187 = weight(_text_:retrieval in 3954) [ClassicSimilarity], result of:
      0.040525187 = score(doc=3954,freq=6.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.28943354 = fieldWeight in 3954, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3954)
    0.076566435 = weight(_text_:semantic in 3954) [ClassicSimilarity], result of:
      0.076566435 = score(doc=3954,freq=6.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.39783734 = fieldWeight in 3954, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3954)
  0.4 = coord(2/5)
```
Abstract

This study examines the potential of multi-word terms for information retrieval. The aim of the work is to use Latent Semantic Analysis (LSA) to weight candidates that were rated positively by human assessors higher than negatively rated candidates. The positive candidates should accordingly be favored in an information retrieval ranking. A version of the social science GIRT database (German Indexing and Retrieval Testdatabase) was used as the collection. The automatic indexing system Lingo was used to identify candidates for multi-word terms. The required core functions were lemmatization, identification of compounds, algorithmic multi-word recognition, and weighting of index terms by the LSA model. The multi-word candidates recognized by Lingo and weighted by LSA were evaluated. First, positive and negative multi-word candidates were selected by human judgment. In the second evaluation step, the yield was calculated to obtain the share of positive multi-word candidates. In the last evaluation step, R-Precision was used to calculate how many positively rated multi-word candidates made it to position k of the ranking. The yield of positive multi-word candidates averaged about 39%, while R-Precision reached an average of 54%. The LSA model achieves an ambivalent result with a positive tendency.

Object

Latent Semantic Indexing
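
The R-Precision measure named in the abstract above can be illustrated with a short sketch. It assumes the usual definition (precision within the top R ranks, where R is the number of positively rated items); the candidate labels and the function name are hypothetical and are not taken from the study.

```python
def r_precision(ranking, relevant):
    """Share of relevant items among the top R ranks, R = number of relevant items."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    # Count relevant items that made it into the first R positions
    hits = sum(1 for item in ranking[:r] if item in relevant)
    return hits / r

# Hypothetical ranking of multi-word candidates, best first
ranking = ["c1", "c2", "c3", "c4", "c5", "c6"]
# Hypothetical set of positively rated candidates
positive = {"c1", "c3", "c6"}

score = r_precision(ranking, positive)
print(score)
```

Here R is 3, and two of the three positive candidates appear in the top three ranks, so the score is 2/3.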

Lassalle, E.: Text retrieval : from a monolingual system to a multilingual system (1993) 0.04

0.043284822 = product of:
  0.108212054 = sum of:
    0.04632414 = weight(_text_:retrieval in 7403) [ClassicSimilarity], result of:
      0.04632414 = score(doc=7403,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.33085006 = fieldWeight in 7403, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7403)
    0.06188791 = weight(_text_:semantic in 7403) [ClassicSimilarity], result of:
      0.06188791 = score(doc=7403,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32156807 = fieldWeight in 7403, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7403)
  0.4 = coord(2/5)

Abstract: Describes the TELMI monolingual text retrieval system and its future extension, a multilingual system. TELMI is designed for medium sized databases containing short texts. The characteristics of the system are fine-grained natural language processing (NLP); an open domain and a large scale knowledge base; automated indexing based on conceptual representation of texts and reusability of the NLP tools. Discusses the French MINITEL service, the MGS information service and the TELMI research system covering the full text system; NLP architecture; the lexical level; the syntactic level; the semantic level and an example of the use of a generic system

Gödert, W.; Liebig, M.: Maschinelle Indexierung auf dem Prüfstand : Ergebnisse eines Retrievaltests zum MILOS II Projekt (1997) 0.04

0.043284822 = product of:
  0.108212054 = sum of:
    0.04632414 = weight(_text_:retrieval in 1174) [ClassicSimilarity], result of:
      0.04632414 = score(doc=1174,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.33085006 = fieldWeight in 1174, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1174)
    0.06188791 = weight(_text_:semantic in 1174) [ClassicSimilarity], result of:
      0.06188791 = score(doc=1174,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32156807 = fieldWeight in 1174, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1174)
  0.4 = coord(2/5)

Abstract: The test ran from Nov 95 to Aug 96 at the Cologne Fachhochschule für Bibliothekswesen (College of Librarianship). The test basis was a database of 190,000 book titles published between 1990 and 1995. MILOS II mechanized indexing methods proved helpful in avoiding or reducing the number of unsatisfied or no-result retrieval searches. Retrieval from mechanized indexing is 3 times more successful than from title keyword data. MILOS II also used a standardized semantic vocabulary. Mechanized indexing demands high-quality software and output data

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.04
```
0.04132867 = product of:
  0.10332167 = sum of:
    0.075019486 = weight(_text_:semantic in 2721) [ClassicSimilarity], result of:
      0.075019486 = score(doc=2721,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.38979942 = fieldWeight in 2721, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.028302183 = product of:
      0.056604367 = sum of:
        0.056604367 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
          0.056604367 = score(doc=2721,freq=6.0), product of:
            0.15105948 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04628742 = queryNorm
            0.37471575 = fieldWeight in 2721, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.04
```
0.039232496 = product of:
  0.09808124 = sum of:
    0.054319873 = weight(_text_:retrieval in 6671) [ClassicSimilarity], result of:
      0.054319873 = score(doc=6671,freq=22.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.3879561 = fieldWeight in 6671, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02734375 = fieldNorm(doc=6671)
    0.043761365 = weight(_text_:semantic in 6671) [ClassicSimilarity], result of:
      0.043761365 = score(doc=6671,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.22738299 = fieldWeight in 6671, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.02734375 = fieldNorm(doc=6671)
  0.4 = coord(2/5)
```
Content: HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL and N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF and D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. and B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. and R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA and M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG and Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL and R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. and P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY and D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH and H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. and J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. and M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. and P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN and L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. and J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO and P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER and J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. and P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. and B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
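
Note on the score: the breakdown shown for this record (doc 6671) can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any library; all numeric inputs are copied from the explanation above.

```python
import math

# Values copied from the score explanation for doc 6671.
MAX_DOCS = 44218
QUERY_NORM = 0.04628742


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one query term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# "retrieval" occurs 22 times, "semantic" 4 times in the matched field.
retrieval = term_score(freq=22.0, doc_freq=5836, field_norm=0.02734375)
semantic = term_score(freq=4.0, doc_freq=1879, field_norm=0.02734375)

# coord(2/5): two of the five query clauses matched this document.
total = (retrieval + semantic) * (2 / 5)

print(f"retrieval={retrieval:.6f} semantic={semantic:.6f} total={total:.6f}")
```

Lucene computes these values in single precision, so a double-precision recomputation agrees with the listed figures only up to rounding in the last digits.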

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.04

0.039013553 = product of:
  0.09753388 = sum of:
    0.066177346 = weight(_text_:retrieval in 1952) [ClassicSimilarity], result of:
      0.066177346 = score(doc=1952,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.47264296 = fieldWeight in 1952, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=1952)
    0.031356532 = product of:
      0.062713064 = sum of:
        0.062713064 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.062713064 = score(doc=1952,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Date: 16. 8.1998 12:51:22
Footnote: Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. pp. 513-517.
Source: Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
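
Note on the score: in this record the term "22" sits inside a nested clause with its own coord(1/2) factor, which is applied before the outer coord(2/5). A rough sketch of that nesting follows, again assuming the ClassicSimilarity formulas; the helpers `term` and `boolean` are hypothetical names used only for illustration, and the inputs are copied from the explanation for doc 1952.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.04628742


def term(freq: float, doc_freq: int, field_norm: float) -> float:
    """Leaf node: queryWeight * fieldWeight for a single term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * field_norm)


def boolean(clause_scores, matched: int, total: int) -> float:
    """Inner node: sum of matching clauses scaled by coord(matched/total)."""
    return sum(clause_scores) * matched / total


# Nested clause: only "22" matched, one of two sub-clauses -> coord(1/2).
inner = boolean([term(2.0, 3622, 0.078125)], matched=1, total=2)

# Top level: "retrieval" plus the nested clause, two of five -> coord(2/5).
score = boolean([term(4.0, 5836, 0.078125), inner], matched=2, total=5)

print(f"inner={inner:.6f} score={score:.6f}")
```

The nested coord halves the contribution of "22" before it is added to the "retrieval" weight, which is why the term contributes about 0.031 rather than 0.063 to the sum.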

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.04

0.03832571 = product of:
  0.095814265 = sum of:
    0.07072904 = weight(_text_:semantic in 4709) [ClassicSimilarity], result of:
      0.07072904 = score(doc=4709,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.36750638 = fieldWeight in 4709, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0625 = fieldNorm(doc=4709)
    0.025085226 = product of:
      0.05017045 = sum of:
        0.05017045 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.05017045 = score(doc=4709,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Date: 31. 7.1996 9:22:19
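
Note on fieldNorm: the fieldNorm values differ between records (0.0625 here, 0.078125 for doc 1952, 0.02734375 for doc 6671). As a rough sketch, assuming the index uses ClassicSimilarity's default length normalization of 1/sqrt(number of terms) with no index-time boost, a fieldNorm can be inverted to an approximate field length. Lucene stores the norm in a single byte, so the result is only a coarse estimate. The helper name `approx_field_length` is hypothetical.

```python
def approx_field_length(field_norm: float) -> float:
    """Invert fieldNorm = 1 / sqrt(num_terms) to estimate num_terms.

    Assumes default length normalization and no boost; the stored norm
    is quantized, so this is an order-of-magnitude estimate only.
    """
    return 1.0 / (field_norm ** 2)


# fieldNorm values copied from the explanations in this listing.
for doc_id, norm in [(1952, 0.078125), (4709, 0.0625), (6671, 0.02734375)]:
    print(doc_id, round(approx_field_length(norm)))
```

Under these assumptions a smaller fieldNorm indicates a longer indexed field, which is consistent with the long contents note of the proceedings volume (doc 6671) receiving the smallest norm of the three.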

Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.04
```
0.038241964 = product of:
  0.09560491 = sum of:
    0.033088673 = weight(_text_:retrieval in 3667) [ClassicSimilarity], result of:
      0.033088673 = score(doc=3667,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.23632148 = fieldWeight in 3667, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3667)
    0.062516235 = weight(_text_:semantic in 3667) [ClassicSimilarity], result of:
      0.062516235 = score(doc=3667,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32483283 = fieldWeight in 3667, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3667)
  0.4 = coord(2/5)
```
Abstract: Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes the media, and even more so individual sequences within them, hard to find. In this paper, the TIB / AV-Portal is presented as a use case where methods for automatic metadata generation, semantic search, and cross-lingual retrieval (German/English) have already been applied. These methods improve the discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the videos is automatically indexed with specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.03

0.034984723 = product of:
  0.08746181 = sum of:
    0.06551223 = weight(_text_:retrieval in 5001) [ClassicSimilarity], result of:
      0.06551223 = score(doc=5001,freq=8.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.46789268 = fieldWeight in 5001, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.021949572 = product of:
      0.043899145 = sum of:
        0.043899145 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.043899145 = score(doc=5001,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21

Pirkola, A.: Morphological typology of languages for IR (2001) 0.03

0.032449383 = product of:
  0.08112346 = sum of:
    0.028076671 = weight(_text_:retrieval in 4476) [ClassicSimilarity], result of:
      0.028076671 = score(doc=4476,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.20052543 = fieldWeight in 4476, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
    0.05304678 = weight(_text_:semantic in 4476) [ClassicSimilarity], result of:
      0.05304678 = score(doc=4476,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.2756298 = fieldWeight in 4476, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
  0.4 = coord(2/5)

Abstract: This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.03
```
0.032449383 = product of:
  0.08112346 = sum of:
    0.028076671 = weight(_text_:retrieval in 1868) [ClassicSimilarity], result of:
      0.028076671 = score(doc=1868,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.20052543 = fieldWeight in 1868, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1868)
    0.05304678 = weight(_text_:semantic in 1868) [ClassicSimilarity], result of:
      0.05304678 = score(doc=1868,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.2756298 = fieldWeight in 1868, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=1868)
  0.4 = coord(2/5)
```
Abstract: We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.
