Search (185 results, page 1 of 10)

  • × theme_ss:"Automatisches Indexieren"
  1. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.16
    0.16492793 = product of:
      0.2473919 = sum of:
        0.09675461 = weight(_text_:semantic in 2759) [ClassicSimilarity], result of:
          0.09675461 = score(doc=2759,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.45938298 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.15063728 = sum of:
          0.08200603 = weight(_text_:indexing in 2759) [ClassicSimilarity], result of:
            0.08200603 = score(doc=2759,freq=2.0), product of:
              0.19390269 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.050655533 = queryNorm
              0.42292362 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
          0.068631254 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
            0.068631254 = score(doc=2759,freq=2.0), product of:
              0.17738704 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050655533 = queryNorm
              0.38690117 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
      0.6666667 = coord(2/3)
    
    Date
    1.2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
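The score breakdown shown for each hit follows Lucene's ClassicSimilarity (TF-IDF) explanation format. As an illustrative sketch, the final score for hit 1 can be reproduced from the displayed factors; all numbers below are copied from the breakdown, and only the arithmetic is reconstructed:

```python
import math

# Reproduce the ClassicSimilarity explanation for hit 1 (doc 2759).
# Per-term formula, as in the breakdown above:
#   score(term) = queryWeight * fieldWeight
#               = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)

QUERY_NORM = 0.050655533   # queryNorm, shared by all terms
FIELD_NORM = 0.078125      # fieldNorm stored for doc 2759

def term_score(freq, idf):
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM   # tf = sqrt(freq)
    return query_weight * field_weight

terms = [
    (2.0, 4.1578603),      # _text_:semantic
    (2.0, 3.8278677),      # _text_:indexing
    (2.0, 3.5018296),      # _text_:22
]

raw = sum(term_score(freq, idf) for freq, idf in terms)
score = raw * (2.0 / 3.0)  # coord(2/3): 2 of 3 top-level clauses matched

# agrees with the displayed 0.16492793 up to float32 rounding
```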
  2. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.13
    0.13130128 = product of:
      0.19695193 = sum of:
        0.06772823 = weight(_text_:semantic in 530) [ClassicSimilarity], result of:
          0.06772823 = score(doc=530,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.1292237 = sum of:
          0.081181824 = weight(_text_:indexing in 530) [ClassicSimilarity], result of:
            0.081181824 = score(doc=530,freq=4.0), product of:
              0.19390269 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.050655533 = queryNorm
              0.41867304 = fieldWeight in 530, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0546875 = fieldNorm(doc=530)
          0.048041876 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
            0.048041876 = score(doc=530,freq=2.0), product of:
              0.17738704 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050655533 = queryNorm
              0.2708308 = fieldWeight in 530, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=530)
      0.6666667 = coord(2/3)
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing, referring to a system which incorporates NLP techniques to determine the subject of document texts and to associate them with relevant semantic indexes. Briefly describes the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  3. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.12
    0.11671345 = product of:
      0.17507017 = sum of:
        0.10946535 = weight(_text_:semantic in 5189) [ClassicSimilarity], result of:
          0.10946535 = score(doc=5189,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.51973253 = fieldWeight in 5189, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0625 = fieldNorm(doc=5189)
        0.06560482 = product of:
          0.13120964 = sum of:
            0.13120964 = weight(_text_:indexing in 5189) [ClassicSimilarity], result of:
              0.13120964 = score(doc=5189,freq=8.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.6766778 = fieldWeight in 5189, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5189)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, grammar formalism, lexicon, and the frame-based knowledge representation used for semantic representation. The semantic representation is based on the Ranganathan school of thought, especially the Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The various steps in indexing are demonstrated with an illustration.
  4. Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.11
    0.11494769 = product of:
      0.17242153 = sum of:
        0.12980995 = weight(_text_:semantic in 3458) [ClassicSimilarity], result of:
          0.12980995 = score(doc=3458,freq=10.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.616327 = fieldWeight in 3458, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
        0.04261158 = product of:
          0.08522316 = sum of:
            0.08522316 = weight(_text_:indexing in 3458) [ClassicSimilarity], result of:
              0.08522316 = score(doc=3458,freq=6.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.4395151 = fieldWeight in 3458, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3458)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in those documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system's retrieval accuracy; in particular, the effects of term weighting functions in semantic space construction and in query construction, suffix stripping, and using lexical units larger than a single word were studied.
    Object
    Latent Semantic Indexing
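The latent semantic indexing technique this abstract describes can be sketched with toy data. The tiny term-document matrix, the three-word vocabulary, and k=2 below are illustrative assumptions (the paper uses a 100-dimensional space); the mechanics, factoring with SVD, folding queries into the reduced space, and ranking by cosine similarity, follow the abstract:

```python
import numpy as np

# rows = terms ("indexing", "semantic", "retrieval"), columns = documents
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]          # top-k term directions and singular values
doc_vecs = Vt[:k, :].T            # each row: one document in the k-dim space

def fold_in(query_terms):
    """Project a raw term-frequency query vector into the semantic space."""
    return (query_terms @ Uk) / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = fold_in(np.array([1.0, 1.0, 0.0]))   # query: "indexing semantic"
sims = [cosine(q, d) for d in doc_vecs]
# documents 0 and 2 share the query's terms; document 1 does not
```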
  5. Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.11
    0.106335156 = product of:
      0.15950273 = sum of:
        0.11849972 = weight(_text_:semantic in 3810) [ClassicSimilarity], result of:
          0.11849972 = score(doc=3810,freq=12.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.56262696 = fieldWeight in 3810, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
        0.041003015 = product of:
          0.08200603 = sum of:
            0.08200603 = weight(_text_:indexing in 3810) [ClassicSimilarity], result of:
              0.08200603 = score(doc=3810,freq=8.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.42292362 = fieldWeight in 3810, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3810)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.
    Object
    Latent Semantic Indexing
  6. Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.10
    0.10465857 = product of:
      0.15698785 = sum of:
        0.12799433 = weight(_text_:semantic in 2895) [ClassicSimilarity], result of:
          0.12799433 = score(doc=2895,freq=14.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.6077066 = fieldWeight in 2895, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
        0.02899351 = product of:
          0.05798702 = sum of:
            0.05798702 = weight(_text_:indexing in 2895) [ClassicSimilarity], result of:
              0.05798702 = score(doc=2895,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.29905218 = fieldWeight in 2895, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2895)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
  7. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.10
    0.10294118 = product of:
      0.15441176 = sum of:
        0.12980995 = weight(_text_:semantic in 161) [ClassicSimilarity], result of:
          0.12980995 = score(doc=161,freq=10.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.616327 = fieldWeight in 161, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=161)
        0.02460181 = product of:
          0.04920362 = sum of:
            0.04920362 = weight(_text_:indexing in 161) [ClassicSimilarity], result of:
              0.04920362 = score(doc=161,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.2537542 = fieldWeight in 161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=161)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on combining the Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between two semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox part-of-speech tagger as the pre-processor for indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that when documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART.
  8. Hlava, M.M.K.: Machine-Aided Indexing (MAI) in a multilingual environment (1992) 0.09
    0.08947943 = product of:
      0.13421914 = sum of:
        0.077403694 = weight(_text_:semantic in 2378) [ClassicSimilarity], result of:
          0.077403694 = score(doc=2378,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.36750638 = fieldWeight in 2378, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0625 = fieldNorm(doc=2378)
        0.05681544 = product of:
          0.11363088 = sum of:
            0.11363088 = weight(_text_:indexing in 2378) [ClassicSimilarity], result of:
              0.11363088 = score(doc=2378,freq=6.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5860202 = fieldWeight in 2378, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2378)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The Machine-Aided Indexing (MAI) program, developed by Access Innovations, Inc., is a semantics-based, Boolean-statement, rule-interpreting application designed to operate in a multilingual environment. Use of MAI across several languages, with controlled vocabularies for each language, provides a consistency in indexing not available through any other mechanism.
  9. Leyva, I.G.; Munoz, J.V.R.: Tendencias en los sistemas de indizacion automatica : estudio evolutivo (1996) 0.09
    0.08947943 = product of:
      0.13421914 = sum of:
        0.077403694 = weight(_text_:semantic in 1462) [ClassicSimilarity], result of:
          0.077403694 = score(doc=1462,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.36750638 = fieldWeight in 1462, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0625 = fieldNorm(doc=1462)
        0.05681544 = product of:
          0.11363088 = sum of:
            0.11363088 = weight(_text_:indexing in 1462) [ClassicSimilarity], result of:
              0.11363088 = score(doc=1462,freq=6.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5860202 = fieldWeight in 1462, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1462)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Early research at the end of the 1950s on computerized indexing used statistical methods based on, e.g., frequency, probability, clustering, and relevance. In the 1960s interest began to focus on linguistic analysis and natural language processing, e.g. morphological, morphosyntactic, syntactic, and semantic analysis. Since the 1980s computerized indexing research has widened to include images, graphics, and sound. Examples are given of notable systems developed within each line of approach.
    Footnote
    Translated title: Tendencies in computerized indexing systems: an evolutionary study
  10. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.09
    0.0875351 = product of:
      0.13130264 = sum of:
        0.08209902 = weight(_text_:semantic in 2721) [ClassicSimilarity], result of:
          0.08209902 = score(doc=2721,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.38979942 = fieldWeight in 2721, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.04920362 = product of:
          0.09840724 = sum of:
            0.09840724 = weight(_text_:indexing in 2721) [ClassicSimilarity], result of:
              0.09840724 = score(doc=2721,freq=8.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5075084 = fieldWeight in 2721, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; it also identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to users' levels of image description. A major contribution is that the classification is performed automatically on the raw image contextual information extracted from any general webpage, rather than relying solely on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes, as well as n-gram indexing, in a recall/precision-based evaluation framework.
  11. Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.09
    0.08617924 = product of:
      0.12926885 = sum of:
        0.077403694 = weight(_text_:semantic in 3434) [ClassicSimilarity], result of:
          0.077403694 = score(doc=3434,freq=8.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.36750638 = fieldWeight in 3434, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
        0.051865168 = product of:
          0.103730336 = sum of:
            0.103730336 = weight(_text_:indexing in 3434) [ClassicSimilarity], result of:
              0.103730336 = score(doc=3434,freq=20.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5349608 = fieldWeight in 3434, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The purpose of this study is to examine whether an understanding of the subject-indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject-indexing approaches or conceptions together with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text, while a keyword was found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text; the content-oriented and the document-oriented indexing approaches especially were found more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more useful for automatic subject term assignment via TC than the possible users' needs. These findings support the conclusion that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
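Several abstracts above evaluate indexing quality with recall, precision, and F-measure. A minimal sketch of those metrics for one document, comparing machine-assigned subject terms against a human gold standard (both term sets are made up for illustration):

```python
# Hypothetical gold-standard terms from a human indexer and terms
# assigned automatically; only the set arithmetic is the point here.
gold = {"automatic indexing", "text categorization", "svm", "subject terms"}
assigned = {"automatic indexing", "svm", "ontologies"}

tp = len(gold & assigned)               # correctly assigned terms
precision = tp / len(assigned)          # share of assigned terms that are correct
recall = tp / len(gold)                 # share of gold terms that were found
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
```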
  12. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.09
    0.08614914 = product of:
      0.2584474 = sum of:
        0.2584474 = sum of:
          0.16236365 = weight(_text_:indexing in 6265) [ClassicSimilarity], result of:
            0.16236365 = score(doc=6265,freq=4.0), product of:
              0.19390269 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.050655533 = queryNorm
              0.8373461 = fieldWeight in 6265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.109375 = fieldNorm(doc=6265)
          0.09608375 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
            0.09608375 = score(doc=6265,freq=2.0), product of:
              0.17738704 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050655533 = queryNorm
              0.5416616 = fieldWeight in 6265, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=6265)
      0.33333334 = coord(1/3)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  13. Hlava, M.M.K.: Machine aided indexing (MAI) in a multilingual environment (1993) 0.08
    0.0782945 = product of:
      0.11744174 = sum of:
        0.06772823 = weight(_text_:semantic in 7405) [ClassicSimilarity], result of:
          0.06772823 = score(doc=7405,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 7405, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7405)
        0.049713515 = product of:
          0.09942703 = sum of:
            0.09942703 = weight(_text_:indexing in 7405) [ClassicSimilarity], result of:
              0.09942703 = score(doc=7405,freq=6.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.5127677 = fieldWeight in 7405, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7405)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The machine-aided indexing (MAI) software developed by Access Innovations, Inc., is a semantics-based, Boolean-statement, rule-interpreting application with three modules: the MAI engine, which accepts input files, matches terms in the knowledge base, interprets rules, and outputs a text file with suggested indexing terms; a rule-building application allowing each Boolean-style rule in the knowledge base to be created or modified; and a statistical computation module which analyzes performance of the MAI software against text manually indexed by professional human indexers. The MAI software can be applied across multiple languages and can be used where the text to be searched is in one language and the indexes to be output are in another.
  14. Gödert, W.; Liebig, M.: Maschinelle Indexierung auf dem Prüfstand : Ergebnisse eines Retrievaltests zum MILOS II Projekt (1997) 0.08
    Abstract
    The test ran from Nov 95 to Aug 96 at the Fachhochschule für Bibliothekswesen (College of Librarianship), Cologne. The test basis was a database of 190,000 book titles published between 1990 and 1995. The MILOS II mechanized indexing methods proved helpful in avoiding or reducing the number of unsatisfied/no-result retrieval searches. Retrieval from mechanized indexing is three times more successful than from title keyword data. MILOS II also used a standardized semantic vocabulary. Mechanized indexing demands high-quality software and output data.
  15. Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.08
    Abstract
    This article describes a natural-language text-processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is deep indexing with a controlled vocabulary of biological concepts - Concept Headings (CHs). On average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural-language biological titles into title-semantic representations in the constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural-language terms and their translations in the language of Concept Primitives. For the ambiguous terms, the SV contains algorithmic rules of term disambiguation, rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, the problems of lexical and situational ambiguities, are discussed. The disambiguation techniques employed are described and demonstrated with many examples.
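    The two-stage procedure sketched in this abstract (title terms -> Concept Primitives -> Concept Headings) can be illustrated with a minimal toy model. This is not the BIOSIS system; all vocabulary entries, primitives, and headings below are invented for illustration.

```python
# Illustrative two-stage concept-recognition sketch (toy data, not BIOSIS).

SEMANTIC_VOCABULARY = {          # natural-language term -> Concept Primitives
    "liver": {"ORGAN_LIVER"},
    "enzyme": {"PROTEIN_ENZYME"},
    "rat": {"ORGANISM_RODENT"},
}

# Concept Heading definitions as Boolean strategies over Concept Primitives:
# a heading is assigned when all primitives of at least one clause are present.
CONCEPT_HEADINGS = {
    "Liver enzymology": [{"ORGAN_LIVER", "PROTEIN_ENZYME"}],
    "Rodent studies": [{"ORGANISM_RODENT"}],
}

def primitives_for_title(title):
    """Stage 1: match title words against the Semantic Vocabulary."""
    found = set()
    for word in title.lower().split():
        found |= SEMANTIC_VOCABULARY.get(word, set())
    return found

def assign_headings(title):
    """Stage 2: match the primitive representation against CH definitions."""
    prims = primitives_for_title(title)
    return sorted(ch for ch, clauses in CONCEPT_HEADINGS.items()
                  if any(clause <= prims for clause in clauses))

print(assign_headings("Enzyme activity in rat liver"))
# -> ['Liver enzymology', 'Rodent studies']
```

    A real system additionally needs the disambiguation rules the abstract mentions; here every term maps to a single primitive set, so that step is omitted.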
  16. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.08
    Abstract
    This study investigates the potential of multiword terms for information retrieval. The aim of the work is to use Latent Semantic Analysis (LSA) to weight intellectually positively rated candidates higher than negatively rated candidates. The positive candidates should accordingly be preferred in an information retrieval ranking. A version of the social-science GIRT database (German Indexing and Retrieval Testdatabase) served as the collection. To identify candidates for multiword terms, the automatic indexing system Lingo was used. The required core functionalities were lemmatization, identification of compounds, algorithmic multiword recognition, and weighting of index terms by the LSA model. The multiword candidates recognized by Lingo and weighted by LSA were evaluated. First, an intellectual selection of positive and negative multiword candidates was made. In the second step of the evaluation, the yield was calculated to obtain the proportion of positive multiword candidates. In the last step of the evaluation, R-precision was used to calculate how many positively rated multiword candidates reached position k of the ranking. The yield of positive multiword candidates averaged about 39%, while R-precision achieved an average value of 54%. The LSA model thus yields an ambivalent result with a positive tendency.
    Object
    Latent Semantic Indexing
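    The core LSA step used to weight multiword candidates can be sketched as follows. This is a minimal toy model, not the study's Lingo/GIRT pipeline: the documents, the latent dimension k, and the phrase-scoring heuristic (mean pairwise cosine similarity of a phrase's words in the latent space) are all assumptions for illustration.

```python
# Minimal LSA sketch: term-document matrix -> truncated SVD -> score a
# multiword candidate by how close its words lie in the latent space.
import numpy as np

docs = [
    "latent semantic analysis reduces term document matrices",
    "information retrieval ranks documents for a query",
    "multiword terms improve retrieval ranking",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                               # latent dimensions kept (assumed)
term_vecs = U[:, :k] * s[:k]        # term vectors in the latent space

def candidate_weight(words):
    """Mean pairwise cosine similarity of a candidate phrase's words."""
    idx = [vocab.index(w) for w in words if w in vocab]
    if len(idx) < 2:
        return 0.0
    vecs = [term_vecs[i] for i in idx]
    sims = [np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    return float(np.mean(sims))

# Words that co-occur ("latent", "semantic") score near 1; words from
# unrelated documents score lower.
print(candidate_weight(["latent", "semantic"]))
print(candidate_weight(["semantic", "query"]))
```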
  17. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.07
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents of all subjects. While the SWD has traditionally been used for intellectual subject cataloguing, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings for online publications. This project, its results and problems are sketched in the paper.
    Content
    Contribution to the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://www.nlib.ee/index.php?id=17763.
  18. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.07
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND), provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging primarily for books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
    Footnote
    Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web", containing papers from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
  19. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.07
    Date
    31. 7.1996 9:22:19
  20. Lassalle, E.: Text retrieval : from a monolingual system to a multilingual system (1993) 0.06
    Abstract
    Describes the TELMI monolingual text retrieval system and its future extension, a multilingual system. TELMI is designed for medium-sized databases containing short texts. The characteristics of the system are fine-grained natural language processing (NLP); an open domain and a large-scale knowledge base; automated indexing based on conceptual representation of texts; and reusability of the NLP tools. Discusses the French MINITEL service, the MGS information service and the TELMI research system, covering the full-text system; NLP architecture; the lexical level; the syntactic level; the semantic level; and an example of the use of a generic system.

Languages

Types

  • a 171
  • el 11
  • m 5
  • x 5
  • s 2
  • d 1
  • r 1