Search (142 results, page 1 of 8)

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.16

0.16492793 = product of:
  0.2473919 = sum of:
    0.09675461 = weight(_text_:semantic in 2759) [ClassicSimilarity], result of:
      0.09675461 = score(doc=2759,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.45938298 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.15063728 = sum of:
      0.08200603 = weight(_text_:indexing in 2759) [ClassicSimilarity], result of:
        0.08200603 = score(doc=2759,freq=2.0), product of:
          0.19390269 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.050655533 = queryNorm
          0.42292362 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
      0.068631254 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
        0.068631254 = score(doc=2759,freq=2.0), product of:
          0.17738704 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050655533 = queryNorm
          0.38690117 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
  0.6666667 = coord(2/3)
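
The tree above is in the format of Lucene's ClassicSimilarity (TF-IDF) score explanation. As a rough sketch, its values can be re-derived in Python, assuming the classic formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm and fieldNorm are copied from the listing rather than recomputed, and the function and variable names are illustrative only, not part of any library.

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each weight(...) node above.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the explanation for doc 2759.
QUERY_NORM = 0.050655533
MAX_DOCS = 44218
FIELD_NORM = 0.078125

semantic = term_score(2.0, 1879, MAX_DOCS, QUERY_NORM, FIELD_NORM)
indexing = term_score(2.0, 2614, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Sum of the matching clauses, scaled by coord(2/3) as in the listing.
total = (semantic + indexing + term_22) * (2 / 3)
print(semantic, indexing, term_22, total)
```

Lucene computes these values in single precision, so the printed numbers should agree with the listed ones only up to rounding in the last digits.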

Date: 1. 2.2016 18:25:22
Source: Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.13

0.13130128 = product of:
  0.19695193 = sum of:
    0.06772823 = weight(_text_:semantic in 530) [ClassicSimilarity], result of:
      0.06772823 = score(doc=530,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32156807 = fieldWeight in 530, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=530)
    0.1292237 = sum of:
      0.081181824 = weight(_text_:indexing in 530) [ClassicSimilarity], result of:
        0.081181824 = score(doc=530,freq=4.0), product of:
          0.19390269 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.050655533 = queryNorm
          0.41867304 = fieldWeight in 530, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.0546875 = fieldNorm(doc=530)
      0.048041876 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
        0.048041876 = score(doc=530,freq=2.0), product of:
          0.17738704 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050655533 = queryNorm
          0.2708308 = fieldWeight in 530, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=530)
  0.6666667 = coord(2/3)

Abstract: Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
Source: International forum on information and documentation. 22(1997) no.1, S.17-28

Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.12

0.11671345 = product of:
  0.17507017 = sum of:
    0.10946535 = weight(_text_:semantic in 5189) [ClassicSimilarity], result of:
      0.10946535 = score(doc=5189,freq=4.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.51973253 = fieldWeight in 5189, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0625 = fieldNorm(doc=5189)
    0.06560482 = product of:
      0.13120964 = sum of:
        0.13120964 = weight(_text_:indexing in 5189) [ClassicSimilarity], result of:
          0.13120964 = score(doc=5189,freq=8.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.6766778 = fieldWeight in 5189, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=5189)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
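
In this record the "indexing" clause sits inside a nested group with its own coord(1/2) factor before the outer coord(2/3) is applied. A minimal sketch of that combination, assuming each interior node is a sum of its children multiplied by a coord factor; the tuple encoding and the name `combine` are hypothetical, and the leaf scores are copied from the listing:

```python
def combine(node):
    """Evaluate a nested (children, coord) tree; leaves are plain floats."""
    if isinstance(node, float):
        return node
    children, coord = node
    # Sum the child scores, then scale by the fraction of clauses matched.
    return sum(combine(child) for child in children) * coord

# Doc 5189: the semantic leaf plus an inner group holding the indexing leaf.
doc_5189 = (
    [
        0.10946535,                 # weight(_text_:semantic in 5189)
        ([0.13120964], 1 / 2),      # weight(_text_:indexing in 5189), coord(1/2)
    ],
    2 / 3,                          # outer coord(2/3)
)

score = combine(doc_5189)
print(score)
```

The result should match the record's top-level score of 0.11671345 up to floating-point rounding.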

Abstract: An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, grammar formalism, lexicon, and the frame-based knowledge representation for semantic representation. The semantic representation is based on the Ranganathan school of thought, especially that of Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The paper attempts to demonstrate the various steps in indexing by providing an illustration

Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.11
```
0.11494769 = product of:
  0.17242153 = sum of:
    0.12980995 = weight(_text_:semantic in 3458) [ClassicSimilarity], result of:
      0.12980995 = score(doc=3458,freq=10.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.616327 = fieldWeight in 3458, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=3458)
    0.04261158 = product of:
      0.08522316 = sum of:
        0.08522316 = weight(_text_:indexing in 3458) [ClassicSimilarity], result of:
          0.08522316 = score(doc=3458,freq=6.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.4395151 = fieldWeight in 3458, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in these documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system's retrieval accuracy. In particular, the effects of term weighting functions in the semantic space construction and in query construction, suffix stripping, and using lexical units larger than a single word were studied.

Object

Latent Semantic Indexing

Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.10
```
0.10465857 = product of:
  0.15698785 = sum of:
    0.12799433 = weight(_text_:semantic in 2895) [ClassicSimilarity], result of:
      0.12799433 = score(doc=2895,freq=14.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.6077066 = fieldWeight in 2895, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2895)
    0.02899351 = product of:
      0.05798702 = sum of:
        0.05798702 = weight(_text_:indexing in 2895) [ClassicSimilarity], result of:
          0.05798702 = score(doc=2895,freq=4.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.29905218 = fieldWeight in 2895, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.

Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.10

0.10294118 = product of:
  0.15441176 = sum of:
    0.12980995 = weight(_text_:semantic in 161) [ClassicSimilarity], result of:
      0.12980995 = score(doc=161,freq=10.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.616327 = fieldWeight in 161, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=161)
    0.02460181 = product of:
      0.04920362 = sum of:
        0.04920362 = weight(_text_:indexing in 161) [ClassicSimilarity], result of:
          0.04920362 = score(doc=161,freq=2.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.2537542 = fieldWeight in 161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=161)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on the combination of Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between 2 semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox Part of Speech tagger as the pre-processor of the indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that if documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART

Hlava, M.M.K.: Machine-Aided Indexing (MAI) in a multilingual environment (1992) 0.09

0.08947943 = product of:
  0.13421914 = sum of:
    0.077403694 = weight(_text_:semantic in 2378) [ClassicSimilarity], result of:
      0.077403694 = score(doc=2378,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.36750638 = fieldWeight in 2378, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0625 = fieldNorm(doc=2378)
    0.05681544 = product of:
      0.11363088 = sum of:
        0.11363088 = weight(_text_:indexing in 2378) [ClassicSimilarity], result of:
          0.11363088 = score(doc=2378,freq=6.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.5860202 = fieldWeight in 2378, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=2378)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The Machine-Aided Indexing (MAI) program, developed by Access Innovations, Inc., is a semantic based, Boolean statement, rule interpreting application designed to operate in a multilingual environment. Use of MAI across several languages with controlled vocabularies for each language provides a consistency in indexing not available through any other mechanism

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.09
```
0.0875351 = product of:
  0.13130264 = sum of:
    0.08209902 = weight(_text_:semantic in 2721) [ClassicSimilarity], result of:
      0.08209902 = score(doc=2721,freq=4.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.38979942 = fieldWeight in 2721, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.04920362 = product of:
      0.09840724 = sum of:
        0.09840724 = weight(_text_:indexing in 2721) [ClassicSimilarity], result of:
          0.09840724 = score(doc=2721,freq=8.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.5075084 = fieldWeight in 2721, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.09
```
0.08617924 = product of:
  0.12926885 = sum of:
    0.077403694 = weight(_text_:semantic in 3434) [ClassicSimilarity], result of:
      0.077403694 = score(doc=3434,freq=8.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.36750638 = fieldWeight in 3434, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.03125 = fieldNorm(doc=3434)
    0.051865168 = product of:
      0.103730336 = sum of:
        0.103730336 = weight(_text_:indexing in 3434) [ClassicSimilarity], result of:
          0.103730336 = score(doc=3434,freq=20.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.5349608 = fieldWeight in 3434, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The purpose of this study is to examine whether the understandings of subject-indexing processes conducted by human indexers have a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text while a keyword was found more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. The content-oriented and the document-oriented indexing approaches especially were found more effective than the full text. Among three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more desirable for automatic subject term assignment via TC than the possible users' needs. The findings of this study support that incorporation of human indexers' indexing approaches or conception in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.09

0.08614914 = product of:
  0.2584474 = sum of:
    0.2584474 = sum of:
      0.16236365 = weight(_text_:indexing in 6265) [ClassicSimilarity], result of:
        0.16236365 = score(doc=6265,freq=4.0), product of:
          0.19390269 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.050655533 = queryNorm
          0.8373461 = fieldWeight in 6265, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
      0.09608375 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
        0.09608375 = score(doc=6265,freq=2.0), product of:
          0.17738704 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050655533 = queryNorm
          0.5416616 = fieldWeight in 6265, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
  0.33333334 = coord(1/3)

Source: Information outlook. 9(2005) no.8, S.22-23

Hlava, M.M.K.: Machine aided indexing (MAI) in a multilingual environment (1993) 0.08

0.0782945 = product of:
  0.11744174 = sum of:
    0.06772823 = weight(_text_:semantic in 7405) [ClassicSimilarity], result of:
      0.06772823 = score(doc=7405,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32156807 = fieldWeight in 7405, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7405)
    0.049713515 = product of:
      0.09942703 = sum of:
        0.09942703 = weight(_text_:indexing in 7405) [ClassicSimilarity], result of:
          0.09942703 = score(doc=7405,freq=6.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.5127677 = fieldWeight in 7405, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7405)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The machine aided indexing (MAI) software developed by Access Innovations, Inc., is a semantic based, Boolean statement, rule interpreting application with 3 modules: the MA engine which accepts input files, matches terms in the knowledge base, interprets rules, and outputs a text file with suggested indexing terms; a rule building application allowing each Boolean style rule in the knowledge base to be created or modified; and a statistical computation module which analyzes performance of the MA software against text manually indexed by professional human indexers. The MA software can be applied across multiple languages and can be used where the text to be searched is in one language and the indexes to be output are in another

Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.08
```
0.07519031 = product of:
  0.112785466 = sum of:
    0.08379196 = weight(_text_:semantic in 2849) [ClassicSimilarity], result of:
      0.08379196 = score(doc=2849,freq=6.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.39783734 = fieldWeight in 2849, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2849)
    0.02899351 = product of:
      0.05798702 = sum of:
        0.05798702 = weight(_text_:indexing in 2849) [ClassicSimilarity], result of:
          0.05798702 = score(doc=2849,freq=4.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.29905218 = fieldWeight in 2849, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2849)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

This article describes a natural-language text-processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is a deep indexing with a controlled vocabulary of biological concepts - Concept Headings (CHs). On average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural-language biological titles into title-semantic representations which are in the constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural-language terms and their translations in the language of Concept Primitives. For the ambiguous terms, the SV contains the algorithmic rules of term disambiguation, rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, the problems of lexical and situational ambiguities, are discussed. The disambiguation techniques employed are described and demonstrated in many examples

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.07

0.07221276 = product of:
  0.10831914 = sum of:
    0.06772823 = weight(_text_:semantic in 1717) [ClassicSimilarity], result of:
      0.06772823 = score(doc=1717,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32156807 = fieldWeight in 1717, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1717)
    0.040590912 = product of:
      0.081181824 = sum of:
        0.081181824 = weight(_text_:indexing in 1717) [ClassicSimilarity], result of:
          0.081181824 = score(doc=1717,freq=4.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.41867304 = fieldWeight in 1717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents of all subjects. Traditionally used for intellectual subject cataloguing primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results and problems are sketched in the paper.
Content: Contribution to the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://www.nlib.ee/index.php?id=17763.

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.07

0.07221276 = product of:
  0.10831914 = sum of:
    0.06772823 = weight(_text_:semantic in 1969) [ClassicSimilarity], result of:
      0.06772823 = score(doc=1969,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32156807 = fieldWeight in 1969, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1969)
    0.040590912 = product of:
      0.081181824 = sum of:
        0.081181824 = weight(_text_:indexing in 1969) [ClassicSimilarity], result of:
          0.081181824 = score(doc=1969,freq=4.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.41867304 = fieldWeight in 1969, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The German Integrated Authority File (Gemeinsame Normdatei, GND), provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging primarily for books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
Footnote: Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - Contains contributions from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.07

0.06990413 = product of:
  0.10485619 = sum of:
    0.077403694 = weight(_text_:semantic in 4709) [ClassicSimilarity], result of:
      0.077403694 = score(doc=4709,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.36750638 = fieldWeight in 4709, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0625 = fieldNorm(doc=4709)
    0.0274525 = product of:
      0.054905 = sum of:
        0.054905 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.054905 = score(doc=4709,freq=2.0), product of:
            0.17738704 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050655533 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 31. 7.1996 9:22:19
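
The score breakdown for this record can be checked by hand. The following is a minimal sketch in Python, not the search engine's own code: the names `idf` and `term_weight` are made up for illustration, `QUERY_NORM` is copied from the listing rather than recomputed, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that happen to be consistent with the values printed in the tree.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.050655533  # queryNorm, copied from the listing (not recomputed)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Weight of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values for doc 4709, read off the breakdown above.
semantic = term_weight(freq=2.0, doc_freq=1879, field_norm=0.0625)
term_22 = term_weight(freq=2.0, doc_freq=3622, field_norm=0.0625)

# The "22" clause matched 1 of 2 sub-clauses, so it is scaled by coord(1/2);
# the whole query matched 2 of 3 clauses, so the sum is scaled by coord(2/3).
score = (semantic + term_22 * (1 / 2)) * (2 / 3)

print(f"semantic={semantic:.9f} 22={term_22:.9f} score={score:.9f}")
```

The printed score should agree with the 0.06990413 shown for this record up to single-precision rounding, since the listing's values are stored as 32-bit floats while Python computes in double precision.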

Lassalle, E.: Text retrieval : from a monolingual system to a multilingual system (1993) 0.06

0.064286895 = product of:
  0.09643034 = sum of:
    0.06772823 = weight(_text_:semantic in 7403) [ClassicSimilarity], result of:
      0.06772823 = score(doc=7403,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32156807 = fieldWeight in 7403, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7403)
    0.028702112 = product of:
      0.057404224 = sum of:
        0.057404224 = weight(_text_:indexing in 7403) [ClassicSimilarity], result of:
          0.057404224 = score(doc=7403,freq=2.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.29604656 = fieldWeight in 7403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7403)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Describes the TELMI monolingual text retrieval system and its future extension, a multilingual system. TELMI is designed for medium-sized databases containing short texts. The characteristics of the system are: fine-grained natural language processing (NLP); an open domain and a large-scale knowledge base; automated indexing based on conceptual representation of texts; and reusability of the NLP tools. Discusses the French MINITEL service, the MGS information service and the TELMI research system, covering the full-text system; NLP architecture; the lexical level; the syntactic level; the semantic level; and an example of the use of a generic system.

Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.06

0.059278235 = product of:
  0.08891735 = sum of:
    0.06841584 = weight(_text_:semantic in 5400) [ClassicSimilarity], result of:
      0.06841584 = score(doc=5400,freq=4.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.32483283 = fieldWeight in 5400, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
    0.020501507 = product of:
      0.041003015 = sum of:
        0.041003015 = weight(_text_:indexing in 5400) [ClassicSimilarity], result of:
          0.041003015 = score(doc=5400,freq=2.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.21146181 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.

Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.06

0.055103056 = product of:
  0.08265458 = sum of:
    0.058052767 = weight(_text_:semantic in 1871) [ClassicSimilarity], result of:
      0.058052767 = score(doc=1871,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.2756298 = fieldWeight in 1871, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=1871)
    0.02460181 = product of:
      0.04920362 = sum of:
        0.04920362 = weight(_text_:indexing in 1871) [ClassicSimilarity], result of:
          0.04920362 = score(doc=1871,freq=2.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.2537542 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.

Ward, M.L.: The future of the human indexer (1996) 0.05
```
0.053900834 = product of:
  0.1617025 = sum of:
    0.1617025 = sum of:
      0.12052374 = weight(_text_:indexing in 7244) [ClassicSimilarity], result of:
        0.12052374 = score(doc=7244,freq=12.0), product of:
          0.19390269 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.050655533 = queryNorm
          0.6215682 = fieldWeight in 7244, product of:
            3.4641016 = tf(freq=12.0), with freq of:
              12.0 = termFreq=12.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
      0.04117875 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
        0.04117875 = score(doc=7244,freq=2.0), product of:
          0.17738704 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050655533 = queryNorm
          0.23214069 = fieldWeight in 7244, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
  0.33333334 = coord(1/3)
```
Abstract: Considers the principles of indexing and the intellectual skills involved, to determine what automatic indexing systems would need in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and to what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low-grade texts (should they be wanted in the database).

Date: 9. 2.1997 18:44:22

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.05
```
0.051580545 = product of:
  0.077370815 = sum of:
    0.048377305 = weight(_text_:semantic in 3627) [ClassicSimilarity], result of:
      0.048377305 = score(doc=3627,freq=2.0), product of:
        0.21061863 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.050655533 = queryNorm
        0.22969149 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.02899351 = product of:
      0.05798702 = sum of:
        0.05798702 = weight(_text_:indexing in 3627) [ClassicSimilarity], result of:
          0.05798702 = score(doc=3627,freq=4.0), product of:
            0.19390269 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.050655533 = queryNorm
            0.29905218 = fieldWeight in 3627, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
