Search (174 results, page 1 of 9)

Faraj, N.: Analyse d'une methode d'indexation automatique basée sur une analyse syntaxique de texte (1996) 0.13

0.12597175 = product of:
  0.18895762 = sum of:
    0.14345708 = weight(_text_:systematic in 685) [ClassicSimilarity], result of:
      0.14345708 = score(doc=685,freq=2.0), product of:
        0.28397155 = queryWeight, product of:
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.049684696 = queryNorm
        0.5051812 = fieldWeight in 685, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.0625 = fieldNorm(doc=685)
    0.045500536 = product of:
      0.09100107 = sum of:
        0.09100107 = weight(_text_:indexing in 685) [ClassicSimilarity], result of:
          0.09100107 = score(doc=685,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.47848347 = fieldWeight in 685, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=685)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Evaluates an automatic indexing method based on syntactical text analysis combined with statistical analysis. Tests many combinations for the choice of term categories and weighting methods. The experiment, conducted on a software engineering corpus, shows systematic improvement in the use of syntactic term phrases compared to using only individual words as index terms
Footnote: Übers. d. Titels: Analysis of an automatic indexing method based on syntactic analysis of text

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.10
```
0.09524199 = product of:
  0.14286299 = sum of:
    0.08966068 = weight(_text_:systematic in 3311) [ClassicSimilarity], result of:
      0.08966068 = score(doc=3311,freq=2.0), product of:
        0.28397155 = queryWeight, product of:
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.049684696 = queryNorm
        0.31573826 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.053202312 = product of:
      0.106404625 = sum of:
        0.106404625 = weight(_text_:indexing in 3311) [ClassicSimilarity], result of:
          0.106404625 = score(doc=3311,freq=14.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.55947536 = fieldWeight in 3311, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.08

0.08449805 = product of:
  0.25349414 = sum of:
    0.25349414 = sum of:
      0.15925187 = weight(_text_:indexing in 6265) [ClassicSimilarity], result of:
        0.15925187 = score(doc=6265,freq=4.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.8373461 = fieldWeight in 6265, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
      0.09424227 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
        0.09424227 = score(doc=6265,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.5416616 = fieldWeight in 6265, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
  0.33333334 = coord(1/3)

Source: Information outlook. 9(2005) no.8, S.22-23

Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.08
```
0.08299318 = product of:
  0.12448977 = sum of:
    0.08966068 = weight(_text_:systematic in 658) [ClassicSimilarity], result of:
      0.08966068 = score(doc=658,freq=2.0), product of:
        0.28397155 = queryWeight, product of:
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.049684696 = queryNorm
        0.31573826 = fieldWeight in 658, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.0390625 = fieldNorm(doc=658)
    0.03482909 = product of:
      0.06965818 = sum of:
        0.06965818 = weight(_text_:indexing in 658) [ClassicSimilarity], result of:
          0.06965818 = score(doc=658,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3662626 = fieldWeight in 658, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=658)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open source Annif toolkit for automated subject indexing and classification, but should generalize also to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification particularly for Finnish and Swedish text, but not English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
Ward, M.L.: ¬The future of the human indexer (1996) 0.05
```
0.052867796 = product of:
  0.15860339 = sum of:
    0.15860339 = sum of:
      0.11821385 = weight(_text_:indexing in 7244) [ClassicSimilarity], result of:
        0.11821385 = score(doc=7244,freq=12.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.6215682 = fieldWeight in 7244, product of:
            3.4641016 = tf(freq=12.0), with freq of:
              12.0 = termFreq=12.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
      0.04038954 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
        0.04038954 = score(doc=7244,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.23214069 = fieldWeight in 7244, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
  0.33333334 = coord(1/3)
```
Abstract

Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and what depth to index; reading skills; abstracting skills; and classification skills, Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database)

Date

9. 2.1997 18:44:22

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.05

0.04925008 = product of:
  0.14775024 = sum of:
    0.14775024 = sum of:
      0.08043434 = weight(_text_:indexing in 1952) [ClassicSimilarity], result of:
        0.08043434 = score(doc=1952,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.42292362 = fieldWeight in 1952, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=1952)
      0.06731591 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
        0.06731591 = score(doc=1952,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.38690117 = fieldWeight in 1952, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=1952)
  0.33333334 = coord(1/3)

Date: 16. 8.1998 12:51:22

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.05

0.04925008 = product of:
  0.14775024 = sum of:
    0.14775024 = sum of:
      0.08043434 = weight(_text_:indexing in 4157) [ClassicSimilarity], result of:
        0.08043434 = score(doc=4157,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.42292362 = fieldWeight in 4157, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=4157)
      0.06731591 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
        0.06731591 = score(doc=4157,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.38690117 = fieldWeight in 4157, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=4157)
  0.33333334 = coord(1/3)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill

Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.05

0.04925008 = product of:
  0.14775024 = sum of:
    0.14775024 = sum of:
      0.08043434 = weight(_text_:indexing in 374) [ClassicSimilarity], result of:
        0.08043434 = score(doc=374,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.42292362 = fieldWeight in 374, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=374)
      0.06731591 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
        0.06731591 = score(doc=374,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.38690117 = fieldWeight in 374, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=374)
  0.33333334 = coord(1/3)

Date: 1. 4.2002 10:22:41
Footnote: Übers. des Titels: Algorithms for selection of positive and negative descriptors from text and automated text indexing

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.05

0.04925008 = product of:
  0.14775024 = sum of:
    0.14775024 = sum of:
      0.08043434 = weight(_text_:indexing in 2759) [ClassicSimilarity], result of:
        0.08043434 = score(doc=2759,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.42292362 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
      0.06731591 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
        0.06731591 = score(doc=2759,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.38690117 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
  0.33333334 = coord(1/3)

Date: 1. 2.2016 18:25:22

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.04

0.042249024 = product of:
  0.12674707 = sum of:
    0.12674707 = sum of:
      0.079625934 = weight(_text_:indexing in 530) [ClassicSimilarity], result of:
        0.079625934 = score(doc=530,freq=4.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.41867304 = fieldWeight in 530, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.0546875 = fieldNorm(doc=530)
      0.047121134 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
        0.047121134 = score(doc=530,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.2708308 = fieldWeight in 530, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=530)
  0.33333334 = coord(1/3)

Abstract: Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
Source: International forum on information and documentation. 22(1997) no.1, S.17-28

Milstead, J.L.: Thesauri in a full-text world (1998) 0.03

0.030177874 = product of:
  0.09053362 = sum of:
    0.09053362 = sum of:
      0.05687567 = weight(_text_:indexing in 2337) [ClassicSimilarity], result of:
        0.05687567 = score(doc=2337,freq=4.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.29905218 = fieldWeight in 2337, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2337)
      0.033657953 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
        0.033657953 = score(doc=2337,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.19345059 = fieldWeight in 2337, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2337)
  0.33333334 = coord(1/3)

Abstract: Despite early claims to the contemporary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contrdiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing contunues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future
Date: 22. 9.1997 19:16:05

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing: : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.03
```
0.027550971 = product of:
  0.08265291 = sum of:
    0.08265291 = sum of:
      0.055726547 = weight(_text_:indexing in 1442) [ClassicSimilarity], result of:
        0.055726547 = score(doc=1442,freq=6.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.2930101 = fieldWeight in 1442, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
      0.026926363 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
        0.026926363 = score(doc=1442,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.15476047 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
  0.33333334 = coord(1/3)
```
Abstract

The main objective of this research was to analyze whether there was a characteristic distribution behavior of relevant terms over a scientific text that could contribute as a criterion for their process of automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts were considered a total of 98 doctoral theses of the eight areas of knowledge in a same university. Initially, 20 full noun phrases were automatically extracted from each text as candidates to be the most relevant terms, and each author of each text assigned a relevance value 0-6 (not relevant and highly relevant, respectively) for each of the 20 noun phrases sent. Only, 22.1 % of noun phrases were considered not relevant. A relevance values of the terms assigned by the authors were associated with their positions in the text. Each full noun phrases found in the text was considered as a valid linear position. The results that were obtained showed values resulting from this distribution by considering two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, as well as all areas of knowledge related to Social Sciences showed the same characteristic behavior of distribution, but distinct from the Natural Sciences. The difference of the distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in future as criteria for automatic indexing. Until the present date this work has become inedited of for two reasons: to present a method for characterizing the distribution of relevant terms in a scientific text, and also, through this method, pointing out a quantitative trait difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
Humphrey, S.M.: Automatic indexing of documents from journal descriptors : a preliminary investigation (1999) 0.03
```
0.025435574 = product of:
  0.07630672 = sum of:
    0.07630672 = product of:
      0.15261345 = sum of:
        0.15261345 = weight(_text_:indexing in 3769) [ClassicSimilarity], result of:
          0.15261345 = score(doc=3769,freq=20.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.80244124 = fieldWeight in 3769, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=3769)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

A new, fully automated approach for indedexing documents is presented based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most problable use would be for improving or refining search results
Wan, T.-L.; Evens, M.; Wan, Y.-W.; Pao, Y.-Y.: Experiments with automatic indexing and a relational thesaurus in a Chinese information retrieval system (1997) 0.02
```
0.024827747 = product of:
  0.07448324 = sum of:
    0.07448324 = product of:
      0.14896648 = sum of:
        0.14896648 = weight(_text_:indexing in 956) [ClassicSimilarity], result of:
          0.14896648 = score(doc=956,freq=14.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.78326553 = fieldWeight in 956, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=956)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This article describes a series of experiments with an interactive Chinese information retrieval system named CIRS and an interactive relational thesaurus. 2 important issues have been explored: whether thesauri enhance the retrieval effectiveness of Chinese documents, and whether automatic indexing can complete with manual indexing in a Chinese information retrieval system. Recall and precision are used to measure and evaluate the effectiveness of the system. Statistical analysis of the recall and precision measures suggest that the use of the relational thesaurus does improve the retrieval effectiveness both in the automatic indexing environment and in the manual indexing environment and that automatic indexing is at least as good as manual indexing

Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.02

0.02462504 = product of:
  0.07387512 = sum of:
    0.07387512 = sum of:
      0.04021717 = weight(_text_:indexing in 1794) [ClassicSimilarity], result of:
        0.04021717 = score(doc=1794,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.21146181 = fieldWeight in 1794, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1794)
      0.033657953 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
        0.033657953 = score(doc=1794,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.19345059 = fieldWeight in 1794, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1794)
  0.33333334 = coord(1/3)

Date: 11. 9.2000 19:53:22

Bloomfield, M.: Indexing : neglected and poorly understood (2001) 0.02
```
0.024130303 = product of:
  0.07239091 = sum of:
    0.07239091 = product of:
      0.14478181 = sum of:
        0.14478181 = weight(_text_:indexing in 5439) [ClassicSimilarity], result of:
          0.14478181 = score(doc=5439,freq=18.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.76126254 = fieldWeight in 5439, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=5439)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.
Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.02
```
0.024130303 = product of:
  0.07239091 = sum of:
    0.07239091 = product of:
      0.14478181 = sum of:
        0.14478181 = weight(_text_:indexing in 3622) [ClassicSimilarity], result of:
          0.14478181 = score(doc=3622,freq=18.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.76126254 = fieldWeight in 3622, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=3622)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Indexing is contextualized and a brief description is provided of some of the most used automatic indexing systems. We describe SISA, a system which uses location heuristics rules, statistical rules like term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value of certain parts of the articles for indexing such as titles, abstracts, keywords, headings, first paragraph, conclusions and references and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and f-measure. Automatic indexing of the articles with location heuristics rules provided the best results with the evaluation measures.
Kim, P.K.: ¬An automatic indexing of compound words based on mutual information for Korean text retrieval (1995) 0.02
```
0.02398089 = product of:
  0.071942665 = sum of:
    0.071942665 = product of:
      0.14388533 = sum of:
        0.14388533 = weight(_text_:indexing in 620) [ClassicSimilarity], result of:
          0.14388533 = score(doc=620,freq=10.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.7565488 = fieldWeight in 620, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=620)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Presents an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. Discusses some construction conditions for compound words and the rules for decomposing compound words to enhance the exhaustivity of indexing, demonstrating that this system, mutual information, enhances both the exhaustivity of indexing and the specifity of terms. Suggests that the construction conditions and rules for decomposition presented may be used in multilingual information retrieval systems to translate the indexing terms of the specific language into those of the language required

Li, Z.: Research on dynamic morphological indexing (1998) 0.02

0.02398089 = product of:
  0.071942665 = sum of:
    0.071942665 = product of:
      0.14388533 = sum of:
        0.14388533 = weight(_text_:indexing in 3242) [ClassicSimilarity], result of:
          0.14388533 = score(doc=3242,freq=10.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.7565488 = fieldWeight in 3242, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=3242)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: Notes that in automatic indexing of Chinese words using dictionary matching methods, there is some difficulty in the indexing of proper nouns. Presents a solution called dynamic morphological indexing, based on work using automatic indexing of archive documents. Presents the algorithm for this solution

Hodge, G.M.: Computer-assisted database indexing : the state-of-the-art (1994) 0.02

0.023219395 = product of:
  0.06965818 = sum of:
    0.06965818 = product of:
      0.13931637 = sum of:
        0.13931637 = weight(_text_:indexing in 7936) [ClassicSimilarity], result of:
          0.13931637 = score(doc=7936,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.7325252 = fieldWeight in 7936, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.078125 = fieldNorm(doc=7936)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: Discusses the state-of-the art of computer indexing, defines indexing and computer assistance, describes the reasons for renewed interest. Identifies the types of computer support in use using selected operational systems, describes the integration of various computer supports in one databases production system, and speculates on the future

Search (174 results, page 1 of 9)

Authors

Years

Languages

Types

Themes

Subjects

Classifications