Search (52 results, page 1 of 3)

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01

0.009295925 = product of:
  0.0371837 = sum of:
    0.0371837 = product of:
      0.0743674 = sum of:
        0.0743674 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.0743674 = score(doc=402,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Information processing and management. 22(1986) no.6, pp. 465-476
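
The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library. All numeric inputs are copied from the listing for doc 402.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, mirroring the two branches of the tree.
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Values taken from the explain output for the term "22" in doc 402.
term = classic_term_score(freq=2.0, doc_freq=3622, max_docs=44218,
                          query_norm=0.0343058, field_norm=0.125)

# Apply the two coordination factors shown in the tree: coord(1/2), then coord(1/4).
total = term * 0.5 * 0.25
print(term, total)
```

Up to floating-point rounding, `term` should land close to the 0.0743674 shown for the term weight and `total` close to the 0.009295925 at the root of the tree.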

Leung, C.-H.; Kan, W.-K.: A statistical learning approach to automatic indexing of controlled index terms (1997) 0.01
```
0.008960307 = product of:
  0.035841227 = sum of:
    0.035841227 = product of:
      0.1433649 = sum of:
        0.1433649 = weight(_text_:learning in 6497) [ClassicSimilarity], result of:
          0.1433649 = score(doc=6497,freq=20.0), product of:
            0.15317118 = queryWeight, product of:
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.0343058 = queryNorm
            0.9359783 = fieldWeight in 6497, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.046875 = fieldNorm(doc=6497)
      0.25 = coord(1/4)
  0.25 = coord(1/4)
```
Abstract

A statistical learning approach to assigning controlled index terms is presented. The approach comprises two processes: (1) the learning process and (2) the indexing process. The learning process constructs a relationship between an index term and the words relevant and irrelevant to it, based on a positive training set and a negative training set, that is, documents indexed by the term and those not indexed by it, respectively. The indexing process determines whether an index term is assigned to a given document, based on the relationship constructed by the learning process and the text found in the document. Furthermore, a learning feedback technique is introduced. This technique, used in the learning process, modifies the relationship between an index term and its relevant and irrelevant words to improve the learning performance and, thus, the indexing performance. Experimental results have shown that the statistical learning approach and the learning feedback technique are practical means of automatically indexing controlled index terms.
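
The two processes described in this abstract can be illustrated with a small sketch. This is not the authors' method: the smoothed log-ratio weighting, the threshold, and the function names `learn_word_weights` and `assign_term` are assumptions chosen only to show the general shape of a learning step followed by an indexing step.

```python
import math
from collections import Counter

def learn_word_weights(positive_docs, negative_docs):
    # Learning process (illustrative): weight each word by how much more often
    # it appears in documents indexed by the term than in documents that are not.
    pos = Counter(w for d in positive_docs for w in set(d.lower().split()))
    neg = Counter(w for d in negative_docs for w in set(d.lower().split()))
    n_pos, n_neg = len(positive_docs), len(negative_docs)
    weights = {}
    for word in set(pos) | set(neg):
        p = (pos[word] + 1) / (n_pos + 2)  # add-one smoothing (assumption)
        q = (neg[word] + 1) / (n_neg + 2)
        weights[word] = math.log(p / q)
    return weights

def assign_term(text, weights, threshold=0.0):
    # Indexing process (illustrative): assign the term when the summed
    # word weights of the document exceed a threshold.
    score = sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return score > threshold

# Toy training sets for a hypothetical index term "agriculture".
positive = ["soil crop yield", "crop rotation soil"]
negative = ["stock market price", "market share price"]
weights = learn_word_weights(positive, negative)
print(assign_term("crop soil report", weights))
print(assign_term("market price news", weights))
```

A feedback step in the spirit of the abstract could adjust `weights` after comparing assignments with known index terms, but the abstract gives no detail on how that is done, so it is left out here.
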
Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
```
0.008425959 = product of:
  0.016851919 = sum of:
    0.0075559937 = product of:
      0.030223975 = sum of:
        0.030223975 = weight(_text_:learning in 1441) [ClassicSimilarity], result of:
          0.030223975 = score(doc=1441,freq=2.0), product of:
            0.15317118 = queryWeight, product of:
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.0343058 = queryNorm
            0.19732155 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.25 = coord(1/4)
    0.009295925 = product of:
      0.01859185 = sum of:
        0.01859185 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
          0.01859185 = score(doc=1441,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.15476047 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

This paper presents research on applying syntactic structures known as noun phrases (NPs) to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first phase, a corpus of digitized texts was submitted to a natural language processing platform for part-of-speech tagging, and then Perl scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing NP pre-modifiers of low meaning, such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated the effectiveness of this preprocessing. We compared the structural similarity of the documents before and after the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.

Source: Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
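
When a document matches more than one query term, as doc 1441 does, the explain tree sums the per-clause contributions and scales the sum by a coordination factor. The sketch below recomputes that total under the same assumed ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); `term_score` is an illustrative helper, and every number is copied from the listing for this entry.

```python
import math

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight * fieldWeight for a single term (assumed ClassicSimilarity formulas).
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

MAX_DOCS = 44218
QUERY_NORM = 0.0343058
FIELD_NORM = 0.03125  # fieldNorm(doc=1441)

# Clause for "learning": term score scaled by its inner coord(1/4).
learning = term_score(2.0, 1382, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 4)

# Clause for "22": term score scaled by its inner coord(1/2).
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Two of four top-level clauses matched, hence the outer coord(2/4).
doc_score = (learning + twenty_two) * (2 / 4)
print(learning, twenty_two, doc_score)
```

Up to rounding, the three printed values should track the 0.0075559937, 0.009295925, and 0.008425959 lines of the tree. The outer factor is 0.5 rather than 0.25 here, which is why this entry outranks several single-term matches despite its small fieldNorm.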

Fuhr, N.; Niewelt, B.: Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01

0.008133934 = product of:
  0.032535736 = sum of:
    0.032535736 = product of:
      0.06507147 = sum of:
        0.06507147 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.06507147 = score(doc=262,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 20.10.2000 12:22:23

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.01

0.008133934 = product of:
  0.032535736 = sum of:
    0.032535736 = product of:
      0.06507147 = sum of:
        0.06507147 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.06507147 = score(doc=6265,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Information outlook. 9(2005) no.8, pp. 22-23

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01

0.006971943 = product of:
  0.027887773 = sum of:
    0.027887773 = product of:
      0.055775546 = sum of:
        0.055775546 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.055775546 = score(doc=58,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 14. 6.2015 22:12:44

Hauer, M.: Automatische Indexierung (2000) 0.01

0.006971943 = product of:
  0.027887773 = sum of:
    0.027887773 = product of:
      0.055775546 = sum of:
        0.055775546 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.055775546 = score(doc=5887,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Ed.: R. Schmidt

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.01

0.006971943 = product of:
  0.027887773 = sum of:
    0.027887773 = product of:
      0.055775546 = sum of:
        0.055775546 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.055775546 = score(doc=2051,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 14. 6.2015 22:12:56

Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.01

0.006971943 = product of:
  0.027887773 = sum of:
    0.027887773 = product of:
      0.055775546 = sum of:
        0.055775546 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
          0.055775546 = score(doc=5629,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.46428138 = fieldWeight in 5629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5629)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: B.I.T.online. 22(2019) no.2, pp. 163-166

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.01

0.005809953 = product of:
  0.023239812 = sum of:
    0.023239812 = product of:
      0.046479624 = sum of:
        0.046479624 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.046479624 = score(doc=1952,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 16. 8.1998 12:51:22

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.01

0.005809953 = product of:
  0.023239812 = sum of:
    0.023239812 = product of:
      0.046479624 = sum of:
        0.046479624 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.046479624 = score(doc=4157,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill

Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.01

0.005809953 = product of:
  0.023239812 = sum of:
    0.023239812 = product of:
      0.046479624 = sum of:
        0.046479624 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
          0.046479624 = score(doc=374,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.38690117 = fieldWeight in 374, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=374)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 4.2002 10:22:41

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01

0.005809953 = product of:
  0.023239812 = sum of:
    0.023239812 = product of:
      0.046479624 = sum of:
        0.046479624 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.046479624 = score(doc=2759,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 2.2016 18:25:22

Selisskaya, M.A.: Ispol'zovanie mashinnogo obucheniya pri avtomaticheskoi klassifikatsii tekstov (1999) 0.01

0.0056669954 = product of:
  0.022667982 = sum of:
    0.022667982 = product of:
      0.09067193 = sum of:
        0.09067193 = weight(_text_:learning in 375) [ClassicSimilarity], result of:
          0.09067193 = score(doc=375,freq=2.0), product of:
            0.15317118 = queryWeight, product of:
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.0343058 = queryNorm
            0.59196466 = fieldWeight in 375, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.09375 = fieldNorm(doc=375)
      0.25 = coord(1/4)
  0.25 = coord(1/4)

Footnote: Translation of the title: Machine learning as a tool for development of automated text indexing systems

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.00

0.0046479623 = product of:
  0.01859185 = sum of:
    0.01859185 = product of:
      0.0371837 = sum of:
        0.0371837 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.0371837 = score(doc=4709,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 31. 7.1996 9:22:19

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.00

0.0046479623 = product of:
  0.01859185 = sum of:
    0.01859185 = product of:
      0.0371837 = sum of:
        0.0371837 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.0371837 = score(doc=6752,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 6. 3.1997 16:22:15

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.00

0.0046479623 = product of:
  0.01859185 = sum of:
    0.01859185 = product of:
      0.0371837 = sum of:
        0.0371837 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.0371837 = score(doc=3581,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 24. 3.2006 12:22:02

Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.00

0.0046479623 = product of:
  0.01859185 = sum of:
    0.01859185 = product of:
      0.0371837 = sum of:
        0.0371837 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
          0.0371837 = score(doc=1755,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.30952093 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 3.2008 12:35:19

Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.00

0.0046479623 = product of:
  0.01859185 = sum of:
    0.01859185 = product of:
      0.0371837 = sum of:
        0.0371837 = weight(_text_:22 in 401) [ClassicSimilarity], result of:
          0.0371837 = score(doc=401,freq=2.0), product of:
            0.120133065 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0343058 = queryNorm
            0.30952093 = fieldWeight in 401, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=401)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 11. 9.2012 19:43:22

Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.00
```
0.004089802 = product of:
  0.016359208 = sum of:
    0.016359208 = product of:
      0.06543683 = sum of:
        0.06543683 = weight(_text_:learning in 1024) [ClassicSimilarity], result of:
          0.06543683 = score(doc=1024,freq=6.0), product of:
            0.15317118 = queryWeight, product of:
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.0343058 = queryNorm
            0.42721373 = fieldWeight in 1024, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.464877 = idf(docFreq=1382, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1024)
      0.25 = coord(1/4)
  0.25 = coord(1/4)
```
Abstract

There are several ways to employ machine learning for automating subject indexing. One popular strategy is to use a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject using a standard vocabulary. The resulting model can then predict the subjects of new, previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., Agrovoc). Next, the dataset (obtained from Agris) can be divided into i) a training dataset and ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
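
The split into training and test data described in this abstract can be sketched in a few lines. This is a generic illustration, not the author's pipeline and not Annif's API: the `split_dataset` helper, the 80/20 ratio, the fixed seed, and the placeholder records are all assumptions.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=0):
    # Shuffle a copy deterministically, then cut off the first part as test data.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (document id, manually assigned descriptors).
records = [(f"doc-{i}", ["placeholder descriptor"]) for i in range(10)]
train, test = split_dataset(records)
print(len(train), len(test))
```

Keeping the test documents out of training is what lets the evaluation measure performance on unseen documents, as the abstract describes.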
