Search (33 results, page 1 of 2)

Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.04
```
0.040342852 = product of:
  0.080685705 = sum of:
    0.080685705 = product of:
      0.16137141 = sum of:
        0.16137141 = weight(_text_:light in 3301) [ClassicSimilarity], result of:
          0.16137141 = score(doc=3301,freq=6.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.55259997 = fieldWeight in 3301, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
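The score explanation for this first result can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that match the listed numbers: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function name `classic_term_score` and its parameters are illustrative and not part of any library; `queryNorm`, `fieldNorm`, and the two `coord(1/2)` factors are copied from the explain output rather than derived.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, field_norm, query_norm,
                       coords=(0.5, 0.5)):
    """Recompute one single-term score from the values shown in an explain block.

    Assumed formulas (they reproduce the listed tf and idf values):
      tf  = sqrt(freq)
      idf = 1 + ln(max_docs / (doc_freq + 1))
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm          # queryWeight in the explain output
    field_weight = tf * idf * field_norm     # fieldWeight in the explain output
    score = query_weight * field_weight
    for coord in coords:                     # the nested coord(1/2) factors
        score *= coord
    return score


# Values copied from the explanation for doc=3301 (term "light").
doc_3301 = classic_term_score(freq=6.0, doc_freq=372, max_docs=44218,
                              field_norm=0.0390625, query_norm=0.050563898)
print(round(doc_3301, 6))
```

The result agrees with the listed total of 0.040342852 up to single-precision rounding, since the explain output is computed with 32-bit floats.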
Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03
```
0.03260874 = product of:
  0.06521748 = sum of:
    0.06521748 = product of:
      0.13043496 = sum of:
        0.13043496 = weight(_text_:light in 1139) [ClassicSimilarity], result of:
          0.13043496 = score(doc=1139,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.44666123 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In light of AI (artificial intelligence) and NLP (natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection by suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03

```
0.027402842 = product of:
  0.054805685 = sum of:
    0.054805685 = product of:
      0.10961137 = sum of:
        0.10961137 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.10961137 = score(doc=402,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Information processing and management. 22(1986) no.6, pp. 465-476

Fuhr, N.; Niewelt, B.: Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02

```
0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.095909946 = score(doc=262,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 20.10.2000 12:22:23

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

```
0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.095909946 = score(doc=6265,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Information outlook. 9(2005) no.8, pp. 22-23

Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.02
```
0.023291955 = product of:
  0.04658391 = sum of:
    0.04658391 = product of:
      0.09316782 = sum of:
        0.09316782 = weight(_text_:light in 3151) [ClassicSimilarity], result of:
          0.09316782 = score(doc=3151,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.31904373 = fieldWeight in 3151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02

```
0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.08220852 = score(doc=58,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 14. 6.2015 22:12:44

Hauer, M.: Automatische Indexierung (2000) 0.02

```
0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.08220852 = score(doc=5887,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Ed.: R. Schmidt

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02

```
0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.08220852 = score(doc=2051,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 14. 6.2015 22:12:56

Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02

```
0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
          0.08220852 = score(doc=5629,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 5629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5629)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: B.I.T.online. 22(2019) no.2, pp. 163-166

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.02

```
0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.068507105 = score(doc=1952,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 16. 8.1998 12:51:22

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02

```
0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.068507105 = score(doc=4157,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill

Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.02

```
0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
          0.068507105 = score(doc=374,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 374, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=374)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 1. 4.2002 10:22:41

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02

```
0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.068507105 = score(doc=2759,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 1. 2.2016 18:25:22

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

```
0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.054805685 = score(doc=4709,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 31. 7.1996 9:22:19

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01

```
0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.054805685 = score(doc=6752,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 6. 3.1997 16:22:15

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01

```
0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.054805685 = score(doc=3581,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 24. 3.2006 12:22:02

Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.01

```
0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
          0.054805685 = score(doc=1755,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2008 12:35:19

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

```
0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.047954973 = score(doc=5001,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 14. 3.1996 13:22:21

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01

```
0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.047954973 = score(doc=530,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: International forum on information and documentation. 22(1997) no.1, pp. 17-28
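
Every hit on the term `22` in this list shares the same termFreq, idf, queryNorm, and coord factors, so the ordering among those hits follows fieldNorm alone. The sketch below illustrates this with values copied from the explain output for a handful of the listed documents; the names `field_norms` and `score_22` are illustrative only.

```python
import math

# Constants shared by every "22" hit, copied from the explain output.
IDF = 3.5018296
QUERY_NORM = 0.050563898
TF = math.sqrt(2.0)        # termFreq=2.0 for each of these hits
COORD = 0.5 * 0.5          # the two nested coord(1/2) factors

# doc id -> fieldNorm, as listed for a sample of the results.
field_norms = {
    5001: 0.0546875,
    402: 0.125,
    4709: 0.0625,
    58: 0.09375,
    262: 0.109375,
    1952: 0.078125,
}


def score_22(field_norm):
    """Score of a "22" hit: queryWeight * fieldWeight * coord factors."""
    query_weight = IDF * QUERY_NORM
    field_weight = TF * IDF * field_norm
    return query_weight * field_weight * COORD


# Sorting by score is the same as sorting by fieldNorm, largest first.
ranked = sorted(field_norms, key=lambda doc: score_22(field_norms[doc]),
                reverse=True)
print(ranked)
```

Because fieldNorm shrinks as the indexed field grows longer, shorter records rank higher here when the term frequency is identical.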
