Search (21 results, page 1 of 2)

Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.04
```
0.040342852 = product of:
  0.080685705 = sum of:
    0.080685705 = product of:
      0.16137141 = sum of:
        0.16137141 = weight(_text_:light in 3301) [ClassicSimilarity], result of:
          0.16137141 = score(doc=3301,freq=6.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.55259997 = fieldWeight in 3301, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
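
The score explanation for doc 3301 above can be recomputed by hand. The sketch below assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred because they reproduce the tf and idf values printed in the listing, and they are not stated on the page itself. The function names are hypothetical, and the two 0.5 factors are simply the coord(1/2) values taken from the tree.

```python
import math

def classic_tf(freq: float) -> float:
    # Assumed tf formula: square root of the term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf formula; it matches idf(docFreq=372, maxDocs=44218) in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 3301 (term "light").
raw = term_score(freq=6.0, doc_freq=372, max_docs=44218,
                 query_norm=0.050563898, field_norm=0.0390625)

# Two coord(1/2) factors are applied on the way up the tree.
final = raw * 0.5 * 0.5

print(f"term score:  {raw:.6f}")
print(f"final score: {final:.6f}")
```

Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.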

Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03
```
0.03260874 = product of:
  0.06521748 = sum of:
    0.06521748 = product of:
      0.13043496 = sum of:
        0.13043496 = weight(_text_:light in 1139) [ClassicSimilarity], result of:
          0.13043496 = score(doc=1139,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.44666123 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03

0.027402842 = product of:
  0.054805685 = sum of:
    0.054805685 = product of:
      0.10961137 = sum of:
        0.10961137 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.10961137 = score(doc=402,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information processing and management. 22(1986) no.6, pp.465-476

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.095909946 = score(doc=6265,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information outlook. 9(2005) no.8, pp.22-23
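
The two hits on the term 22 above share the same tf, idf, and queryNorm, so their scores differ only through fieldNorm. The following sketch illustrates this using the constants printed in the two explanations; the function name and the dictionary layout are hypothetical, and the two 0.5 factors are the coord(1/2) values from the listing.

```python
import math

# Constants copied from the explanations for the term "22".
QUERY_NORM = 0.050563898
IDF_22 = 3.5018296
TF = math.sqrt(2.0)  # tf(freq=2.0) = 1.4142135 in the listing

def final_score(field_norm: float) -> float:
    # queryWeight and the tf * idf part of fieldWeight are identical for
    # both documents; only field_norm varies.
    query_weight = IDF_22 * QUERY_NORM
    field_weight = TF * IDF_22 * field_norm
    return query_weight * field_weight * 0.5 * 0.5

# fieldNorm values for doc 402 and doc 6265, taken from the listing.
field_norms = {402: 0.125, 6265: 0.109375}
scores = {doc: final_score(norm) for doc, norm in field_norms.items()}

for doc, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(doc, f"{score:.6f}")
```

Under these assumptions the ratio of the two scores equals the ratio of the two fieldNorm values, which is why doc 402 ranks ahead of doc 6265.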

Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.02
```
0.023291955 = product of:
  0.04658391 = sum of:
    0.04658391 = product of:
      0.09316782 = sum of:
        0.09316782 = weight(_text_:light in 3151) [ClassicSimilarity], result of:
          0.09316782 = score(doc=3151,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.31904373 = fieldWeight in 3151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.02
```
0.018633565 = product of:
  0.03726713 = sum of:
    0.03726713 = product of:
      0.07453426 = sum of:
        0.07453426 = weight(_text_:light in 2596) [ClassicSimilarity], result of:
          0.07453426 = score(doc=2596,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.255235 = fieldWeight in 2596, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access

Session One: Updates and a twelve month perspective
Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends
Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers

Session Two: Today's search engines and beyond
Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques
Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement
Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring
Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines
Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER)
Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology

Session Three: Panel discussion: Human v automated categorization and editing
Ev Brenner (New York, NY), Chairman
James Callan (University of Massachusetts, MA)
Marc Krellenstein (Northern Light Technology, Cambridge, MA)
Dan Miller (Ask Jeeves, Berkeley, CA)

Session Four: Updates and a twelve month perspective
Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software
Ellen Voorhees (NIST, Gaithersburg, MD) TREC update

Session Five: Search engines now and beyond
Intelligent Agents: John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents
Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems
Cross language searching: Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval
Video search and retrieval: Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio
Speech recognition: Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents
Visualization: James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion?
Text mining: David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.068507105 = score(doc=1952,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 16. 8.1998 12:51:22

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.068507105 = score(doc=4157,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.068507105 = score(doc=2759,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 2.2016 18:25:22

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.054805685 = score(doc=4709,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01

0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.054805685 = score(doc=6752,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 6. 3.1997 16:22:15

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.047954973 = score(doc=5001,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 3.1996 13:22:21

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01

0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.047954973 = score(doc=530,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: International forum on information and documentation. 22(1997) no.1, pp.17-28

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.01

0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.047954973 = score(doc=2673,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 8.1996 22:08:06

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

0.011988743 = product of:
  0.023977486 = sum of:
    0.023977486 = product of:
      0.047954973 = sum of:
        0.047954973 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.047954973 = score(doc=5291,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 7.2006 17:32:00

Ward, M.L.: The future of the human indexer (1996) 0.01

0.010276065 = product of:
  0.02055213 = sum of:
    0.02055213 = product of:
      0.04110426 = sum of:
        0.04110426 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
          0.04110426 = score(doc=7244,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.23214069 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 9. 2.1997 18:44:22

Plaunt, C.; Norgard, B.A.: An association-based method for automatic indexing with a controlled vocabulary (1998) 0.01

0.008563388 = product of:
  0.017126776 = sum of:
    0.017126776 = product of:
      0.034253553 = sum of:
        0.034253553 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
          0.034253553 = score(doc=1794,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.19345059 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 11. 9.2000 19:53:22

Milstead, J.L.: Thesauri in a full-text world (1998) 0.01

0.008563388 = product of:
  0.017126776 = sum of:
    0.017126776 = product of:
      0.034253553 = sum of:
        0.034253553 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
          0.034253553 = score(doc=2337,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.19345059 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 9.1997 19:16:05

Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01

0.0068507106 = product of:
  0.013701421 = sum of:
    0.013701421 = product of:
      0.027402842 = sum of:
        0.027402842 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
          0.027402842 = score(doc=1441,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.15476047 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01

0.0068507106 = product of:
  0.013701421 = sum of:
    0.013701421 = product of:
      0.027402842 = sum of:
        0.027402842 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
          0.027402842 = score(doc=1442,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.15476047 = fieldWeight in 1442, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
