Search (29 results, page 1 of 2)

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.08

0.08449805 = product of:
  0.25349414 = sum of:
    0.25349414 = sum of:
      0.15925187 = weight(_text_:indexing in 6265) [ClassicSimilarity], result of:
        0.15925187 = score(doc=6265,freq=4.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.8373461 = fieldWeight in 6265, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
      0.09424227 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
        0.09424227 = score(doc=6265,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.5416616 = fieldWeight in 6265, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
  0.33333334 = coord(1/3)

Source: Information outlook. 9(2005) no.8, S.22-23
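
The score breakdown for this first hit can be reproduced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`. The helper names `idf` and `term_score` are illustrative and are not part of any Lucene API. The `queryNorm` and `fieldNorm` values are copied from the explain output rather than derived.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq=4.0) = 2.0
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # query-side factor
    field_weight = tf * term_idf * field_norm   # document-side factor
    return query_weight * field_weight

# Values taken from the explain tree for doc 6265 (assumed exact as printed).
QUERY_NORM = 0.049684696
MAX_DOCS = 44218
FIELD_NORM = 0.109375

indexing = term_score(4.0, 2614, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/3): one of three top-level query clauses matched.
total = (indexing + term_22) * (1 / 3)

print(f"indexing={indexing:.6f} 22={term_22:.6f} total={total:.6f}")
```

The results agree with the listed values (0.15925187, 0.09424227 and 0.08449805) to within rounding, since Lucene computes these in single precision. The later hits apply an additional `coord(1/2)` factor inside the inner product because only one of two clauses matched there.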

Bloomfield, M.: Indexing : neglected and poorly understood (2001) 0.02
```
0.024130303 = product of:
  0.07239091 = sum of:
    0.07239091 = product of:
      0.14478181 = sum of:
        0.14478181 = weight(_text_:indexing in 5439) [ClassicSimilarity], result of:
          0.14478181 = score(doc=5439,freq=18.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.76126254 = fieldWeight in 5439, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=5439)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.

Anderson, J.D.; Pérez-Carballo, J.: The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part I: Research and the nature of human indexing (2001) 0.02

0.02275027 = product of:
  0.068250805 = sum of:
    0.068250805 = product of:
      0.13650161 = sum of:
        0.13650161 = weight(_text_:indexing in 3136) [ClassicSimilarity], result of:
          0.13650161 = score(doc=3136,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.7177252 = fieldWeight in 3136, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.09375 = fieldNorm(doc=3136)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 0.02

0.018958557 = product of:
  0.05687567 = sum of:
    0.05687567 = product of:
      0.11375134 = sum of:
        0.11375134 = weight(_text_:indexing in 6892) [ClassicSimilarity], result of:
          0.11375134 = score(doc=6892,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.59810436 = fieldWeight in 6892, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.078125 = fieldNorm(doc=6892)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Content: Need for indexing and abstracting texts; attributes of texts; text representations and their use; selection of natural language index terms; assignment of controlled language index texts; automatic abstracting; applications

Anderson, J.D.; Pérez-Carballo, J.: The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part II: Machine indexing, and the allocation of human versus machine effort (2001) 0.02

0.018958557 = product of:
  0.05687567 = sum of:
    0.05687567 = product of:
      0.11375134 = sum of:
        0.11375134 = weight(_text_:indexing in 368) [ClassicSimilarity], result of:
          0.11375134 = score(doc=368,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.59810436 = fieldWeight in 368, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.078125 = fieldNorm(doc=368)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Hlava, M.M.: Automatic indexing : a matter of degree (2002) 0.02

0.018768014 = product of:
  0.05630404 = sum of:
    0.05630404 = product of:
      0.11260808 = sum of:
        0.11260808 = weight(_text_:indexing in 2501) [ClassicSimilarity], result of:
          0.11260808 = score(doc=2501,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5920931 = fieldWeight in 2501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.109375 = fieldNorm(doc=2501)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Yusuff, A.: Automatisches Indexing and Abstracting : Grundlagen und Beispiele (2002) 0.02

0.018768014 = product of:
  0.05630404 = sum of:
    0.05630404 = product of:
      0.11260808 = sum of:
        0.11260808 = weight(_text_:indexing in 1577) [ClassicSimilarity], result of:
          0.11260808 = score(doc=1577,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5920931 = fieldWeight in 1577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.109375 = fieldNorm(doc=1577)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.02
```
0.016253578 = product of:
  0.04876073 = sum of:
    0.04876073 = product of:
      0.09752146 = sum of:
        0.09752146 = weight(_text_:indexing in 2566) [ClassicSimilarity], result of:
          0.09752146 = score(doc=2566,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5127677 = fieldWeight in 2566, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2566)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

We present a bibliometric study of a corpus of 839 bibliographic references about automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, the obsolescence and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been a constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors does fit a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.

Thirion, B.; Leroy, J.P.; Baudic, F.; Douyère, M.; Piot, J.; Darmoni, S.J.: SDI selecting, describing, and indexing : did you mean automatically? (2001) 0.02

0.016086869 = product of:
  0.048260607 = sum of:
    0.048260607 = product of:
      0.09652121 = sum of:
        0.09652121 = weight(_text_:indexing in 6198) [ClassicSimilarity], result of:
          0.09652121 = score(doc=6198,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5075084 = fieldWeight in 6198, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.09375 = fieldNorm(doc=6198)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Roberts, D.; Souter, C.: The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.02
```
0.016086869 = product of:
  0.048260607 = sum of:
    0.048260607 = product of:
      0.09652121 = sum of:
        0.09652121 = weight(_text_:indexing in 711) [ClassicSimilarity], result of:
          0.09652121 = score(doc=711,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5075084 = fieldWeight in 711, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=711)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This article discusses the possibility of the automation of sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually either based upon the manual descriptors in the database or generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project, based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass various components: Extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature. Use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors. Coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
Mansour, N.; Haraty, R.A.; Daher, W.; Houri, M.: An auto-indexing method for Arabic text (2008) 0.02
```
0.016086869 = product of:
  0.048260607 = sum of:
    0.048260607 = product of:
      0.09652121 = sum of:
        0.09652121 = weight(_text_:indexing in 2103) [ClassicSimilarity], result of:
          0.09652121 = score(doc=2103,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5075084 = fieldWeight in 2103, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=2103)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. This method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the container document. The weight is based on how widely the word is spread across a document and not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples. For these examples, we obtained an average recall of 46% and an average precision of 64%.
Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.01
```
0.014988055 = product of:
  0.044964164 = sum of:
    0.044964164 = product of:
      0.08992833 = sum of:
        0.08992833 = weight(_text_:indexing in 896) [ClassicSimilarity], result of:
          0.08992833 = score(doc=896,freq=10.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.47284302 = fieldWeight in 896, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This paper deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, where each scenario simulated a certain user type. The four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. The main conclusion of the paper is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations of indexing strategies with query terms based on normalization was not far below the performance of the truncation combination.

Hauer, M.: Automatische Indexierung (2000) 0.01

0.0134631805 = product of:
  0.04038954 = sum of:
    0.04038954 = product of:
      0.08077908 = sum of:
        0.08077908 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.08077908 = score(doc=5887,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
```
0.0134057235 = product of:
  0.04021717 = sum of:
    0.04021717 = product of:
      0.08043434 = sum of:
        0.08043434 = weight(_text_:indexing in 3300) [ClassicSimilarity], result of:
          0.08043434 = score(doc=3300,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.42292362 = fieldWeight in 3300, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated showing they are complementary to one another.
Tsai, C.-F.; McGarry, K.; Tait, J.: Qualitative evaluation of automatic assignment of keywords to images (2006) 0.01
```
0.011609698 = product of:
  0.03482909 = sum of:
    0.03482909 = product of:
      0.06965818 = sum of:
        0.06965818 = weight(_text_:indexing in 963) [ClassicSimilarity], result of:
          0.06965818 = score(doc=963,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3662626 = fieldWeight in 963, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=963)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

In image retrieval, most systems lack user-centred evaluation since they are assessed by some chosen ground truth dataset. The results reported through precision and recall assessed against the ground truth are thought of as being an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images for enhancing retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy based on some chosen ground truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation is composed of two individual methods. First, the automatic indexing annotation results are assessed by human subjects. Second, the subjects are asked to annotate some chosen images as the test set whose annotations are used as ground truth. Then, the system is tested by the test set whose annotation results are judged against the ground truth. Only one of these methods is reported for most systems on which user-centred evaluation is conducted. We believe that both methods need to be considered for full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, our proposed evaluation methodology is able to provide deeper understanding of the system's performance.
Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.01
```
0.011609698 = product of:
  0.03482909 = sum of:
    0.03482909 = product of:
      0.06965818 = sum of:
        0.06965818 = weight(_text_:indexing in 1842) [ClassicSimilarity], result of:
          0.06965818 = score(doc=1842,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3662626 = fieldWeight in 1842, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.
Souza, R.R.; Raghavan, K.S.: A methodology for noun phrase-based automatic indexing (2006) 0.01
```
0.011375135 = product of:
  0.034125403 = sum of:
    0.034125403 = product of:
      0.068250805 = sum of:
        0.068250805 = weight(_text_:indexing in 173) [ClassicSimilarity], result of:
          0.068250805 = score(doc=173,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3588626 = fieldWeight in 173, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=173)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. An approach being discussed considers the intrinsic semantics of texts of documents. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.

Salton, G.: SMART System: 1961-1976 (2009) 0.01

0.01072458 = product of:
  0.032173738 = sum of:
    0.032173738 = product of:
      0.064347476 = sum of:
        0.064347476 = weight(_text_:indexing in 3879) [ClassicSimilarity], result of:
          0.064347476 = score(doc=3879,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3383389 = fieldWeight in 3879, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=3879)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: While a number of researchers had experimented during the 1950's on automatic indexing and retrieval in various forms, it was Gerard Salton who brought the information retrieval experimental paradigm to full fruition, with his "SMART" system. His work has been enormously influential.

Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.01
```
0.009479279 = product of:
  0.028437834 = sum of:
    0.028437834 = product of:
      0.05687567 = sum of:
        0.05687567 = weight(_text_:indexing in 3301) [ClassicSimilarity], result of:
          0.05687567 = score(doc=3301,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.29905218 = fieldWeight in 3301, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01

0.008975455 = product of:
  0.026926363 = sum of:
    0.026926363 = product of:
      0.053852726 = sum of:
        0.053852726 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.053852726 = score(doc=3581,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 24. 3.2006 12:22:02
