Search (78 results, page 2 of 4)

  • × language_ss:"e"
  • × theme_ss:"Computerlinguistik"
  1. Pepper, S.; Arnaud, P.J.L.: Absolutely PHAB : toward a general model of associative relations (2020) 0.02
    0.020116309 = product of:
      0.040232617 = sum of:
        0.040232617 = product of:
          0.080465235 = sum of:
            0.080465235 = weight(_text_:n in 103) [ClassicSimilarity], result of:
              0.080465235 = score(doc=103,freq=6.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.41255307 = fieldWeight in 103, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=103)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    There have been many attempts at classifying the semantic modification relations (R) of N + N compounds, but this work has not led to the acceptance of a definitive scheme, so that devising a reusable classification is a worthwhile aim. The scope of this undertaking is extended to other binominal lexemes, i.e. units that contain two thing-morphemes without explicitly stating R, like prepositional units, N + relational adjective units, etc. The 25-relation taxonomy of Bourque (2014) was tested against over 15,000 binominal lexemes from 106 languages and extended to a 29-relation scheme ("Bourque2") through the introduction of two new reversible relations. Bourque2 is then mapped onto Hatcher's (1960) four-relation scheme (extended by the addition of a fifth relation, similarity, as "Hatcher2"). This results in a two-tier system usable at different degrees of granularity. On account of its semantic proximity to compounding, metonymy is then taken into account, following Janda's (2011) suggestion that it plays a role in word formation; Peirsman and Geeraerts' (2006) inventory of 23 metonymic patterns is mapped onto Bourque2, confirming the identity of metonymic and binominal modification relations. Finally, Blank's (2003) and Koch's (2001) work on lexical semantics justifies the addition to the scheme of a third, superordinate level which comprises the three Aristotelian principles of similarity, contiguity and contrast.
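    The two-tier-plus-superordinate organisation described above amounts to a cascade of lookups from fine-grained relation to coarse relation to Aristotelian principle. The sketch below only illustrates that shape; the relation names are invented placeholders, since the actual Bourque2 and Hatcher2 inventories are not reproduced in this abstract.

    ```python
    # Illustrative two-tier lookup; relation names are hypothetical placeholders,
    # not the actual Bourque2/Hatcher2 inventories.
    BOURQUE2_TO_HATCHER2 = {
        "PART": "constitute",
        "LOCATION": "contain",
        "RESEMBLANCE": "similarity",  # the fifth relation added in "Hatcher2"
    }
    HATCHER2_TO_PRINCIPLE = {
        "constitute": "contiguity",
        "contain": "contiguity",
        "similarity": "similarity",
    }

    def classify(fine_relation: str) -> tuple[str, str]:
        """Map a fine-grained relation to (coarse relation, superordinate principle)."""
        coarse = BOURQUE2_TO_HATCHER2[fine_relation]
        return coarse, HATCHER2_TO_PRINCIPLE[coarse]

    print(classify("RESEMBLANCE"))  # ('similarity', 'similarity')
    ```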
  2. Frakes, W.B.: Stemming algorithms (1992) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 3503) [ClassicSimilarity], result of:
              0.0743306 = score(doc=3503,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 3503, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3503)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes stemming algorithms - programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described - table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
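    As a toy illustration of the affix-removal family mentioned above (the Porter stemmer itself is considerably more elaborate), a longest-match suffix stripper might look like the following sketch; the suffix list and minimum stem length are illustrative assumptions.

    ```python
    # Minimal longest-match suffix stripper; NOT the Porter algorithm, just the
    # affix-removal idea with an illustrative suffix list.
    SUFFIXES = ("ization", "fulness", "ational", "ness", "ing", "ed", "es", "s")

    def strip_suffix(term: str, min_stem: int = 3) -> str:
        """Remove the longest matching suffix, keeping at least min_stem characters."""
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if term.endswith(suffix) and len(term) - len(suffix) >= min_stem:
                return term[: -len(suffix)]
        return term

    for term in ("connections", "connected", "connecting"):
        print(term, "->", strip_suffix(term))
    # connections -> connection, connected -> connect, connecting -> connect
    ```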
  3. Koppel, M.; Akiva, N.; Dagan, I.: Feature instability as a criterion for selecting potential style markers (2006) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 6092) [ClassicSimilarity], result of:
              0.0743306 = score(doc=6092,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 6092, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6092)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  4. Byrne, C.C.; McCracken, S.A.: An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = product of:
          0.07354627 = sum of:
            0.07354627 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
              0.07354627 = score(doc=4483,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.46428138 = fieldWeight in 4483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    15. 3.2000 10:22:37
  5. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = product of:
          0.07354627 = sum of:
            0.07354627 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.07354627 = score(doc=4888,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 3.2013 14:56:22
  6. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.02
    0.016424898 = product of:
      0.032849796 = sum of:
        0.032849796 = product of:
          0.06569959 = sum of:
            0.06569959 = weight(_text_:n in 4277) [ClassicSimilarity], result of:
              0.06569959 = score(doc=4277,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33684817 = fieldWeight in 4277, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4277)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
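    A rough sketch of the query-likelihood idea follows: rank a document by the probability that its unigram language model generates the query. The smoothing scheme (Jelinek-Mercer against a collection model) and the lambda value are assumptions for illustration, not details taken from the chapter.

    ```python
    import math
    from collections import Counter

    # Query-likelihood scoring with a smoothed unigram document model (sketch).
    def query_likelihood(query, doc_tokens, collection_tokens, lam=0.7):
        doc_tf, col_tf = Counter(doc_tokens), Counter(collection_tokens)
        doc_len, col_len = len(doc_tokens), len(collection_tokens)
        score = 0.0
        for term in query:
            p_doc = doc_tf[term] / doc_len if doc_len else 0.0
            p_col = col_tf[term] / col_len if col_len else 0.0
            p = lam * p_doc + (1 - lam) * p_col   # Jelinek-Mercer smoothing
            if p == 0.0:                          # term unseen even in the collection
                return float("-inf")
            score += math.log(p)                  # log P(query | document model)
        return score

    docs = [["statistical", "language", "modeling"], ["speech", "recognition", "systems"]]
    collection = [t for d in docs for t in d]
    for d in docs:
        print(d, query_likelihood(["language", "modeling"], d, collection))
    ```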
  7. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.02
    0.016424898 = product of:
      0.032849796 = sum of:
        0.032849796 = product of:
          0.06569959 = sum of:
            0.06569959 = weight(_text_:n in 4675) [ClassicSimilarity], result of:
              0.06569959 = score(doc=4675,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33684817 = fieldWeight in 4675, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
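    The interaction of beam search and a language model can be sketched compactly. The article scores boundary decisions with a character- or byte-level n-gram or recurrent model; in the toy version below a small word-frequency lexicon stands in for that model, so only the search mechanics carry over, and the lexicon, beam width and unknown-word floor are all assumptions.

    ```python
    import math

    # Toy beam-search segmenter; a word-frequency lexicon stands in for the
    # character-level language model used in the article.
    LEXICON = {"word": 100, "segmentation": 20, "is": 500, "hard": 50}
    TOTAL = sum(LEXICON.values())

    def word_logprob(w):
        return math.log(LEXICON.get(w, 0.01) / TOTAL)   # small floor for unknown words

    def segment(text, beam_width=3, max_word_len=15):
        beams = [(0.0, 0, [])]                # (log-prob, chars consumed, words so far)
        for _ in range(len(text)):
            candidates = []
            for score, pos, words in beams:
                if pos == len(text):          # already a complete segmentation
                    candidates.append((score, pos, words))
                    continue
                for end in range(pos + 1, min(len(text), pos + max_word_len) + 1):
                    w = text[pos:end]
                    candidates.append((score + word_logprob(w), end, words + [w]))
            beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
            if all(pos == len(text) for _, pos, _ in beams):
                break
        return max((b for b in beams if b[1] == len(text)), key=lambda b: b[0])[2]

    print(segment("wordsegmentationishard"))  # ['word', 'segmentation', 'is', 'hard']
    ```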
  8. SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 6671) [ClassicSimilarity], result of:
              0.06503927 = score(doc=6671,freq=8.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 6671, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=6671)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL u. N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF u. D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. u. B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. u. R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA u. M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG u. Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL u. R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. u. P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY u. D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH u. H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. u. J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. u. M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. u. P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN u. L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. u. J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO u. P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER, u. J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. u. P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. u. B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
    Editor
    Belkin, N.; Ingwersen, P.; Pejtersen, A.M.
  9. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 5797) [ClassicSimilarity], result of:
              0.06503927 = score(doc=5797,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 5797, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5797)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Considers language processing techniques necessary for the implementation of a document retrieval system for Turkish text databases. Introduces the main characteristics of the Turkish language. Discusses the development of a stopword list and the evaluation of a stemming algorithm that takes account of the language's morphological structure. A two-level description of Turkish morphology developed at Bilkent University, Ankara, is incorporated into a morphological parser, PC-KIMMO, to carry out stemming in Turkish databases. Describes the evaluation of string similarity measures - n-gram matching techniques - for Turkish. Reports experiments on 6 different Turkish text corpora.
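    A minimal sketch of the n-gram matching idea used for conflation: two strings count as variants of each other when they share many character n-grams. The bigram size, the Dice coefficient and the Turkish example words are assumptions for illustration.

    ```python
    # Character n-gram similarity (Dice coefficient over bigrams), a sketch of the
    # string-matching approach to conflation.
    def ngrams(s, n=2):
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    def dice(a, b, n=2):
        ga, gb = ngrams(a, n), ngrams(b, n)
        if not ga or not gb:
            return 0.0
        return 2 * len(ga & gb) / (len(ga) + len(gb))

    print(round(dice("kitaplar", "kitap"), 2))  # inflected form vs. stem: high overlap
    print(round(dice("kitaplar", "kalem"), 2))  # unrelated word: no overlap
    ```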
  10. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 268) [ClassicSimilarity], result of:
              0.06503927 = score(doc=268,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=268)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  11. Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 1001) [ClassicSimilarity], result of:
              0.06503927 = score(doc=1001,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 1001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  12. Wordhoard (o.J.) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 3922) [ClassicSimilarity], result of:
              0.06503927 = score(doc=3922,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 3922, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3922)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.
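    The pseudo-bigram idea can be sketched as follows: split an n-gram at each internal position into a "left part" and a "right part", average over the split points, and feed the result into an ordinary bigram association measure. The measure below (pointwise mutual information) and the toy corpus are assumptions; WordHoard's actual measure set and normalizations are not reproduced here.

    ```python
    import math
    from collections import Counter

    # Pseudo-bigram association: treat an n-gram as a two-part "bigram" by averaging
    # over all internal split points, here combined with PMI as the glue measure.
    tokens = "knight of the round table sat at the round table".split()
    unigram = Counter(tokens)
    N = len(tokens)

    def phrase_count(phrase, toks):
        n = len(phrase)
        return sum(tuple(toks[i:i + n]) == tuple(phrase) for i in range(len(toks) - n + 1))

    def p(seq):
        return phrase_count(seq, tokens) / N if len(seq) > 1 else unigram[seq[0]] / N

    def pseudo_bigram_pmi(phrase):
        joint = p(phrase)
        if joint == 0:
            return float("-inf")
        # average P(left part) * P(right part) over all internal split points
        splits = [p(phrase[:k]) * p(phrase[k:]) for k in range(1, len(phrase))]
        return math.log2(joint / (sum(splits) / len(splits)))

    print(pseudo_bigram_pmi(("round", "table")))         # plain bigram case
    print(pseudo_bigram_pmi(("the", "round", "table")))  # same measure on a trigram
    ```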
  13. WordHoard: finding multiword units (20??) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 1123) [ClassicSimilarity], result of:
              0.06503927 = score(doc=1123,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 1123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1123)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.
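    The word-class filtering step mentioned at the end of the abstract can be sketched as a whitelist of part-of-speech patterns for two- and three-word phrases (A = adjective, N = noun, P = preposition), in the spirit of Justeson and Katz; the tag set and the toy tagged phrases below are assumptions, and tagging itself is outside the sketch.

    ```python
    # Word-class filter for candidate phrases: keep only POS patterns on the whitelist.
    JUSTESON_KATZ_PATTERNS = {
        ("A", "N"), ("N", "N"),
        ("A", "A", "N"), ("A", "N", "N"), ("N", "A", "N"),
        ("N", "N", "N"), ("N", "P", "N"),
    }

    def passes_word_class_filter(tagged_phrase):
        """tagged_phrase: sequence of (word, tag) pairs with tags like 'A', 'N', 'P'."""
        tags = tuple(tag for _, tag in tagged_phrase)
        return tags in JUSTESON_KATZ_PATTERNS

    print(passes_word_class_filter([("round", "A"), ("table", "N")]))                # True
    print(passes_word_class_filter([("knight", "N"), ("of", "P"), ("table", "N")]))  # True
    print(passes_word_class_filter([("sat", "V"), ("at", "P")]))                     # False
    ```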
  14. Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 667) [ClassicSimilarity], result of:
              0.06503927 = score(doc=667,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 667, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=667)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Evaluating information retrieval and access tasks. Eds.: Sakai, T., Oard, D., Kando, N. [https://doi.org/10.1007/978-981-15-5554-1_12]
  15. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 773) [ClassicSimilarity], result of:
              0.06503927 = score(doc=773,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 773, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=773)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  16. Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
              0.061288558 = score(doc=1463,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 1463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1463)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  17. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 8524) [ClassicSimilarity], result of:
              0.05574795 = score(doc=8524,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 8524, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=8524)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  18. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 2502) [ClassicSimilarity], result of:
              0.05574795 = score(doc=2502,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 2502, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  19. Argamon, S.; Whitelaw, C.; Chase, P.; Hota, S.R.; Garg, N.; Levitan, S.: Stylistic text classification using functional lexical features (2007) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 280) [ClassicSimilarity], result of:
              0.05574795 = score(doc=280,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=280)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  20. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 4224) [ClassicSimilarity], result of:
              0.05574795 = score(doc=4224,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 4224, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two optional approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed quite equally in all the other language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
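    A simplified sketch of s-gram matching: like character bigram matching, but pairs may also be formed across a gap of skipped characters, which makes the measure more tolerant of inflectional endings. The skip lengths, the Jaccard comparison and the example words below are assumptions, not the exact formulation evaluated in the study.

    ```python
    # Simplified s-gram similarity: character pairs with configurable skip lengths,
    # compared with the Jaccard coefficient.
    def s_grams(word, skips=(0, 1)):
        grams = set()
        for skip in skips:
            step = skip + 1
            grams |= {(word[i], word[i + step], skip) for i in range(len(word) - step)}
        return grams

    def s_gram_similarity(a, b, skips=(0, 1)):
        ga, gb = s_grams(a, skips), s_grams(b, skips)
        return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

    print(round(s_gram_similarity("talossa", "talo"), 2))   # Finnish inflected form vs. base
    print(round(s_gram_similarity("talossa", "kissa"), 2))  # unrelated word, lower score
    ```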

Years

Types

  • a 64
  • el 8
  • m 5
  • s 5
  • n 2
  • p 2
  • x 2