Search (80 results, page 2 of 4)

  • theme_ss:"Computerlinguistik"
  • type_ss:"a"
  1. Vazov, N.: Identification des différentes structures temporelles dans des textes et leurs rôles dans le raisonnement temporel (1999) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 6203) [ClassicSimilarity], result of:
              0.0743306 = score(doc=6203,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 6203, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6203)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
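
Each result in this list carries Lucene's "explain" output for its relevance score under ClassicSimilarity (TF-IDF). As a reading aid, the minimal sketch below recomputes result 1's score from the leaf values in its tree: tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), and the nested coord(1/2) factors are standard ClassicSimilarity, and every constant is copied from the explain output above rather than taken from a live index.

```python
import math

# Leaf values copied from the explain tree for doc 6203 above.
freq, doc_freq, max_docs = 2.0, 1611, 44218
query_norm, field_norm = 0.045236014, 0.0625

tf = math.sqrt(freq)                             # 1.4142135 = tf(freq=2.0)
idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 4.3116565
query_weight = idf * query_norm                  # 0.19504215 = queryWeight
field_weight = tf * idf * field_norm             # 0.38110018 = fieldWeight
weight = query_weight * field_weight             # 0.0743306  = weight(_text_:n ...)

# Two nested coord(1/2) factors: one of two query clauses matched, twice over.
score = weight * 0.5 * 0.5
print(f"{score:.8f}")                            # 0.01858265, displayed as 0.02
```

The other trees on this page differ only in freq, docFreq, fieldNorm, and the matched term (_text_:n or _text_:22).
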
    
  2. Ferret, O.; Grau, B.; Masson, N.: Utilisation d'un réseau de cooccurrences lexicales pour améliorer une analyse thématique fondée sur la distribution des mots (1999) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 6295) [ClassicSimilarity], result of:
              0.0743306 = score(doc=6295,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 6295, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6295)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  3. Kummer, N.: Indexierungstechniken für das japanische Retrieval (2006) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 5979) [ClassicSimilarity], result of:
              0.0743306 = score(doc=5979,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 5979, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5979)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  4. Koppel, M.; Akiva, N.; Dagan, I.: Feature instability as a criterion for selecting potential style markers (2006) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 6092) [ClassicSimilarity], result of:
              0.0743306 = score(doc=6092,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 6092, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6092)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  5. Byrne, C.C.; McCracken, S.A.: An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = product of:
          0.07354627 = sum of:
            0.07354627 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
              0.07354627 = score(doc=4483,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.46428138 = fieldWeight in 4483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    15. 3.2000 10:22:37
  6. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = product of:
          0.07354627 = sum of:
            0.07354627 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.07354627 = score(doc=5429,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.230-231
  7. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.02
    0.016424898 = product of:
      0.032849796 = sum of:
        0.032849796 = product of:
          0.06569959 = sum of:
            0.06569959 = weight(_text_:n in 4277) [ClassicSimilarity], result of:
              0.06569959 = score(doc=4277,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33684817 = fieldWeight in 4277, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4277)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century, when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
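
The query-likelihood reading described in this abstract can be made concrete in a few lines. Below is a minimal sketch assuming a unigram document model with Jelinek-Mercer smoothing, one standard choice in this literature rather than the chapter's specific formulation; the documents, query, and lambda value are toy assumptions.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.1):
    """log P(query | document LM) with Jelinek-Mercer smoothing:
    P(w|d) = (1 - lam) * P_ml(w|d) + lam * P(w|collection)."""
    d, c = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = d[w] / len(doc)
        p_col = c[w] / len(collection)  # assumes every query word occurs somewhere in the collection
        score += math.log((1 - lam) * p_doc + lam * p_col)
    return score

doc1 = "statistical language modeling for information retrieval".split()
doc2 = "an adaptive thesaurus for linguistic support of retrieval".split()
q = "language modeling retrieval".split()
collection = doc1 + doc2

# The document whose language model more plausibly generated the query ranks higher.
print(query_likelihood(q, doc1, collection))   # ~ -5.5
print(query_likelihood(q, doc2, collection))   # ~ -11.9
```
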
  8. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.02
    0.016424898 = product of:
      0.032849796 = sum of:
        0.032849796 = product of:
          0.06569959 = sum of:
            0.06569959 = weight(_text_:n in 4675) [ClassicSimilarity], result of:
              0.06569959 = score(doc=4675,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33684817 = fieldWeight in 4675, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Word segmentation is the task of inserting or deleting word-boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to cope effectively with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
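
As a rough illustration of the beam-search component described in this abstract, here is a minimal sketch. It substitutes a toy unigram word model for the paper's byte/character-level n-gram or recurrent language model, and the vocabulary, probabilities, beam width, and out-of-vocabulary penalty are all invented for the example.

```python
import math

# Toy word model standing in for the paper's character/byte-level LM.
WORD_LOGP = {"word": math.log(0.04), "segmentation": math.log(0.01),
             "is": math.log(0.05), "fun": math.log(0.02)}
OOV = math.log(1e-9)  # harsh penalty for chunks outside the toy vocabulary

def segment(text, beam_width=5, max_word_len=12):
    beams = [(0.0, [], 0)]  # (log probability, words so far, characters consumed)
    while any(pos < len(text) for _, _, pos in beams):
        candidates = []
        for logp, words, pos in beams:
            if pos == len(text):            # finished hypothesis: carry it forward
                candidates.append((logp, words, pos))
                continue
            for end in range(pos + 1, min(pos + max_word_len, len(text)) + 1):
                chunk = text[pos:end]       # try placing the next boundary after `chunk`
                candidates.append((logp + WORD_LOGP.get(chunk, OOV),
                                   words + [chunk], end))
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
    return beams[0][1]

print(segment("wordsegmentationisfun"))  # ['word', 'segmentation', 'is', 'fun']
```
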
  9. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 5797) [ClassicSimilarity], result of:
              0.06503927 = score(doc=5797,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 5797, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5797)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Considers language processing techniques necessary for the implementation of a document retrieval system for Turkish text databases. Introduces the main characteristics of the Turkish language. Discusses the development of a stopword list and the evaluation of a stemming algorithm that takes account of the language's morphological structure. A two-level description of Turkish morphology developed at Bilkent University, Ankara, is incorporated into a morphological parser, PC-KIMMO, to carry out stemming in Turkish databases. Describes the evaluation of string similarity measures - n-gram matching techniques - for Turkish. Reports experiments on six different Turkish text corpora.
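
The n-gram matching techniques evaluated here can be illustrated with a Dice coefficient over character bigrams, one common string-similarity measure for conflating word variants (the paper's exact measures may differ); the Turkish forms below are illustrative.

```python
def ngrams(s, n=2):
    """Set of character n-grams, e.g. 'kitap' -> {'ki', 'it', 'ta', 'ap'}."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def dice(a, b, n=2):
    """Dice coefficient over n-gram sets: 2|A & B| / (|A| + |B|)."""
    A, B = ngrams(a, n), ngrams(b, n)
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 0.0

# Agglutinative suffixes leave the stem's bigrams intact, so inflected
# forms of 'kitap' (book) score high against the stem.
print(dice("kitap", "kitaplar"))    # plural: ~0.73
print(dice("kitap", "kitaplarda"))  # locative plural: ~0.62
```
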
  10. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 268) [ClassicSimilarity], result of:
              0.06503927 = score(doc=268,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=268)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  11. Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 1001) [ClassicSimilarity], result of:
              0.06503927 = score(doc=1001,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 1001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  12. WordHoard (n.d.) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 3922) [ClassicSimilarity], result of:
              0.06503927 = score(doc=3922,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 3922, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3922)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two-word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the LocalMaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two- and three-word phrases using the word class filters suggested by Justeson and Katz.
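
A minimal sketch of the pseudo-bigram idea, using the "fair" symmetric conditional probability (SCP) measure associated with Silva et al.'s LocalMaxs work: an n-gram is scored as if it were a bigram by averaging p(prefix) * p(suffix) over every binary split point. The phrase counts and normalizer below are toy assumptions, not WordHoard's data.

```python
from collections import Counter

def scp_f(phrase, counts, total):
    """'Fair' SCP: p(phrase)^2 divided by the mean of p(prefix) * p(suffix)
    over all binary splits, generalizing a bigram measure to n-grams."""
    p = lambda gram: counts[gram] / total
    n = len(phrase)
    avg = sum(p(phrase[:i]) * p(phrase[i:]) for i in range(1, n)) / (n - 1)
    return p(phrase) ** 2 / avg if avg else 0.0

# Toy frequencies in which the full phrase occurs almost as often as its
# sub-phrases, so it "stands out" as a candidate multiword unit.
counts = Counter({
    ("knight",): 10, ("table",): 15,
    ("knight", "of"): 9, ("round", "table"): 11,
    ("knight", "of", "the"): 9, ("the", "round", "table"): 10,
    ("knight", "of", "the", "round"): 9, ("of", "the", "round", "table"): 9,
    ("knight", "of", "the", "round", "table"): 9,
})
print(scp_f(("knight", "of", "the", "round", "table"), counts, total=10000))
# ~0.78; values near 1 mark cohesive phrases for the LocalMaxs filter.
```
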
  13. WordHoard: finding multiword units (20??) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 1123) [ClassicSimilarity], result of:
              0.06503927 = score(doc=1123,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 1123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1123)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two-word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the LocalMaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two- and three-word phrases using the word class filters suggested by Justeson and Katz.
  14. Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 667) [ClassicSimilarity], result of:
              0.06503927 = score(doc=667,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 667, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=667)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Evaluating information retrieval and access tasks. Eds.: Sakai, T., Oard, D., Kando, N. [https://doi.org/10.1007/978-981-15-5554-1_12]
  15. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 773) [ClassicSimilarity], result of:
              0.06503927 = score(doc=773,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 773, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=773)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  16. Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
              0.061288558 = score(doc=1463,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 1463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1463)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  17. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.061288558 = score(doc=5428,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.220-229
  18. Lezius, W.; Rapp, R.; Wettler, M.: A morphology-system and part-of-speech tagger for German (1996) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
              0.061288558 = score(doc=1693,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 1693, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1693)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2015 9:37:18
  19. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 8524) [ClassicSimilarity], result of:
              0.05574795 = score(doc=8524,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 8524, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=8524)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  20. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 2502) [ClassicSimilarity], result of:
              0.05574795 = score(doc=2502,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 2502, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    

Languages

  • e (English) 64
  • d (German) 13
  • f (French) 2