Search (22 results, page 1 of 2)

  • × year_i:[2000 TO 2010}
  • × theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.10851419 = product of:
      0.27128547 = sum of:
        0.23174728 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23174728 = score(doc=562,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.039538182 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.039538182 = score(doc=562,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
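
The explain tree above is standard Lucene ClassicSimilarity output: each matching term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the entry score is the coordination factor times the sum of the contributions. The minimal Python sketch below reproduces the numbers of entry 1 from the values shown in the tree; the function name and structure are illustrative, not Lucene API calls.

import math

def term_contribution(freq, idf, query_norm, field_norm):
    """One term's weight under Lucene ClassicSimilarity (TF-IDF).
    Illustrative sketch; values are copied from the explain output above."""
    tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
    query_weight = idf * query_norm       # idf * queryNorm
    field_weight = tf * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values from the explain output of entry 1 (doc 562).
w_3a = term_contribution(freq=2.0, idf=8.478011, query_norm=0.04863741, field_norm=0.046875)
w_22 = term_contribution(freq=2.0, idf=3.5018296, query_norm=0.04863741, field_norm=0.046875)

coord = 2 / 5                             # 2 of 5 query terms matched
score = coord * (w_3a + w_22)
print(round(w_3a, 8), round(w_22, 8), round(score, 8))
# -> approximately 0.23174728 0.03953818 0.10851419
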
  2. Figuerola, C.G.; Gomez, R.; Lopez de San Roman, E.: Stemming and n-grams in Spanish : an evaluation of their impact in information retrieval (2000) 0.08
    0.08376864 = product of:
      0.4188432 = sum of:
        0.4188432 = weight(_text_:grams in 6501) [ClassicSimilarity], result of:
          0.4188432 = score(doc=6501,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            1.0685225 = fieldWeight in 6501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.09375 = fieldNorm(doc=6501)
      0.2 = coord(1/5)
    
  3. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5206) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5206,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5206, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5206)
      0.2 = coord(1/5)
    
    Abstract
    Khoo, Dai, and Loh examine new statistical methods for the identification of two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. For manual segmentation they utilize the Modern Chinese Word Segmentation for Application of Information Processing standard, with some modifications to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the log of the odds that such bi/tri-grams were meaningful words was calculated. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, the relative frequency of adjacent characters and the document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual-information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
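
The study above feeds frequency features of overlapping character n-grams into a logistic-regression model. The sketch below only illustrates the kind of features involved (overlapping bi-grams with their relative and document frequencies); the toy sentences and helper names are illustrative and not taken from the paper, which works with 400 manually segmented news sentences.

from collections import Counter

def char_ngrams(sentence, n):
    """Overlapping character n-grams of a (whitespace-free) sentence."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

# Toy corpus standing in for the manually segmented news sentences.
sentences = ["统计和上下文信息", "上下文信息识别中文词"]

bigram_counts = Counter()
doc_freq = Counter()
for s in sentences:
    bigrams = char_ngrams(s, 2)
    bigram_counts.update(bigrams)
    doc_freq.update(set(bigrams))   # document frequency: at most once per sentence

total = sum(bigram_counts.values())
for bg, c in bigram_counts.most_common(3):
    # Relative frequency and document frequency are two of the candidate
    # predictors described in the abstract above.
    print(bg, c / total, doc_freq[bg])
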
  4. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.06
    0.05923338 = product of:
      0.2961669 = sum of:
        0.2961669 = weight(_text_:grams in 2941) [ClassicSimilarity], result of:
          0.2961669 = score(doc=2941,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.7555595 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
      0.2 = coord(1/5)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective in achieving high similarity scores for all word-form variations and reduces ambiguity, i.e., obtains higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we additionally present an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
    Object
    n-grams
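
The conflation idea in the entry above can be illustrated with a generic character n-gram similarity. The sketch below groups word-form variants by a Dice coefficient over character bigrams; it is a minimal illustration of n-gram conflation, assuming an English toy vocabulary, and omits the positional restriction and the measures actually used by the authors and by araSearch.

def bigrams(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

def dice(a, b):
    """Dice coefficient over character bigram sets: a generic n-gram similarity."""
    ga, gb = set(bigrams(a)), set(bigrams(b))
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def conflate(words, threshold=0.4):
    """Group word-form variants whose n-gram similarity exceeds a threshold.
    Illustrative greedy grouping, not the authors' method."""
    groups = []
    for w in words:
        for g in groups:
            if dice(w, g[0]) >= threshold:
                g.append(w)
                break
        else:
            groups.append([w])
    return groups

# English word-form variants as a stand-in for Arabic inflected forms.
print(conflate(["retrieve", "retrieval", "retrieving", "conflation", "conflate"]))
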
  5. WordHoard: finding multiword units (20??) 0.05
    0.048865046 = product of:
      0.24432522 = sum of:
        0.24432522 = weight(_text_:grams in 1123) [ClassicSimilarity], result of:
          0.24432522 = score(doc=1123,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.6233048 = fieldWeight in 1123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1123)
      0.2 = coord(1/5)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two-word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the LocalMaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two- and three-word phrases using the word class filters suggested by Justeson and Katz.
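
The pseudo-bigram idea described above can be illustrated by splitting an n-gram into a left and a right part and applying an ordinary two-word association measure to the pair, so phrases of different lengths become comparable. The sketch below uses pointwise mutual information and a midpoint split; both are illustrative choices, not WordHoard's actual LocalMaxs machinery.

import math
from collections import Counter

text = ("knight of the round table sat at the round table "
        "the knight of the round table spoke").split()

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

counts = {n: Counter(ngrams(text, n)) for n in range(1, 6)}
total = len(text)

def pseudo_bigram_pmi(phrase):
    """Treat an n-gram as a pseudo-bigram (left part, right part) and score it
    with pointwise mutual information. Illustrative, not WordHoard's measure."""
    k = len(phrase) // 2                  # illustrative split point
    left, right = phrase[:k], phrase[k:]
    p_phrase = counts[len(phrase)][phrase] / total
    p_left = counts[len(left)][left] / total
    p_right = counts[len(right)][right] / total
    return math.log2(p_phrase / (p_left * p_right))

print(pseudo_bigram_pmi(("round", "table")))
print(pseudo_bigram_pmi(("knight", "of", "the", "round", "table")))
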
  6. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4277) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4277,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4277, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4277)
      0.2 = coord(1/5)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
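
The query-likelihood view described in the abstract above can be sketched as ranking each document by the probability that its unigram language model generates the query. The toy collection, Jelinek-Mercer smoothing, and parameter choice below are illustrative assumptions, not the specific models discussed in the chapter.

import math
from collections import Counter

docs = {
    "d1": "statistical language modeling for information retrieval".split(),
    "d2": "speech recognition with n gram language models".split(),
}
collection = [t for d in docs.values() for t in d]
cf = Counter(collection)

def query_likelihood(query, doc, lam=0.5):
    """log P(query | document LM) with Jelinek-Mercer smoothing against the
    collection model. Assumes every query term occurs somewhere in the collection."""
    tf = Counter(doc)
    score = 0.0
    for q in query:
        p_doc = tf[q] / len(doc)
        p_coll = cf[q] / len(collection)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

query = "language modeling retrieval".split()
ranked = sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True)
print(ranked)   # d1 should outrank d2 for this query
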
  7. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02
    0.015815273 = product of:
      0.079076365 = sum of:
        0.079076365 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.079076365 = score(doc=4888,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.2 = coord(1/5)
    
    Date
    1. 3.2013 14:56:22
  8. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.015815273 = product of:
      0.079076365 = sum of:
        0.079076365 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
          0.079076365 = score(doc=5429,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.46428138 = fieldWeight in 5429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5429)
      0.2 = coord(1/5)
    
    Source
    c't. 2000, H.22, S.230-231
  9. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.01
    0.013179394 = product of:
      0.06589697 = sum of:
        0.06589697 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
          0.06589697 = score(doc=5428,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.38690117 = fieldWeight in 5428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=5428)
      0.2 = coord(1/5)
    
    Source
    c't. 2000, H.22, S.220-229
  10. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.01
    0.009319239 = product of:
      0.046596196 = sum of:
        0.046596196 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.046596196 = score(doc=2541,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.2 = coord(1/5)
    
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  11. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.009225576 = product of:
      0.04612788 = sum of:
        0.04612788 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
          0.04612788 = score(doc=5483,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.2708308 = fieldWeight in 5483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5483)
      0.2 = coord(1/5)
    
    Date
    10.12.2000 18:22:35
  12. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.009225576 = product of:
      0.04612788 = sum of:
        0.04612788 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
          0.04612788 = score(doc=156,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.2708308 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
      0.2 = coord(1/5)
    
    Date
    8. 3.2007 19:55:22
  13. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.009225576 = product of:
      0.04612788 = sum of:
        0.04612788 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
          0.04612788 = score(doc=3840,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.2708308 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
      0.2 = coord(1/5)
    
    Date
    27. 8.2011 14:22:33
  14. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.009225576 = product of:
      0.04612788 = sum of:
        0.04612788 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
          0.04612788 = score(doc=4184,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.2708308 = fieldWeight in 4184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4184)
      0.2 = coord(1/5)
    
    Date
    22. 1.2011 10:38:28
  15. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.007907636 = product of:
      0.039538182 = sum of:
        0.039538182 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
          0.039538182 = score(doc=4436,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.23214069 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
      0.2 = coord(1/5)
    
    Date
    16. 2.2000 14:22:39
  16. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.007907636 = product of:
      0.039538182 = sum of:
        0.039538182 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.039538182 = score(doc=1746,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.2 = coord(1/5)
    
    Date
    22. 3.2015 9:17:30
  17. Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.01
    0.006589697 = product of:
      0.032948487 = sum of:
        0.032948487 = weight(_text_:22 in 5557) [ClassicSimilarity], result of:
          0.032948487 = score(doc=5557,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 5557, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5557)
      0.2 = coord(1/5)
    
    Date
    26.12.2000 13:22:17
  18. Pinker, S.: Wörter und Regeln : Die Natur der Sprache (2000) 0.01
    0.006589697 = product of:
      0.032948487 = sum of:
        0.032948487 = weight(_text_:22 in 734) [ClassicSimilarity], result of:
          0.032948487 = score(doc=734,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 734, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=734)
      0.2 = coord(1/5)
    
    Date
    19. 7.2002 14:22:31
  19. Computational linguistics for the new millennium : divergence or synergy? Proceedings of the International Symposium held at the Ruprecht-Karls Universität Heidelberg, 21-22 July 2000. Festschrift in honour of Peter Hellwig on the occasion of his 60th birthday (2002) 0.01
    0.006589697 = product of:
      0.032948487 = sum of:
        0.032948487 = weight(_text_:22 in 4900) [ClassicSimilarity], result of:
          0.032948487 = score(doc=4900,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 4900, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4900)
      0.2 = coord(1/5)
    
  20. Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.00
    0.004612788 = product of:
      0.02306394 = sum of:
        0.02306394 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
          0.02306394 = score(doc=5759,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.1354154 = fieldWeight in 5759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5759)
      0.2 = coord(1/5)
    
    Source
    Handelsblatt. Nr.79 vom 24.4.2001, S.22