Search (226 results, page 2 of 12)

Bernth, A.; McCord, M.; Warburton, K.: Terminology extraction for global content management (2003) 0.00

0.003827074 = product of:
  0.007654148 = sum of:
    0.007654148 = product of:
      0.015308296 = sum of:
        0.015308296 = weight(_text_:a in 4122) [ClassicSimilarity], result of:
          0.015308296 = score(doc=4122,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.28826174 = fieldWeight in 4122, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=4122)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Koppel, M.; Akiva, N.; Dagan, I.: Feature instability as a criterion for selecting potential style markers (2006) 0.00

0.0035799001 = product of:
  0.0071598003 = sum of:
    0.0071598003 = product of:
      0.014319601 = sum of:
        0.014319601 = weight(_text_:a in 6092) [ClassicSimilarity], result of:
          0.014319601 = score(doc=6092,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.26964417 = fieldWeight in 6092, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6092)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element such as a word or a syntactic construct is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent, but unstable, features are especially useful as discriminators of an author's writing style.
Type: a

Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.00
```
0.0033826875 = product of:
  0.006765375 = sum of:
    0.006765375 = product of:
      0.01353075 = sum of:
        0.01353075 = weight(_text_:a in 4395) [ClassicSimilarity], result of:
          0.01353075 = score(doc=4395,freq=32.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25478977 = fieldWeight in 4395, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4395)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - Effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and they are not statistically significant in most of the tested settings. It is also shown that hitherto a rather neglected method of morphological processing for Finnish, stemming, performs reasonably well although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - Usefulness of morphological lemmatization and stem generation for IR purposes can be estimated with many factors. On the average P-R level they seem to behave very close to each other in a probabilistic IR system. Thus, the choice of the used method with highly inflectional languages needs to be estimated along other dimensions too. Originality/value - Results are achieved using Finnish as an example of a highly inflectional language. The results are of interest for anyone who is interested in processing of morphological variation of a highly inflected language for IR purposes.

Type

a
Comeau, D.C.; Wilbur, W.J.: Non-Word Identification or Spell Checking Without a Dictionary (2004) 0.00
```
0.0033657318 = product of:
  0.0067314636 = sum of:
    0.0067314636 = product of:
      0.013462927 = sum of:
        0.013462927 = weight(_text_:a in 2092) [ClassicSimilarity], result of:
          0.013462927 = score(doc=2092,freq=22.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25351265 = fieldWeight in 2092, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2092)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

MEDLINE is a collection of more than 12 million references and abstracts covering recent life science literature. With its continued growth and cutting-edge terminology, spell-checking with a traditional lexicon based approach requires significant additional manual followup. In this work, an internal corpus based context quality rating a, frequency, and simple misspelling transformations are used to rank words from most likely to be misspellings to least likely. Eleven-point average precisions of 0.891 have been achieved within a class of 42,340 all alphabetic words having an a score less than 10. Our models predict that 16,274 or 38% of these words are misspellings. Based an test data, this result has a recall of 79% and a precision of 86%. In other words, spell checking can be done by statistics instead of with a dictionary. As an application we examine the time history of low a words in MEDLINE titles and abstracts.

Type

a
Zhou, L.; Zhang, D.: NLPIR: a theoretical framework for applying Natural Language Processing to information retrieval (2003) 0.00
```
0.0033657318 = product of:
  0.0067314636 = sum of:
    0.0067314636 = product of:
      0.013462927 = sum of:
        0.013462927 = weight(_text_:a in 5148) [ClassicSimilarity], result of:
          0.013462927 = score(doc=5148,freq=22.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25351265 = fieldWeight in 5148, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5148)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Zhou and Zhang believe that for the potential of natural language processing NLP to be reached in information retrieval a framework for guiding the effort should be in place. They provide a graphic model that identifies different levels of natural language processing effort during the query, document matching process. A direct matching approach uses little NLP, an expansion approach with thesauri, little more, but an extraction approach will often use a variety of NLP techniques, as well as statistical methods. A transformation approach which creates intermediate representations of documents and queries is a step higher in NLP usage, and a uniform approach, which relies on a body of knowledge beyond that of the documents and queries to provide inference and sense making prior to matching would require a maximum NPL effort.

Type

a

Patrick, J.; Zhang, J.; Artola-Zubillaga, X.: ¬An architecture and query language for a federation of heterogeneous dictionary databases (2000) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 339) [ClassicSimilarity], result of:
          0.01339476 = score(doc=339,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 339, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=339)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 3908) [ClassicSimilarity], result of:
          0.01339476 = score(doc=3908,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 3908, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=3908)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.00
```
0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 2919) [ClassicSimilarity], result of:
          0.01339476 = score(doc=2919,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 2919, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2919)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within a MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributional similarity measures), which gives us clusters of semantically related nouns. Next, a number of statistical measures - based on selectional preferences - is developed that formalize the intuition of non-compositionality. Our approach has been tested on Dutch, and automatically evaluated using Dutch lexical resources.

Source

Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, Prag 2007

Type

a
WordHoard: finding multiword units (20??) 0.00
```
0.0031324127 = product of:
  0.0062648254 = sum of:
    0.0062648254 = product of:
      0.012529651 = sum of:
        0.012529651 = weight(_text_:a in 1123) [ClassicSimilarity], result of:
          0.012529651 = score(doc=1123,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.23593865 = fieldWeight in 1123, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1123)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.

Type

a

Diaz, I.; Morato, J.; Lioréns, J.: ¬An algorithm for term conflation based on tree structures (2002) 0.00

0.0030255679 = product of:
  0.0060511357 = sum of:
    0.0060511357 = product of:
      0.012102271 = sum of:
        0.012102271 = weight(_text_:a in 246) [ClassicSimilarity], result of:
          0.012102271 = score(doc=246,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.22789092 = fieldWeight in 246, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=246)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: This work presents a new stemming algorithm. This algorithm stores the stemming information in tree structures. This storage allows us to enhance the performance of the algorithm due to the reduction of the search space and the overall complexity. The final result of that stemming algorithm is a normalized concept, understanding this process as the automatic extraction of the generic form (or a lexeme) for a selected term.
Type: a

Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.00

0.0030255679 = product of:
  0.0060511357 = sum of:
    0.0060511357 = product of:
      0.012102271 = sum of:
        0.012102271 = weight(_text_:a in 2585) [ClassicSimilarity], result of:
          0.012102271 = score(doc=2585,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.22789092 = fieldWeight in 2585, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2585)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
Type: a

Olsen, K.A.; Williams, J.G.: Spelling and grammar checking using the Web as a text repository (2004) 0.00
```
0.0030255679 = product of:
  0.0060511357 = sum of:
    0.0060511357 = product of:
      0.012102271 = sum of:
        0.012102271 = weight(_text_:a in 2891) [ClassicSimilarity], result of:
          0.012102271 = score(doc=2891,freq=40.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.22789092 = fieldWeight in 2891, product of:
              6.3245554 = tf(freq=40.0), with freq of:
                40.0 = termFreq=40.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2891)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Natural languages are both complex and dynamic. They are in part formalized through dictionaries and grammar. Dictionaries attempt to provide definitions and examples of various usages for all the words in a language. Grammar, on the other hand, is the system of rules that defines the structure of a language and is concerned with the correct use and application of the language in speaking or writing. The fact that these two mechanisms lag behind the language as currently used is not a serious problem for those living in a language culture and talking their native language. However, the correct choice of words, expressions, and word relationships is much more difficult when speaking or writing in a foreign language. The basics of the grammar of a language may have been learned in school decades ago, and even then there were always several choices for the correct expression for an idea, fact, opinion, or emotion. Although many different parts of speech and their relationships can make for difficult language decisions, prepositions tend to be problematic for nonnative speakers of English, and, in reality, prepositions are a major problem in most languages. Does a speaker or writer say "in the West Coast" or "on the West Coast," or perhaps "at the West Coast"? In Norwegian, we are "in" a city, but "at" a place. But the distinction between cities and places is vague. To be absolutely correct, one really has to learn the right preposition for every single place. A simplistic way of resolving these language issues is to ask a native speaker. But even native speakers may disagree about the right choice of words. If there is disagreement, then one will have to ask more than one native speaker, treat his/her response as a vote for a particular choice, and perhaps choose the majority choice as the best possible alternative. In real life, such a procedure may be impossible or impractical, but in the electronic world, as we shall see, this is quite easy to achieve. Using the vast text repository of the Web, we may get a significant voting base for even the most detailed and distinct phrases. We shall start by introducing a set of examples to present our idea of using the text repository an the Web to aid in making the best word selection, especially for the use of prepositions. Then we will present a more general discussion of the possibilities and limitations of using the Web as an aid for correct writing.

Type

a

Liddy, E.D.: Natural language processing for information retrieval (2009) 0.00

0.0030255679 = product of:
  0.0060511357 = sum of:
    0.0060511357 = product of:
      0.012102271 = sum of:
        0.012102271 = weight(_text_:a in 3854) [ClassicSimilarity], result of:
          0.012102271 = score(doc=3854,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.22789092 = fieldWeight in 3854, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3854)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Natural language processing (NLP) is the computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Although NLP is a relatively recent area of research and application, compared with other information technology approaches, there have been sufficient successes to date that suggest that NLP-based information access technologies will continue to be a major area of research and development in information systems now and into the future.
Type: a

Perera, P.; Witte, R.: ¬A self-learning context-aware lemmatizer for German (2005) 0.00

0.0030255679 = product of:
  0.0060511357 = sum of:
    0.0060511357 = product of:
      0.012102271 = sum of:
        0.012102271 = weight(_text_:a in 4638) [ClassicSimilarity], result of:
          0.012102271 = score(doc=4638,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.22789092 = fieldWeight in 4638, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=4638)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.
Type: a

Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.00
```
0.0029000505 = product of:
  0.005800101 = sum of:
    0.005800101 = product of:
      0.011600202 = sum of:
        0.011600202 = weight(_text_:a in 268) [ClassicSimilarity], result of:
          0.011600202 = score(doc=268,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.21843673 = fieldWeight in 268, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=268)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The authors describe a statistical approach based on hidden Markov models (HMMs), for generating stemmers automatically. The proposed approach requires little effort to insert new languages in the system even if minimal linguistic knowledge is available. This is a key advantage especially for digital libraries, which are often developed for a specific institution or government because the program can manage a great amount of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.

Type

a
Bacchin, M.; Ferro, N.; Melucci, M.: ¬A probabilistic model for stemmer generation (2005) 0.00
```
0.0029000505 = product of:
  0.005800101 = sum of:
    0.005800101 = product of:
      0.011600202 = sum of:
        0.011600202 = weight(_text_:a in 1001) [ClassicSimilarity], result of:
          0.011600202 = score(doc=1001,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.21843673 = fieldWeight in 1001, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper we will present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems, however the designing and the implementation of stemmers requires a laborious amount of effort due to the fact that documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.

Type

a

Kunze, C.; Wagner, A.: Anwendungsperspektive des GermaNet, eines lexikalisch-semantischen Netzes für das Deutsche (2001) 0.00

0.0028703054 = product of:
  0.005740611 = sum of:
    0.005740611 = product of:
      0.011481222 = sum of:
        0.011481222 = weight(_text_:a in 7456) [ClassicSimilarity], result of:
          0.011481222 = score(doc=7456,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.2161963 = fieldWeight in 7456, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=7456)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Hull, D.; Ait-Mokhtar, S.; Chuat, M.; Eisele, A.; Gaussier, E.; Grefenstette, G.; Isabelle, P.; Samulesson, C.; Segand, F.: Language technologies and patent search and classification (2001) 0.00

0.0028703054 = product of:
  0.005740611 = sum of:
    0.005740611 = product of:
      0.011481222 = sum of:
        0.011481222 = weight(_text_:a in 6318) [ClassicSimilarity], result of:
          0.011481222 = score(doc=6318,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.2161963 = fieldWeight in 6318, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=6318)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Vilar, P.; Dimec, J.: Krnjenje kot osnova nekaterih nekonvencionalnih metod poizvedovanja (2000) 0.00

0.0028703054 = product of:
  0.005740611 = sum of:
    0.005740611 = product of:
      0.011481222 = sum of:
        0.011481222 = weight(_text_:a in 6331) [ClassicSimilarity], result of:
          0.011481222 = score(doc=6331,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.2161963 = fieldWeight in 6331, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=6331)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Footnote: Übers. d. Titels: Stemming as a basis for some non-conventional methods of information retrieval
Type: a

Bookstein, A.; Kulyukin, V.; Raita, T.; Nicholson, J.: Adapting measures of clumping strength to assess term-term similarity (2003) 0.00
```
0.0028703054 = product of:
  0.005740611 = sum of:
    0.005740611 = product of:
      0.011481222 = sum of:
        0.011481222 = weight(_text_:a in 1609) [ClassicSimilarity], result of:
          0.011481222 = score(doc=1609,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.2161963 = fieldWeight in 1609, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1609)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automated information retrieval relies heavily an statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated an a large text corpus built from an on-line encyclopedia.

Type

a

Search (226 results, page 2 of 12)

Authors

Languages

Types

Themes

Subjects

Classifications