Search (3680 results, page 1 of 184)

  1. Chau, M.; Lu, Y.; Fang, X.; Yang, C.C.: Characteristics of character usage in Chinese Web searching (2009) 0.13
    0.13408904 = product of:
      0.33522257 = sum of:
        0.30227408 = weight(_text_:grams in 2456) [ClassicSimilarity], result of:
          0.30227408 = score(doc=2456,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.77113974 = fieldWeight in 2456, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
        0.032948487 = weight(_text_:22 in 2456) [ClassicSimilarity], result of:
          0.032948487 = score(doc=2456,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 2456, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
      0.4 = coord(2/5)
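
    The score breakdowns throughout these results are Lucene "explain" trees for the ClassicSimilarity (TF-IDF) model. As a minimal sketch, the "grams" clause of this first entry can be recomputed in Python from the constants shown above (variable names are our own; tf and idf follow Lucene's ClassicSimilarity definitions):

      import math

      # "grams" clause of entry 1 (doc 2456); constants copied from the tree above
      idf = 1 + math.log(44218 / (37 + 1))  # 8.059301 = idf(docFreq=37, maxDocs=44218)
      tf = math.sqrt(6.0)                   # 2.4494898 = tf(freq=6.0)
      query_norm = 0.04863741
      field_norm = 0.0390625

      query_weight = idf * query_norm       # 0.39198354
      field_weight = tf * idf * field_norm  # 0.77113974
      grams_score = query_weight * field_weight  # 0.30227408

      # total score: sum of the two matched clauses, scaled by coord(2/5)
      total = (grams_score + 0.032948487) * (2 / 5)  # 0.13408904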
    
    Abstract
    The use of non-English Web search engines is prevalent. Given the popularity of Chinese Web searching and the unique characteristics of the Chinese language, it is imperative to conduct studies that focus on the analysis of Chinese Web search queries. In this paper, we report our research on the character usage of Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that the usage of characters in search queries was quite different from the character usage of general online information in Chinese. After studying the Zipf distribution of n-grams with different values of n, we found that the unigram curve is the most curved of all, that the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3-6) have similar shapes, with α-values in the range of 0.66-0.86. The distribution of combined n-grams was also studied. All the analyses were performed on the data both before and after the removal of function terms and incomplete terms, and similar findings were revealed. We believe the findings from this study provide insights for further research in non-English Web searching and will assist in the design of more effective Chinese Web search engines.
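
    As an illustration of the rank-frequency analysis described in this abstract, the following sketch extracts character n-grams from a text and estimates the Zipf exponent from the slope of the log-log rank-frequency curve. It is a toy reconstruction under our own assumptions (simple least-squares fit, at least two distinct n-grams), not the paper's procedure:

      import math
      from collections import Counter

      def zipf_alpha(text, n):
          # descending frequencies of the character n-grams in the text
          freqs = sorted(Counter(text[i:i + n]
                                 for i in range(len(text) - n + 1)).values(),
                         reverse=True)
          pts = [(math.log(rank), math.log(f)) for rank, f in enumerate(freqs, 1)]
          mx = sum(x for x, _ in pts) / len(pts)
          my = sum(y for _, y in pts) / len(pts)
          slope = (sum((x - mx) * (y - my) for x, y in pts)
                   / sum((x - mx) ** 2 for x, _ in pts))
          return -slope  # Zipf's law: frequency ~ rank**(-alpha)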
    Date
    22.11.2008 17:57:22
  2. Robertson, A.M.; Willett, P.: Applications of n-grams in textual information systems (1998) 0.11
    0.111691535 = product of:
      0.5584577 = sum of:
        0.5584577 = weight(_text_:grams in 4715) [ClassicSimilarity], result of:
          0.5584577 = score(doc=4715,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            1.4246967 = fieldWeight in 4715, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0625 = fieldNorm(doc=4715)
      0.2 = coord(1/5)
    
    Abstract
    Provides an introduction to the use of n-grams in textual information systems, where an n-gram is a string of n, usually adjacent, characters extracted from a section of continuous text. Applications that can be implemented efficiently and effectively using sets of n-grams include spelling-error detection and correction, query expansion, information retrieval with serial, inverted and signature files, dictionary look-up, text compression, and language identification.
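
    One of the applications listed above, spelling-error detection and correction, is often implemented by ranking dictionary words by the overlap of their character-bigram sets with those of the misspelled word. A minimal sketch using the Dice coefficient (the padding convention and names are our own illustrative choices, not taken from the article):

      def bigrams(word):
          padded = f" {word} "  # pad so boundary characters form bigrams too
          return {padded[i:i + 2] for i in range(len(padded) - 1)}

      def dice(a, b):
          ga, gb = bigrams(a), bigrams(b)
          return 2 * len(ga & gb) / (len(ga) + len(gb))

      # e.g. pick the best correction from a word list:
      #   max(word_list, key=lambda w: dice("informaton", w))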
    Object
    n-grams
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.10851419 = product of:
      0.27128547 = sum of:
        0.23174728 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23174728 = score(doc=562,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.039538182 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.039538182 = score(doc=562,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  4. Huffman, S.: Acquaintance : language-independent document categorization by n-grams (1996) 0.10
    0.09773009 = product of:
      0.48865044 = sum of:
        0.48865044 = weight(_text_:grams in 7530) [ClassicSimilarity], result of:
          0.48865044 = score(doc=7530,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            1.2466096 = fieldWeight in 7530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.109375 = fieldNorm(doc=7530)
      0.2 = coord(1/5)
    
  5. Verwer, K.: Freiheit und Verantwortung bei Hans Jonas (2011) 0.09
    0.09269892 = product of:
      0.46349457 = sum of:
        0.46349457 = weight(_text_:3a in 973) [ClassicSimilarity], result of:
          0.46349457 = score(doc=973,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            1.1240361 = fieldWeight in 973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.09375 = fieldNorm(doc=973)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://creativechoice.org/doc/HansJonas.pdf.
  6. Fachsystematik Bremen nebst Schlüssel 1970 ff. (1970 ff) 0.09
    0.090428494 = product of:
      0.22607124 = sum of:
        0.19312274 = weight(_text_:3a in 3577) [ClassicSimilarity], result of:
          0.19312274 = score(doc=3577,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 3577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3577)
        0.032948487 = weight(_text_:22 in 3577) [ClassicSimilarity], result of:
          0.032948487 = score(doc=3577,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 3577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3577)
      0.4 = coord(2/5)
    
    Content
    1. Agrarwissenschaften 1981. - 3. Allgemeine Geographie 2.1972. - 3a. Allgemeine Naturwissenschaften 1.1973. - 4. Allgemeine Sprachwissenschaft, Allgemeine Literaturwissenschaft 2.1971. - 6. Allgemeines 5.1983. - 7. Anglistik 3.1976. - 8. Astronomie, Geodäsie 4.1977. - 12. bio Biologie, bcp Biochemie-Biophysik, bot Botanik, zoo Zoologie 1981. - 13. Bremensien 3.1983. - 13a. Buch- und Bibliothekswesen 3.1975. - 14. Chemie 4.1977. - 14a. Elektrotechnik 1974. - 15. Ethnologie 2.1976. - 16,1. Geowissenschaften. Sachteil 3.1977. - 16,2. Geowissenschaften. Regionaler Teil 3.1977. - 17. Germanistik 6.1984. - 17a,1. Geschichte. Teilsystematik hil. - 17a,2. Geschichte. Teilsystematik his Neuere Geschichte. - 17a,3. Geschichte. Teilsystematik hit Neueste Geschichte. - 18. Humanbiologie 2.1983. - 19. Ingenieurwissenschaften 1974. - 20. siehe 14a. - 21. Klassische Philologie 3.1977. - 22. Klinische Medizin 1975. - 23. Kunstgeschichte 2.1971. - 24. Kybernetik 2.1975. - 25. Mathematik 3.1974. - 26. Medizin 1976. - 26a. Militärwissenschaft 1985. - 27. Musikwissenschaft 1978. - 27a. Noten 2.1974. - 28. Ozeanographie 3.1977. - 29. Pädagogik 8.1985. - 30. Philosophie 3.1974. - 31. Physik 3.1974. - 33. Politik, Politische Wissenschaft, Sozialwissenschaft. Soziologie. Länderschlüssel. Register 1981. - 34. Psychologie 2.1972. - 35. Publizistik und Kommunikationswissenschaft 1985. - 36. Rechtswissenschaften 1986. - 37. Regionale Geographie 3.1975. - 37a. Religionswissenschaft 1970. - 38. Romanistik 3.1976. - 39. Skandinavistik 4.1985. - 40. Slavistik 1977. - 40a. Sonstige Sprachen und Literaturen 1973. - 43. Sport 4.1983. - 44. Theaterwissenschaft 1985. - 45. Theologie 2.1976. - 45a. Ur- und Frühgeschichte, Archäologie 1970. - 47. Volkskunde 1976. - 47a. Wirtschaftswissenschaften 1971 // Schlüssel: 1. Länderschlüssel 1971. - 2. Formenschlüssel (Kurzform) 1974. - 3. Personenschlüssel Literatur 5. Fassung 1968
  7. Figuerola, C.G.; Gomez, R.; Lopez de San Roman, E.: Stemming and n-grams in Spanish : an evaluation of their impact in information retrieval (2000) 0.08
    0.08376864 = product of:
      0.4188432 = sum of:
        0.4188432 = weight(_text_:grams in 6501) [ClassicSimilarity], result of:
          0.4188432 = score(doc=6501,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            1.0685225 = fieldWeight in 6501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.09375 = fieldNorm(doc=6501)
      0.2 = coord(1/5)
    
  8. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.08
    0.0772491 = product of:
      0.3862455 = sum of:
        0.3862455 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
          0.3862455 = score(doc=1826,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.93669677 = fieldWeight in 1826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.078125 = fieldNorm(doc=1826)
      0.2 = coord(1/5)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  9. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.07
    0.07254578 = product of:
      0.3627289 = sum of:
        0.3627289 = weight(_text_:grams in 4955) [ClassicSimilarity], result of:
          0.3627289 = score(doc=4955,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.92536765 = fieldWeight in 4955, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=4955)
      0.2 = coord(1/5)
    
    Abstract
    In this paper a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms.
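
    A minimal sketch of the core idea: represent each text by its stopword n-grams and compare the resulting sets. The tiny stopword list, the Jaccard overlap, and the names are our own assumptions; the paper uses a fixed list of very frequent words and also detects passage boundaries, which is not reproduced here:

      # tiny illustrative stopword list
      STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "it",
                   "for", "with", "as", "on", "that", "by", "this", "at"}

      def stopword_ngrams(text, n=3):
          seq = [w for w in text.lower().split() if w in STOPWORDS]
          return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

      def stopword_overlap(suspicious, original, n=3):
          ga, gb = stopword_ngrams(suspicious, n), stopword_ngrams(original, n)
          union = ga | gb
          return len(ga & gb) / len(union) if union else 0.0  # Jaccard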
    Object
    n-grams
  10. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5206) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5206,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5206, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5206)
      0.2 = coord(1/5)
    
    Abstract
    Khoo, Dai, and Loh examine new statistical methods for the identification of two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. In their segmentation they utilize the Modern Chinese Word Segmentation for Application of Information Processing, with some modifications to focus on meaningful words, to do manual segmentation. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the authors calculated the log of the odds that such bi/tri-grams were meaningful words. Variables like relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, the relative frequency of adjacent characters and the document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
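
    A sketch of the two building blocks described above: breaking a sentence into overlapping bi- and tri-grams, and turning regression features into the probability that a candidate is a meaningful word. The coefficients below are placeholders, not the fitted values from the study:

      import math

      def overlapping_ngrams(sentence, n):
          # e.g. "ABCD" with n=2 -> ["AB", "BC", "CD"]
          return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

      def word_probability(rel_freq, doc_freq, b0=-2.0, b1=3.0, b2=1.5):
          # logistic regression on candidate features; b0..b2 are hypothetical
          log_odds = b0 + b1 * rel_freq + b2 * doc_freq
          return 1 / (1 + math.exp(-log_odds))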
  11. Chen, L.; Fang, H.: ¬An automatic method for extracting innovative ideas based on the Scopus® database (2019) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5310) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5310,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5310, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5310)
      0.2 = coord(1/5)
    
    Abstract
    The novelty of knowledge claims in a research paper can be considered an evaluation criterion for papers to supplement citations. To provide a foundation for research evaluation from the perspective of innovativeness, we propose an automatic approach for extracting innovative ideas from the abstracts of technology and engineering papers. The approach extracts N-grams as candidates based on part-of-speech tagging and determines whether they are novel by checking the Scopus® database to see whether they have ever been presented previously. Moreover, we discuss the distributions of innovative ideas in different abstract structures. To improve performance by excluding noisy N-grams, a list of stopwords and a list of research description characteristics were developed. We selected abstracts of articles published from 2011 to 2017 on the topic of semantic analysis as the experimental texts. Excluding noisy N-grams, considering the distribution of innovative ideas in abstracts, and suitably combining N-grams can effectively improve the performance of automatic innovative-idea extraction. Unlike co-word and co-citation analysis, innovative-idea extraction aims to identify the differences in a paper from all previously published papers.
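
    A minimal sketch of the extraction step described above, with an in-memory set standing in for the Scopus® novelty check (all names are our own; the paper's part-of-speech filtering is simplified to a stopword test on the N-gram boundaries):

      def candidate_ngrams(tokens, n, stopwords):
          # keep N-grams that neither start nor end with a stopword
          grams = (tokens[i:i + n] for i in range(len(tokens) - n + 1))
          return [" ".join(g) for g in grams
                  if g[0] not in stopwords and g[-1] not in stopwords]

      def novel_ideas(abstract, n, stopwords, previously_seen):
          # 'previously_seen' stands in for the Scopus database lookup
          tokens = abstract.lower().split()
          return [g for g in candidate_ngrams(tokens, n, stopwords)
                  if g not in previously_seen]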
  12. Mustafa, S.H.; Al-Radaideh, Q.A.: Using n-grams for Arabic text searching (2004) 0.07
    0.06910561 = product of:
      0.34552804 = sum of:
        0.34552804 = weight(_text_:grams in 2888) [ClassicSimilarity], result of:
          0.34552804 = score(doc=2888,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.88148606 = fieldWeight in 2888, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2888)
      0.2 = coord(1/5)
    
    Abstract
    N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term-conflation techniques in the context of Arabic free-text retrieval. It reports the results of using the N-gram approach for a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method offers better performance than the trigram method with respect to conflation precision and conflation recall ratios. In either case, the N-gram approach does not appear to provide efficient conflation, owing to the peculiarities imposed by the Arabic infix structure, which reduces the rate of correct N-gram matching.
  13. Schrodt, R.: Tiefen und Untiefen im wissenschaftlichen Sprachgebrauch (2008) 0.06
    0.061799277 = product of:
      0.30899638 = sum of:
        0.30899638 = weight(_text_:3a in 140) [ClassicSimilarity], result of:
          0.30899638 = score(doc=140,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.7493574 = fieldWeight in 140, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=140)
      0.2 = coord(1/5)
    
    Content
    Cf. also: https://studylibde.com/doc/13053640/richard-schrodt. Cf. also: http://www.univie.ac.at/Germanistik/schrodt/vorlesung/wissenschaftssprache.doc.
  14. Popper, K.R.: Three worlds : the Tanner lecture on human values. Delivered at the University of Michigan, April 7, 1978 (1978) 0.06
    0.061799277 = product of:
      0.30899638 = sum of:
        0.30899638 = weight(_text_:3a in 230) [ClassicSimilarity], result of:
          0.30899638 = score(doc=230,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.7493574 = fieldWeight in 230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=230)
      0.2 = coord(1/5)
    
    Source
    https://tannerlectures.utah.edu/_documents/a-to-z/p/popper80.pdf
  15. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.06
    0.060454816 = product of:
      0.30227408 = sum of:
        0.30227408 = weight(_text_:grams in 194) [ClassicSimilarity], result of:
          0.30227408 = score(doc=194,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.77113974 = fieldWeight in 194, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
      0.2 = coord(1/5)
    
    Abstract
    In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fractions of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the objects' and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
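
    A minimal sketch of the n-overlap vector as defined above, reading its k-th coordinate as the fraction of distinct objects that belong to exactly k of the n sets (the "exactly k" reading, the function name, and the use of Python sets are our own assumptions):

      from collections import Counter

      def n_overlap_vector(sets):
          # how many of the n sets each distinct object belongs to
          membership = Counter(obj for s in sets for obj in set(s))
          total = len(membership)
          return [sum(1 for c in membership.values() if c == k) / total
                  for k in range(1, len(sets) + 1)]

      # e.g. n_overlap_vector([{1, 2, 3}, {2, 3}, {3}]) -> [1/3, 1/3, 1/3]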
  16. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.06
    0.05923338 = product of:
      0.2961669 = sum of:
        0.2961669 = weight(_text_:grams in 1029) [ClassicSimilarity], result of:
          0.2961669 = score(doc=1029,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.7555595 = fieldWeight in 1029, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=1029)
      0.2 = coord(1/5)
    
    Abstract
    Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for searching for partially-specified queries in a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium.
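
    A minimal sketch of the signature scheme described above: each n-gram of a term sets one bit in a fixed-width signature, and a query can match a term only if every query bit is also set in the term signature (width, hash, and names are our own assumptions; real systems use bit-slicing for speed):

      def signature(term, n=2, width=64):
          sig = 0
          padded = f" {term} "
          for i in range(len(padded) - n + 1):
              sig |= 1 << (hash(padded[i:i + n]) % width)  # one bit per n-gram
          return sig

      def may_match(query_sig, term_sig):
          # superset test: false positives possible, false negatives not
          return query_sig & term_sig == query_sig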
  17. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.06
    0.05923338 = product of:
      0.2961669 = sum of:
        0.2961669 = weight(_text_:grams in 2941) [ClassicSimilarity], result of:
          0.2961669 = score(doc=2941,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.7555595 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
      0.2 = coord(1/5)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
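
    The order restriction described above can be sketched as follows: an n-gram of one word only counts as a match if it occurs at a nearby position in the other word, rather than anywhere in it. The window size, the Dice-style score, and the names are our own assumptions, not the paper's exact measure:

      def positional_ngram_sim(a, b, n=2, window=1):
          ga = [a[i:i + n] for i in range(len(a) - n + 1)]
          gb = [b[i:i + n] for i in range(len(b) - n + 1)]
          hits = 0
          for i, gram in enumerate(ga):
              lo, hi = max(0, i - window), min(len(gb), i + window + 1)
              if gram in gb[lo:hi]:  # match only near the same position
                  hits += 1
          return 2 * hits / (len(ga) + len(gb))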
    Object
    n-grams
  18. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.05
    0.054074373 = product of:
      0.27037185 = sum of:
        0.27037185 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
          0.27037185 = score(doc=306,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5386707.
  19. Gödert, W.; Lepsky, K.: Informationelle Kompetenz : ein humanistischer Entwurf (2019) 0.05
    0.054074373 = product of:
      0.27037185 = sum of:
        0.27037185 = weight(_text_:3a in 5955) [ClassicSimilarity], result of:
          0.27037185 = score(doc=5955,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.65568775 = fieldWeight in 5955, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5955)
      0.2 = coord(1/5)
    
    Footnote
    Review in: Philosophisch-ethische Rezensionen, 09.11.2019 (Jürgen Czogalla), at: https://philosophisch-ethische-rezensionen.de/rezension/Goedert1.html. In: B.I.T. online 23(2020) no.3, pp.345-347 (W. Sühl-Strohmenger) [at: https://www.b-i-t-online.de/heft/2020-03-rezensionen.pdf]. In: Open Password no.805, 14.08.2020 (H.-C. Hobohm) [at: https://www.password-online.de/?mailpoet_router&endpoint=view_in_browser&action=view&data=WzE0MywiOGI3NjZkZmNkZjQ1IiwwLDAsMTMxLDFd].
  20. #220 0.05
    0.05218774 = product of:
      0.2609387 = sum of:
        0.2609387 = weight(_text_:22 in 219) [ClassicSimilarity], result of:
          0.2609387 = score(doc=219,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            1.5320505 = fieldWeight in 219, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.21875 = fieldNorm(doc=219)
      0.2 = coord(1/5)
    
    Date
    22. 5.1998 20:02:22

Types

  • a 3079
  • m 344
  • el 164
  • s 139
  • b 39
  • x 35
  • i 23
  • r 18
  • ? 8
  • p 4
  • d 3
  • n 3
  • u 2
  • z 2
  • au 1
  • h 1
