Search (1306 results, page 1 of 66)

  • year_i:[2000 TO 2010}
  1. Chau, M.; Lu, Y.; Fang, X.; Yang, C.C.: Characteristics of character usage in Chinese Web searching (2009) 0.13
    0.13408904 = product of:
      0.33522257 = sum of:
        0.30227408 = weight(_text_:grams in 2456) [ClassicSimilarity], result of:
          0.30227408 = score(doc=2456,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.77113974 = fieldWeight in 2456, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
        0.032948487 = weight(_text_:22 in 2456) [ClassicSimilarity], result of:
          0.032948487 = score(doc=2456,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 2456, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
      0.4 = coord(2/5)
    
    Abstract
    The use of non-English Web search engines has become prevalent. Given the popularity of Chinese Web searching and the unique characteristics of the Chinese language, it is imperative to conduct studies that focus on the analysis of Chinese Web search queries. In this paper, we report our research on the character usage of Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that the usage of characters in search queries was quite different from the character usage of general online information in Chinese. After studying the Zipf distribution of n-grams with different values of n, we found that the unigram curve is the most curved of all, while the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3-6) had similar structures, with α-values in the range of 0.66-0.86. The distribution of combined n-grams was also studied. All analyses were performed on the data both before and after the removal of function terms and incomplete terms, and similar findings were revealed. We believe the findings from this study provide some insights for further research in non-English Web searching and will assist in the design of more effective Chinese Web search engines.
    Date
    22.11.2008 17:57:22
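    Code sketch
    A minimal, illustrative sketch (not the authors' code) of the kind of analysis described in the abstract above: count character n-grams in query strings and estimate a Zipf exponent α by a log-log least-squares fit. The sample queries, the bigram choice (n = 2), and the fitting procedure are assumptions for illustration only.
      from collections import Counter
      import math

      def char_ngrams(text, n):
          """Overlapping character n-grams of a query string."""
          return [text[i:i + n] for i in range(len(text) - n + 1)]

      def zipf_alpha(freqs):
          """Estimate Zipf's alpha as minus the least-squares slope of log(frequency) vs. log(rank)."""
          ranked = sorted(freqs, reverse=True)
          xs = [math.log(r + 1) for r in range(len(ranked))]
          ys = [math.log(f) for f in ranked]
          n = len(xs)
          mx, my = sum(xs) / n, sum(ys) / n
          slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
          return -slope

      queries = ["web search engine", "chinese web search", "search log analysis"]  # placeholder queries
      counts = Counter(g for q in queries for g in char_ngrams(q, 2))
      print(zipf_alpha(list(counts.values())))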
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.10851419 = product of:
      0.27128547 = sum of:
        0.23174728 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23174728 = score(doc=562,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.039538182 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.039538182 = score(doc=562,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  3. Figuerola, C.G.; Gomez, R.; Lopez de San Roman, E.: Stemming and n-grams in Spanish : an evaluation of their impact in information retrieval (2000) 0.08
    0.08376864 = product of:
      0.4188432 = sum of:
        0.4188432 = weight(_text_:grams in 6501) [ClassicSimilarity], result of:
          0.4188432 = score(doc=6501,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            1.0685225 = fieldWeight in 6501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.09375 = fieldNorm(doc=6501)
      0.2 = coord(1/5)
    
  4. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5206) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5206,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5206, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5206)
      0.2 = coord(1/5)
    
    Abstract
    Khoo, Dai, and Loh examine new statistical methods for the identification of two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. For manual segmentation they utilize the Modern Chinese Word Segmentation for Application of Information Processing standard, with some modifications to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the log of the odds that such bi/tri-grams were meaningful words was calculated. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, the relative frequency of adjacent characters and the document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual-information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
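    Code sketch
    A rough sketch, under stated assumptions, of the pipeline the abstract describes: candidate two-character words are taken as overlapping character bi-grams, and a logistic model turns features such as relative frequency and document frequency into a probability of being a meaningful word. The coefficients, feature values, and sample sentence are placeholders, not the fitted values from the study.
      import math

      def overlapping_ngrams(chars, n):
          """Overlapping character n-grams: candidate two- or three-character words."""
          return [chars[i:i + n] for i in range(len(chars) - n + 1)]

      def word_probability(rel_freq, doc_freq, weights=(-4.0, 30.0, 0.002)):
          """Logistic model: P(word) = 1 / (1 + exp(-(b0 + b1*rel_freq + b2*doc_freq))).
          The coefficients here are illustrative placeholders."""
          b0, b1, b2 = weights
          log_odds = b0 + b1 * rel_freq + b2 * doc_freq
          return 1.0 / (1.0 + math.exp(-log_odds))

      sentence = "信息检索系统"  # placeholder sentence
      print(overlapping_ngrams(sentence, 2), word_probability(rel_freq=0.12, doc_freq=350))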
  5. Mustafa, S.H.; Al-Radaideh, Q.A.: Using n-grams for Arabic text searching (2004) 0.07
    0.06910561 = product of:
      0.34552804 = sum of:
        0.34552804 = weight(_text_:grams in 2888) [ClassicSimilarity], result of:
          0.34552804 = score(doc=2888,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.88148606 = fieldWeight in 2888, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2888)
      0.2 = coord(1/5)
    
    Abstract
    N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free-text retrieval. It reports the results of using the N-gram approach for a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method offers better performance than the trigram method with respect to conflation precision and conflation recall ratios. In either case, the N-gram approach does not appear to provide efficient conflation, due to the peculiarities imposed by the Arabic infix structure, which reduces the rate of correct N-gram matching.
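    Code sketch
    A small sketch of one common way to score digram/trigram conflation, the Dice coefficient over a word pair's n-gram sets; the article evaluates conflation precision and recall but does not prescribe this exact measure, and the Arabic word pair below is only an illustration.
      def ngrams(word, n):
          """Character n-grams of a word (digrams for n = 2, trigrams for n = 3)."""
          return {word[i:i + n] for i in range(len(word) - n + 1)}

      def dice(a, b, n=2):
          """Dice similarity over n-gram sets: 2*|A ∩ B| / (|A| + |B|)."""
          ga, gb = ngrams(a, n), ngrams(b, n)
          if not ga or not gb:
              return 0.0
          return 2 * len(ga & gb) / (len(ga) + len(gb))

      # Two Arabic surface forms sharing a stem (illustrative pair only):
      print(dice("مكتبة", "مكتبات", n=2), dice("مكتبة", "مكتبات", n=3))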
  6. Schrodt, R.: Tiefen und Untiefen im wissenschaftlichen Sprachgebrauch (2008) 0.06
    0.061799277 = product of:
      0.30899638 = sum of:
        0.30899638 = weight(_text_:3a in 140) [ClassicSimilarity], result of:
          0.30899638 = score(doc=140,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.7493574 = fieldWeight in 140, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=140)
      0.2 = coord(1/5)
    
    Content
    See also: https://studylibde.com/doc/13053640/richard-schrodt. See also: http://www.univie.ac.at/Germanistik/schrodt/vorlesung/wissenschaftssprache.doc.
  7. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.06
    0.060454816 = product of:
      0.30227408 = sum of:
        0.30227408 = weight(_text_:grams in 194) [ClassicSimilarity], result of:
          0.30227408 = score(doc=194,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.77113974 = fieldWeight in 194, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
      0.2 = coord(1/5)
    
    Abstract
    In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
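    Code sketch
    A toy sketch of the n-overlap vector as the abstract describes it, reading "belong to 1, 2, ..., n sets" as "occur in exactly k of the n sets"; Egghe's formal definition and the similarity measures built on it are more general, and the sample data are invented.
      from collections import Counter

      def n_overlap_vector(families):
          """Coordinate k (k = 1..n): fraction of distinct objects occurring in exactly k of the n sets."""
          n = len(families)
          membership = Counter(obj for fam in families for obj in set(fam))
          counts = Counter(membership.values())
          total = len(membership)
          return [counts[k] / total for k in range(1, n + 1)]

      # Three toy "libraries" holding book IDs (illustrative data only):
      libraries = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}]
      print(n_overlap_vector(libraries))  # [0.5, 0.333..., 0.166...]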
  8. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.06
    0.05923338 = product of:
      0.2961669 = sum of:
        0.2961669 = weight(_text_:grams in 1029) [ClassicSimilarity], result of:
          0.2961669 = score(doc=1029,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.7555595 = fieldWeight in 1029, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=1029)
      0.2 = coord(1/5)
    
    Abstract
    Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for searching for partially-specified queries in a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium.
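    Code sketch
    A minimal sketch of the signature-file idea described above: each n-gram of a lexicon term sets one bit of a fixed-width signature, and a partially specified query can only match terms whose signatures contain all of the query's bits (false positives remain possible and must be verified). The width, the use of Python's built-in hash, and the sample lexicon are assumptions; a real implementation would use a stable hash and bit-sliced storage.
      def signature(term, width=64, n=2):
          """Term signature: each character n-gram sets exactly one (hashed) bit."""
          sig = 0
          for i in range(len(term) - n + 1):
              sig |= 1 << (hash(term[i:i + n]) % width)
          return sig

      def candidate_match(fragment, term, width=64, n=2):
          """A term can match only if its signature contains every bit set by the fragment's n-grams."""
          q = signature(fragment, width, n)
          return signature(term, width, n) & q == q

      lexicon = ["retrieval", "retrieve", "signature", "lexicon"]
      print([t for t in lexicon if candidate_match("rie", t)])  # candidates still need verification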
  9. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.06
    0.05923338 = product of:
      0.2961669 = sum of:
        0.2961669 = weight(_text_:grams in 2941) [ClassicSimilarity], result of:
          0.2961669 = score(doc=2941,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.7555595 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
      0.2 = coord(1/5)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective in achieving high similarity scores for all word-form variations and reduces ambiguity, i.e., obtains higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
    Object
    n-grams
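    Code sketch
    A rough, order-aware n-gram similarity in the spirit of the abstract above: n-grams of one word are matched against the other only at nearby positions. The window size, n, and the sample word forms are illustrative assumptions, not the authors' exact measure.
      def positional_ngrams(word, n=2):
          """(position, n-gram) pairs, so matches can be restricted by location."""
          return [(i, word[i:i + n]) for i in range(len(word) - n + 1)]

      def positional_similarity(a, b, n=2, window=2):
          """Fraction of a's n-grams found in b within `window` positions of where they occur in a."""
          ga, gb = positional_ngrams(a, n), positional_ngrams(b, n)
          if not ga:
              return 0.0
          hits = sum(1 for i, g in ga if any(g == h and abs(i - j) <= window for j, h in gb))
          return hits / len(ga)

      print(positional_similarity("kitab", "kitabuhum"), positional_similarity("kitab", "batik"))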
  10. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.05
    0.054074373 = product of:
      0.27037185 = sum of:
        0.27037185 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
          0.27037185 = score(doc=306,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5386707.
  11. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.05
    0.04936115 = product of:
      0.24680576 = sum of:
        0.24680576 = weight(_text_:grams in 4580) [ClassicSimilarity], result of:
          0.24680576 = score(doc=4580,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.62963295 = fieldWeight in 4580, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4580)
      0.2 = coord(1/5)
    
    Abstract
    The authors propose a heuristic method for Chinese automatic text segmentation based on a statistical approach. This method is developed based on statistical information about the association among adjacent characters in Chinese text. Mutual information of bi-grams and significant estimation of tri-grams are utilized. A heuristic method with six rules is then proposed to determine the segmentation points in a Chinese sentence. No dictionary is required in this method. Chinese text segmentation is important in Chinese text indexing and thus greatly affects the performance of Chinese information retrieval. Due to the lack of delimiters of words in Chinese text, Chinese text segmentation is more difficult than English text segmentation. Besides, segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many research studies dealing with the problem of word segmentation have focused on the resolution of segmentation ambiguities. The problem of unknown word identification has not drawn much attention. The experimental results show that the proposed heuristic method is promising for segmenting the unknown words as well as the known words. The authors further investigated the distribution of the errors of commission and the errors of omission caused by the proposed heuristic method and benchmarked it against a previously proposed technique, boundary detection. It is found that the heuristic method outperformed the boundary detection method.
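    Code sketch
    A toy sketch of the statistical ingredient named in the abstract: pointwise mutual information of adjacent characters, with a single threshold standing in for the six heuristic rules. The corpus, the threshold, and the rule are illustrative only.
      import math
      from collections import Counter

      def mutual_information(corpus):
          """MI(a, b) = log2( P(ab) / (P(a) * P(b)) ) for adjacent character pairs."""
          chars = Counter(c for s in corpus for c in s)
          pairs = Counter(s[i:i + 2] for s in corpus for i in range(len(s) - 1))
          n_c, n_p = sum(chars.values()), sum(pairs.values())
          return {p: math.log2((f / n_p) / ((chars[p[0]] / n_c) * (chars[p[1]] / n_c)))
                  for p, f in pairs.items()}

      def segment(sentence, mi, threshold):
          """Insert a boundary wherever the adjacent-character MI falls below the threshold."""
          out = [sentence[0]]
          for a, b in zip(sentence, sentence[1:]):
              out.append("|" if mi.get(a + b, -1.0) < threshold else "")
              out.append(b)
          return "".join(out)

      corpus = ["信息检索", "信息系统", "检索系统"]  # tiny placeholder corpus
      print(segment("信息检索系统", mutual_information(corpus), threshold=2.5))  # 信息|检索|系统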
  12. Morato, J.; Llorens, J.; Genova, G.; Moreiro, J.A.: Experiments in discourse analysis impact on information classification and retrieval algorithms (2003) 0.05
    0.04936115 = product of:
      0.24680576 = sum of:
        0.24680576 = weight(_text_:grams in 1083) [ClassicSimilarity], result of:
          0.24680576 = score(doc=1083,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.62963295 = fieldWeight in 1083, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1083)
      0.2 = coord(1/5)
    
    Abstract
    Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extra-linguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. In order to evaluate this hypothesis, a functional framework for information analysis in an automated environment has been proposed, where n-gram filtering and the k-means and Chen's classification algorithms have been tested against sub-collections of documents based on the following discourse variables: "Genre", "Register", "Domain terminology", and "Document structure". The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. These demonstrate that the n-gram filter does not appear to have a clear dependence on discourse variables; the k-means classification algorithm does, but only on domain terminology and document structure; and Chen's algorithm has a clear dependence on all of the discourse variables. This information could be used to design better classification algorithms in which discourse variables are taken into account. Other minor conclusions drawn from these results are also presented.
  13. Egghe, L.; Ravichandra Rao, I.K.: The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.05
    0.04936115 = product of:
      0.24680576 = sum of:
        0.24680576 = weight(_text_:grams in 2009) [ClassicSimilarity], result of:
          0.24680576 = score(doc=2009,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.62963295 = fieldWeight in 2009, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2009)
      0.2 = coord(1/5)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T^(1/α), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and α is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D·10^(-N/α), where D is a constant; this is a convexly decreasing function, which is found in our experiments. Nonlinear regression on h = T^(1/α) gives an estimate of α, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S^(1/α), where S is the total number of documents in the database.
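    Code sketch
    The abstract's formulas, written out as a small computation; the numeric values below are illustrative only and are not the WoS/Scopus estimates reported in the article.
      def h_from_T(T, alpha):
          """h = T**(1/alpha): h-index implied by T retrieved documents under Lotka exponent alpha."""
          return T ** (1.0 / alpha)

      def h_from_ngram_length(D, N, alpha):
          """h = D * 10**(-N/alpha): decline of h with the length N of the query N-gram (D a constant)."""
          return D * 10 ** (-N / alpha)

      # Illustrative numbers only:
      print(h_from_T(T=10_000, alpha=2.0), h_from_ngram_length(D=500.0, N=3, alpha=2.0))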
  14. WordHoard: finding multiword units (20??) 0.05
    0.048865046 = product of:
      0.24432522 = sum of:
        0.24432522 = weight(_text_:grams in 1123) [ClassicSimilarity], result of:
          0.24432522 = score(doc=1123,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.6233048 = fieldWeight in 1123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1123)
      0.2 = coord(1/5)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.
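    Code sketch
    A rough sketch of the pseudo-bigram idea described above: a phrase longer than two words is split at every internal point into a left and a right part, a bigram association measure (here pointwise mutual information) is computed for each split, and the values are averaged. WordHoard's actual measures and the localmaxs step are more involved, and the toy counts below are invented.
      import math
      from collections import Counter

      def pseudo_bigram_pmi(phrase, ngram_counts, total):
          """Average PMI over every split of the phrase into a left and right part,
          treating each split as a 'bigram' of word sequences."""
          words = tuple(phrase.split())
          p = ngram_counts[words] / total
          scores = []
          for i in range(1, len(words)):
              pl = ngram_counts[words[:i]] / total
              pr = ngram_counts[words[i:]] / total
              scores.append(math.log2(p / (pl * pr)))
          return sum(scores) / len(scores)

      # Toy counts of word sequences in a corpus of `total` positions (illustrative numbers):
      counts = Counter({("round", "table"): 40, ("the", "round", "table"): 30,
                        ("the",): 5000, ("round",): 120, ("table",): 300, ("the", "round"): 35})
      print(pseudo_bigram_pmi("the round table", counts, total=100_000))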
  15. Mas, S.; Marleau, Y.: Proposition of a faceted classification model to support corporate information organization and digital records management (2009) 0.05
    0.04634946 = product of:
      0.23174728 = sum of:
        0.23174728 = weight(_text_:3a in 2918) [ClassicSimilarity], result of:
          0.23174728 = score(doc=2918,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2918)
      0.2 = coord(1/5)
    
    Footnote
    Cf.: http://ieeexplore.ieee.org/iel5/4755313/4755314/04755480.pdf?arnumber=4755480.
  16. Haas, S.W.; Grams, E.S.: Readers, authors, and page structure : a discussion of four questions arising from a content analysis of Web pages (2000) 0.04
    0.04188432 = product of:
      0.2094216 = sum of:
        0.2094216 = weight(_text_:grams in 4387) [ClassicSimilarity], result of:
          0.2094216 = score(doc=4387,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.5342612 = fieldWeight in 4387, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=4387)
      0.2 = coord(1/5)
    
  17. Donsbach, W.: Wahrheit in den Medien : über den Sinn eines methodischen Objektivitätsbegriffes (2001) 0.04
    0.03862455 = product of:
      0.19312274 = sum of:
        0.19312274 = weight(_text_:3a in 5895) [ClassicSimilarity], result of:
          0.19312274 = score(doc=5895,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 5895, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5895)
      0.2 = coord(1/5)
    
    Source
    Politische Meinung. 381(2001) Nr.1, S.65-74 [https://www.dgfe.de/fileadmin/OrdnerRedakteure/Sektionen/Sek02_AEW/KWF/Publikationen_Reihe_1989-2003/Band_17/Bd_17_1994_355-406_A.pdf]
  18. Ackermann, E.: Piaget's constructivism, Papert's constructionism : what's the difference? (2001) 0.04
    0.03862455 = product of:
      0.19312274 = sum of:
        0.19312274 = weight(_text_:3a in 692) [ClassicSimilarity], result of:
          0.19312274 = score(doc=692,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 692, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=692)
      0.2 = coord(1/5)
    
    Content
    Cf.: https://www.semanticscholar.org/paper/Piaget-%E2%80%99-s-Constructivism-%2C-Papert-%E2%80%99-s-%3A-What-%E2%80%99-s-Ackermann/89cbcc1e740a4591443ff4765a6ae8df0fdf5554. Below it, further pointers to related contributions. Also available at: Learning Group Publication 5(2001) no.3, p.438.
  19. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4277) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4277,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4277, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4277)
      0.2 = coord(1/5)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century, when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
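    Code sketch
    A minimal sketch of the query-likelihood idea this chapter surveys: score a document by the probability that its (smoothed) unigram language model generates the query. Jelinek-Mercer smoothing and the mixture weight are one standard choice among many discussed in this literature; the toy document and collection are invented.
      import math
      from collections import Counter

      def query_log_likelihood(query, doc, collection, lam=0.5):
          """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing:
          P(w|d) = lam * tf(w,d)/|d| + (1 - lam) * cf(w)/|C|."""
          d, c = Counter(doc), Counter(collection)
          dl, cl = len(doc), len(collection)
          score = 0.0
          for w in query:
              p = lam * d[w] / dl + (1 - lam) * c[w] / cl
              score += math.log(p) if p > 0 else float("-inf")
          return score

      doc = "statistical language modeling for information retrieval".split()
      collection = doc + "speech recognition uses n gram language models".split()
      print(query_log_likelihood("language retrieval".split(), doc, collection))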
  20. Dannenberg, R.B.; Birmingham, W.P.; Pardo, B.; Hu, N.; Meek, C.; Tzanetakis, G.: A comparative evaluation of search techniques for query-by-humming using the MUSART testbed (2007) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 269) [ClassicSimilarity], result of:
          0.17451802 = score(doc=269,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 269, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=269)
      0.2 = coord(1/5)
    
    Abstract
    Query-by-humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, the authors compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.
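    Code sketch
    A generic sketch (not the MUSART implementation) of the n-gram approach to melody matching mentioned above: melodies are reduced to n-grams of successive pitch intervals, which makes matching transposition-invariant, and a query is scored by n-gram overlap. The representation, n, and the toy melodies are assumptions.
      def interval_ngrams(pitches, n=3):
          """n-grams over successive pitch intervals (in semitones)."""
          intervals = [b - a for a, b in zip(pitches, pitches[1:])]
          return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}

      def ngram_overlap(query, melody, n=3):
          """Fraction of the query's interval n-grams that also occur in the melody."""
          q, m = interval_ngrams(query, n), interval_ngrams(melody, n)
          return len(q & m) / len(q) if q else 0.0

      # MIDI pitch numbers for a hummed query and a stored melody (illustrative, transposed by a fifth):
      hummed = [60, 62, 64, 60, 67, 65]
      stored = [55, 57, 59, 55, 62, 60, 59]
      print(ngram_overlap(hummed, stored))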

Types

  • a 1100
  • m 141
  • el 63
  • s 49
  • b 26
  • x 13
  • i 8
  • n 2
  • r 1
