Search (667 results, page 1 of 34)

  • year_i:[2010 TO 2020}
  1. Verwer, K.: Freiheit und Verantwortung bei Hans Jonas (2011) 0.09
    0.09269892 = product of:
      0.46349457 = sum of:
        0.46349457 = weight(_text_:3a in 973) [ClassicSimilarity], result of:
          0.46349457 = score(doc=973,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            1.1240361 = fieldWeight in 973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.09375 = fieldNorm(doc=973)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://creativechoice.org/doc/HansJonas.pdf.
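    The relevance breakdown above is Lucene's ClassicSimilarity (TF-IDF): queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and the listed score is their product times the coordination factor coord(1/5). A minimal Python sketch reproducing the numbers for result 1, using only the constants shown in the explain output:
      import math

      # Constants copied from the explain output for doc 973 (result 1).
      freq       = 2.0         # termFreq of "_text_:3a" in the matched field
      idf        = 8.478011    # 1 + ln(maxDocs / (docFreq + 1)) = 1 + ln(44218 / 25)
      query_norm = 0.04863741
      field_norm = 0.09375
      coord      = 1 / 5       # 1 of 5 query clauses matched

      tf           = math.sqrt(freq)              # 1.4142135
      query_weight = idf * query_norm             # 0.41234848
      field_weight = tf * idf * field_norm        # 1.1240361
      print(query_weight * field_weight * coord)  # ~0.09269892, the listed score
    The same composition, with different tf, idf, and fieldNorm values, accounts for every score in the list.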
  2. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.08
    0.0772491 = product of:
      0.3862455 = sum of:
        0.3862455 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
          0.3862455 = score(doc=1826,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.93669677 = fieldWeight in 1826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.078125 = fieldNorm(doc=1826)
      0.2 = coord(1/5)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  3. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.07
    0.07254578 = product of:
      0.3627289 = sum of:
        0.3627289 = weight(_text_:grams in 4955) [ClassicSimilarity], result of:
          0.3627289 = score(doc=4955,freq=6.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.92536765 = fieldWeight in 4955, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=4955)
      0.2 = coord(1/5)
    
    Abstract
    In this paper a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms.
    Object
    n-grams
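    A minimal sketch of the stopword n-gram representation described in the abstract above; the stopword list, the n-gram length, and the containment measure are illustrative assumptions, not the paper's exact settings:
      STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "it",
                   "for", "with", "as", "at", "by", "that"}   # illustrative subset

      def stopword_ngrams(text, n=8):
          # Keep only stopwords, in order of appearance, then slide a window of length n.
          seq = [w for w in text.lower().split() if w in STOPWORDS]
          return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

      def containment(suspicious, original, n=8):
          s, o = stopword_ngrams(suspicious, n), stopword_ngrams(original, n)
          return len(s & o) / len(s) if s else 0.0

      # Passage pairs whose containment exceeds a threshold would be flagged for inspection.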
  4. Chen, L.; Fang, H.: ¬An automatic method for extracting innovative ideas based on the Scopus® database (2019) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5310) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5310,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5310, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5310)
      0.2 = coord(1/5)
    
    Abstract
    The novelty of knowledge claims in a research paper can be considered an evaluation criterion for papers to supplement citations. To provide a foundation for research evaluation from the perspective of innovativeness, we propose an automatic approach for extracting innovative ideas from the abstracts of technology and engineering papers. The approach extracts N-grams as candidates based on part-of-speech tagging and determines whether they are novel by checking the Scopus® database to determine whether they had ever been presented previously. Moreover, we discussed the distributions of innovative ideas in different abstract structures. To improve the performance by excluding noisy N-grams, a list of stopwords and a list of research description characteristics were developed. We selected abstracts of articles published from 2011 to 2017 with the topic of semantic analysis as the experimental texts. Excluding noisy N-grams, considering the distribution of innovative ideas in abstracts, and suitably combining N-grams can effectively improve the performance of automatic innovative idea extraction. Unlike co-word and co-citation analysis, innovative-idea extraction aims to identify the differences in a paper from all previously published papers.
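    A hedged sketch of the extraction pipeline outlined above: generate candidate phrases (plain word n-grams here, rather than the POS-filtered candidates of the paper), drop candidates containing noise terms, and keep those not found in a reference phrase collection standing in for the Scopus® lookup. The noise list and the reference set are assumptions for illustration:
      NOISE = {"paper", "results", "approach", "method", "proposed", "study"}  # illustrative noise/stopword list

      def candidate_ngrams(tokens, n_min=2, n_max=4):
          for n in range(n_min, n_max + 1):
              for i in range(len(tokens) - n + 1):
                  gram = tokens[i:i + n]
                  if not any(t in NOISE for t in gram):
                      yield " ".join(gram)

      def novel_phrases(abstract, previously_seen):
          # previously_seen stands in for a check against earlier literature (e.g. Scopus).
          tokens = abstract.lower().split()
          return {g for g in candidate_ngrams(tokens) if g not in previously_seen}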
  5. Gödert, W.; Lepsky, K.: Informationelle Kompetenz : ein humanistischer Entwurf (2019) 0.05
    0.054074373 = product of:
      0.27037185 = sum of:
        0.27037185 = weight(_text_:3a in 5955) [ClassicSimilarity], result of:
          0.27037185 = score(doc=5955,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.65568775 = fieldWeight in 5955, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5955)
      0.2 = coord(1/5)
    
    Footnote
    Review in: Philosophisch-ethische Rezensionen of 09.11.2019 (Jürgen Czogalla), at: https://philosophisch-ethische-rezensionen.de/rezension/Goedert1.html. In: B.I.T. online 23(2020) no.3, pp.345-347 (W. Sühl-Strohmenger) [at: https://www.b-i-t-online.de/heft/2020-03-rezensionen.pdf]. In: Open Password no. 805 of 14.08.2020 (H.-C. Hobohm) [at: https://www.password-online.de/?mailpoet_router&endpoint=view_in_browser&action=view&data=WzE0MywiOGI3NjZkZmNkZjQ1IiwwLDAsMTMxLDFd].
  6. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.05
    0.04936115 = product of:
      0.24680576 = sum of:
        0.24680576 = weight(_text_:grams in 1283) [ClassicSimilarity], result of:
          0.24680576 = score(doc=1283,freq=4.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.62963295 = fieldWeight in 1283, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.2 = coord(1/5)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
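    The abstract above singles out IDF weighting of bigrams and pruning of bigrams with large document frequencies as crucial steps. A minimal sketch of just that preprocessing (the document-frequency cut-off is an assumption; the context-dependent weight estimation itself is not reproduced):
      import math
      from collections import Counter

      def bigram_idf(docs, max_df_ratio=0.05):
          # docs: list of token lists. Returns IDF weights for bigrams kept after DF pruning.
          n_docs = len(docs)
          df = Counter()
          for tokens in docs:
              df.update({(a, b) for a, b in zip(tokens, tokens[1:])})
          return {bg: math.log(n_docs / d)
                  for bg, d in df.items()
                  if d / n_docs <= max_df_ratio}   # prune very frequent (noisy) bigrams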
  7. Zeng, Q.; Yu, M.; Yu, W.; Xiong, J.; Shi, Y.; Jiang, M.: Faceted hierarchy : a new graph type to organize scientific concepts and a construction method (2019) 0.05
    0.04634946 = product of:
      0.23174728 = sum of:
        0.23174728 = weight(_text_:3a in 400) [ClassicSimilarity], result of:
          0.23174728 = score(doc=400,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=400)
      0.2 = coord(1/5)
    
    Content
    Cf.: https://aclanthology.org/D19-5317.pdf.
  8. Suchenwirth, L.: Sacherschliessung in Zeiten von Corona : neue Herausforderungen und Chancen (2019) 0.05
    0.04634946 = product of:
      0.23174728 = sum of:
        0.23174728 = weight(_text_:3a in 484) [ClassicSimilarity], result of:
          0.23174728 = score(doc=484,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.56201804 = fieldWeight in 484, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=484)
      0.2 = coord(1/5)
    
    Footnote
    https://journals.univie.ac.at/index.php/voebm/article/download/5332/5271/
  9. Westerman, S.J.; Cribbin, T.; Collins, J.: Human assessments of document similarity (2010) 0.04
    0.04188432 = product of:
      0.2094216 = sum of:
        0.2094216 = weight(_text_:grams in 3915) [ClassicSimilarity], result of:
          0.2094216 = score(doc=3915,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.5342612 = fieldWeight in 3915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=3915)
      0.2 = coord(1/5)
    
    Object
    n-grams
  10. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.04
    0.04188432 = product of:
      0.2094216 = sum of:
        0.2094216 = weight(_text_:grams in 3015) [ClassicSimilarity], result of:
          0.2094216 = score(doc=3015,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.5342612 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.2 = coord(1/5)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
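    A hedged sketch of the general recipe named above (corpus-based feature extraction plus automatic text classification); scikit-learn is assumed as a stand-in for the authors' tooling, and word 1-3-grams replace the full set of register features:
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      def train_register_classifier(texts, labels):
          # texts: documents; labels: e.g. time period ("1970s/80s" vs. "2000s") or discipline.
          clf = make_pipeline(
              CountVectorizer(ngram_range=(1, 3), min_df=2),  # unigram-to-trigram features
              LogisticRegression(max_iter=1000),
          )
          clf.fit(texts, labels)
          return clf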
  11. Lin, Y.-R.; Margolin, D.; Lazer, D.: Uncovering social semantics from textual traces : a theory-driven approach and evidence from public statements of U.S. Members of Congress (2016) 0.04
    0.04188432 = product of:
      0.2094216 = sum of:
        0.2094216 = weight(_text_:grams in 3078) [ClassicSimilarity], result of:
          0.2094216 = score(doc=3078,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.5342612 = fieldWeight in 3078, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.046875 = fieldNorm(doc=3078)
      0.2 = coord(1/5)
    
    Abstract
    The increasing abundance of digital textual archives provides an opportunity for understanding human social systems. Yet the literature has not adequately considered the disparate social processes by which texts are produced. Drawing on communication theory, we identify three common processes by which documents might be detectably similar in their textual features-authors sharing subject matter, sharing goals, and sharing sources. We hypothesize that these processes produce distinct, detectable relationships between authors in different kinds of textual overlap. We develop a novel n-gram extraction technique to capture such signatures based on n-grams of different lengths. We test the hypothesis on a corpus where the author attributes are observable: the public statements of the members of the U.S. Congress. This article presents the first empirical finding that shows different social relationships are detectable through the structure of overlapping textual features. Our study has important implications for designing text modeling techniques to make sense of social phenomena from aggregate digital traces.
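    A minimal sketch of measuring textual overlap between two authors' statements at several n-gram lengths, as the abstract describes; Jaccard overlap per length is an illustrative choice, not necessarily the paper's exact statistic:
      def ngrams(tokens, n):
          return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

      def overlap_signature(tokens_a, tokens_b, max_n=6):
          # One overlap value per n-gram length; longer shared n-grams hint at shared sources or goals.
          sig = {}
          for n in range(1, max_n + 1):
              a, b = ngrams(tokens_a, n), ngrams(tokens_b, n)
              sig[n] = len(a & b) / len(a | b) if (a or b) else 0.0
          return sig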
  12. Farazi, M.: Faceted lightweight ontologies : a formalization and some experiments (2010) 0.04
    0.03862455 = product of:
      0.19312274 = sum of:
        0.19312274 = weight(_text_:3a in 4997) [ClassicSimilarity], result of:
          0.19312274 = score(doc=4997,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 4997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4997)
      0.2 = coord(1/5)
    
    Content
    PhD dissertation at the International Doctorate School in Information and Communication Technology. Cf.: https://core.ac.uk/download/pdf/150083013.pdf.
  13. Shala, E.: ¬Die Autonomie des Menschen und der Maschine : gegenwärtige Definitionen von Autonomie zwischen philosophischem Hintergrund und technologischer Umsetzbarkeit (2014) 0.04
    0.03862455 = product of:
      0.19312274 = sum of:
        0.19312274 = weight(_text_:3a in 4388) [ClassicSimilarity], result of:
          0.19312274 = score(doc=4388,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 4388, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4388)
      0.2 = coord(1/5)
    
    Footnote
    Cf.: https://www.researchgate.net/publication/271200105_Die_Autonomie_des_Menschen_und_der_Maschine_-_gegenwartige_Definitionen_von_Autonomie_zwischen_philosophischem_Hintergrund_und_technologischer_Umsetzbarkeit_Redigierte_Version_der_Magisterarbeit_Karls.
  14. Piros, A.: Az ETO-jelzetek automatikus interpretálásának és elemzésének kérdései (2018) 0.04
    0.03862455 = product of:
      0.19312274 = sum of:
        0.19312274 = weight(_text_:3a in 855) [ClassicSimilarity], result of:
          0.19312274 = score(doc=855,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.46834838 = fieldWeight in 855, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=855)
      0.2 = coord(1/5)
    
    Content
    Cf. also: New automatic interpreter for complex UDC numbers. At: <https://udcc.org/files/AttilaPiros_EC_36-37_2014-2015.pdf>
  15. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: ¬A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 3426) [ClassicSimilarity], result of:
          0.17451802 = score(doc=3426,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.2 = coord(1/5)
    
    Abstract
    Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used, the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qu'ran and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qu'ran contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
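    A minimal sketch of the two measures the abstract names, applied to character bigrams of a word and a candidate root; transliterated strings would be used here purely for illustration:
      from collections import Counter

      def char_bigrams(word):
          return [word[i:i + 2] for i in range(len(word) - 1)]

      def dice(word, root):
          # Dice's measure of similarity on the bigram sets.
          a, b = set(char_bigrams(word)), set(char_bigrams(root))
          return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

      def manhattan(word, root):
          # Manhattan distance (dissimilarity) on the bigram frequency vectors.
          a, b = Counter(char_bigrams(word)), Counter(char_bigrams(root))
          return sum(abs(a[k] - b[k]) for k in set(a) | set(b))

      # The candidate root with the highest Dice score (or lowest Manhattan distance) is selected.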
  16. Yu, L.-C.; Wu, C.-H.; Chang, R.-Y.; Liu, C.-H.; Hovy, E.H.: Annotation and verification of sense pools in OntoNotes (2010) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4236) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4236,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4236)
      0.2 = coord(1/5)
    
    Abstract
    The paper describes the OntoNotes, a multilingual (English, Chinese and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, there is still no knowledge about whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method can achieve comparable performance with the supervised method.
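    A hedged sketch of the context-mismatch idea described above: for each occurrence context of a word, check whether the context still occurs in an n-gram frequency table once the word is replaced by a candidate substitute. A local dictionary stands in for the Google n-grams, and the mismatch ratio is an illustrative simplification of the paper's statistical test:
      def mismatch_degree(contexts, substitute, ngram_counts, slot="_"):
          # contexts: strings such as "strong _ coffee" with "_" marking the target word's slot.
          # ngram_counts: mapping from n-gram string to corpus frequency (stand-in for Google n-grams).
          misses = sum(1 for c in contexts
                       if ngram_counts.get(c.replace(slot, substitute), 0) == 0)
          return misses / len(contexts) if contexts else 1.0

      # A low mismatch degree suggests the substitute is interchangeable within the sense pool.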
  17. Seki, K.; Uehara, K.: Opinionated document retrieval using subjective triggers (2011) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4446) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4446,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4446)
      0.2 = coord(1/5)
    
    Abstract
    This article proposes a novel application of a statistical language model to opinionated document retrieval targeting weblogs (blogs). In particular, we explore the use of the trigger model-originally developed for incorporating distant word dependencies-in order to model the characteristics of personal opinions that cannot be properly modeled by standard n-grams. Our primary assumption is that there are two constituents to form a subjective opinion. One is the subject of the opinion or the object that the opinion is about, and the other is a subjective expression; the former is regarded as a triggering word and the latter as a triggered word. We automatically identify those subjective trigger patterns to build a language model from a corpus of product customer reviews. Experimental results on the Text Retrieval Conference Blog track test collections show that, when used for reranking initial search results, our proposed model significantly improves opinionated document retrieval. In addition, we report on an experiment on dynamic adaptation of the model to a given query, which is found effective for most of the difficult queries categorized under politics and organizations. We also demonstrate that, without any modification to the proposed model itself, it can be effectively applied to polarized opinion retrieval.
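    A hedged sketch of the two-constituent idea above: count co-occurrences of a topic (triggering) word with a subjective (triggered) expression inside a window. The subjective word list and the window size are illustrative assumptions, and the full trigger language model is not reproduced:
      from collections import Counter

      SUBJECTIVE = {"love", "hate", "great", "terrible", "awful", "amazing"}  # illustrative

      def trigger_pairs(tokens, topic_terms, window=10):
          pairs = Counter()
          for i, tok in enumerate(tokens):
              if tok in topic_terms:
                  for trg in tokens[i + 1:i + 1 + window]:
                      if trg in SUBJECTIVE:
                          pairs[(tok, trg)] += 1
          return pairs

      def opinion_score(tokens, topic_terms):
          # Simple count-based proxy usable for reranking: more trigger pairs -> more opinionated.
          return sum(trigger_pairs(tokens, topic_terms).values())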
  18. Ortiz-Cordova, A.; Yang, Y.; Jansen, B.J.: External to internal search : associating searching on search engines with searching on sites (2015) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 2675) [ClassicSimilarity], result of:
          0.17451802 = score(doc=2675,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 2675, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2675)
      0.2 = coord(1/5)
    
    Abstract
    We analyze the transitions from external search, searching on web search engines, to internal search, searching on websites. We categorize 295,571 search episodes composed of a query submitted to web search engines and the subsequent queries submitted to a single website search by the same users. There are a total of 1,136,390 queries from all searches, of which 295,571 are external search queries and 840,819 are internal search queries. We algorithmically classify queries into states and then use n-grams to categorize search patterns. We cluster the searching episodes into major patterns and identify the most commonly occurring, which are: (1) Explorers (43% of all patterns) with a broad external search query and then broad internal search queries, (2) Navigators (15%) with an external search query containing a URL component and then specific internal search queries, and (3) Shifters (15%) with a different, seemingly unrelated, query types when transitioning from external to internal search. The implications of this research are that external search and internal search sessions are part of a single search episode and that online businesses can leverage these search episodes to more effectively target potential customers.
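    A minimal sketch of the episode analysis described above: label each query in an episode with a state, then count n-grams over the state sequence to surface common transition patterns. The state labels and the rule-based classifier are placeholders for the paper's algorithmic classification:
      from collections import Counter

      def classify_query(query):
          # Placeholder states; the paper classifies queries algorithmically.
          if "http" in query or "www." in query:
              return "NAVIGATIONAL"
          return "BROAD" if len(query.split()) <= 2 else "SPECIFIC"

      def pattern_counts(episodes, n=3):
          # episodes: lists of queries, an external search query followed by internal site queries.
          patterns = Counter()
          for queries in episodes:
              states = [classify_query(q) for q in queries]
              patterns.update(tuple(states[i:i + n]) for i in range(len(states) - n + 1))
          return patterns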
  19. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 2688) [ClassicSimilarity], result of:
          0.17451802 = score(doc=2688,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2688)
      0.2 = coord(1/5)
    
    Object
    n-grams
  20. Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 3188) [ClassicSimilarity], result of:
          0.17451802 = score(doc=3188,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 3188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3188)
      0.2 = coord(1/5)
    
    Abstract
    Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we make an attempt towards understanding and characterizing the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries, when presented along with model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
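    A hedged sketch of the perplexity comparison mentioned above, using a bigram model with add-one smoothing; the study's actual language modeling is more careful, so this is illustrative only:
      import math
      from collections import Counter

      def train_bigram_lm(sentences):
          unigrams, bigrams = Counter(), Counter()
          for toks in sentences:
              toks = ["<s>"] + toks
              unigrams.update(toks)
              bigrams.update(zip(toks, toks[1:]))
          vocab = len(unigrams)
          def logprob(prev, cur):  # add-one smoothed bigram log-probability
              return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
          return logprob

      def perplexity(logprob, tokens):
          toks = ["<s>"] + tokens
          lp = sum(logprob(a, b) for a, b in zip(toks, toks[1:]))
          return math.exp(-lp / (len(toks) - 1))

      # Lower perplexity under a model trained on natural-language text means the queries look more like NL.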

Languages

  • e 481
  • d 178
  • a 1
  • hu 1

Types

  • a 582
  • el 59
  • m 43
  • s 15
  • x 12
  • r 7
  • b 5
  • i 1
  • z 1
