Search (2231 results, page 3 of 112)

  • language_ss:"e"
  1. Ortiz-Cordova, A.; Yang, Y.; Jansen, B.J.: External to internal search : associating searching on search engines with searching on sites (2015) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 2675) [ClassicSimilarity], result of:
          0.17451802 = score(doc=2675,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 2675, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2675)
      0.2 = coord(1/5)
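    A worked check of the value above (a minimal sketch, assuming Lucene's ClassicSimilarity TF-IDF formula, which is what this explanation tree reports; coord(1/5) means one of five query clauses matched):

      import math

      freq, doc_freq, max_docs = 2.0, 37, 44218
      query_norm, field_norm, coord = 0.04863741, 0.0390625, 1 / 5

      tf = math.sqrt(freq)                             # 1.4142135
      idf = math.log(max_docs / (doc_freq + 1)) + 1.0  # 8.059301
      query_weight = idf * query_norm                  # 0.39198354
      field_weight = tf * idf * field_norm             # 0.44521773
      print(coord * query_weight * field_weight)       # ~0.034903605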
    
    Abstract
    We analyze the transitions from external search, searching on web search engines, to internal search, searching on websites. We categorize 295,571 search episodes composed of a query submitted to web search engines and the subsequent queries submitted to a single website search by the same users. There are a total of 1,136,390 queries from all searches, of which 295,571 are external search queries and 840,819 are internal search queries. We algorithmically classify queries into states and then use n-grams to categorize search patterns. We cluster the search episodes into major patterns and identify the most commonly occurring, which are: (1) Explorers (43% of all patterns) with a broad external search query followed by broad internal search queries, (2) Navigators (15%) with an external search query containing a URL component followed by specific internal search queries, and (3) Shifters (15%) with different, seemingly unrelated query types when transitioning from external to internal search. The implications of this research are that external search and internal search sessions are part of a single search episode and that online businesses can leverage these search episodes to more effectively target potential customers.
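    A minimal sketch of the n-gram step described in this abstract (state labels and episodes below are hypothetical, not the authors' data): each episode's classified query states are turned into bigrams, which are then counted across episodes to surface recurring transition patterns.

      from collections import Counter

      def ngrams(states, n=2):
          # n-grams over one episode's sequence of classified query states
          return [tuple(states[i:i + n]) for i in range(len(states) - n + 1)]

      episodes = [                                                  # hypothetical episodes
          ["broad_external", "broad_internal", "broad_internal"],   # Explorer-like
          ["url_external", "specific_internal"],                    # Navigator-like
          ["broad_external", "unrelated_internal"],                 # Shifter-like
      ]
      patterns = Counter(g for ep in episodes for g in ngrams(ep))
      for pattern, count in patterns.most_common():
          print(pattern, count)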
  2. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 2688) [ClassicSimilarity], result of:
          0.17451802 = score(doc=2688,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2688)
      0.2 = coord(1/5)
    
    Object
    n-grams
  3. Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 3188) [ClassicSimilarity], result of:
          0.17451802 = score(doc=3188,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 3188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3188)
      0.2 = coord(1/5)
    
    Abstract
    Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we attempt to understand and characterize the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries when presented alongside model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries thus seem to represent an intermediate stage between syntactic and non-syntactic communication.
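    An illustration of the perplexity comparison described here (the per-token probabilities below are invented): perplexity is the exponential of the average negative log-probability a language model assigns to a token sequence, so lower values mean the sequence looks more "natural" to the model.

      import math

      def perplexity(token_probs):
          # exp of the average negative log-probability per token
          return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

      query_probs = [0.02, 0.05, 0.01]             # hypothetical model probabilities for a query
      sentence_probs = [0.10, 0.20, 0.15, 0.12]    # ... and for a natural-language sentence
      print(perplexity(query_probs), perplexity(sentence_probs))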
  4. Lhadj, L.S.; Boughanem, M.; Amrouche, K.: Enhancing information retrieval through concept-based language modeling and semantic smoothing (2016) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 3221) [ClassicSimilarity], result of:
          0.17451802 = score(doc=3221,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 3221, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3221)
      0.2 = coord(1/5)
    
    Abstract
    Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word-independence assumption seems unrealistic from a natural-language point of view, which considers that terms are related to each other. Such an assumption leads to two well-known problems in information retrieval (IR), namely polysemy (term mismatch) and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-gram or concept counting. In this paper, we address the polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with collocations frequently found in the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
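    A minimal sketch of the general idea (the interpolation weights, toy ontology, and counts are illustrative assumptions, not the authors' estimates): the document model for a concept is smoothed with evidence from related concepts and with the collection model, so a document can match "car" even if it only contains "automobile".

      related = {"car": ["automobile", "vehicle"]}     # toy ontology: concept -> related concepts

      def p_concept(concept, doc_counts, coll_counts, lam=0.5, mu=0.3):
          doc_len = sum(doc_counts.values()) or 1
          coll_len = sum(coll_counts.values()) or 1
          p_doc = doc_counts.get(concept, 0) / doc_len
          p_rel = sum(doc_counts.get(r, 0) for r in related.get(concept, [])) / doc_len
          p_coll = coll_counts.get(concept, 0) / coll_len
          # semantic smoothing of the document model, then collection smoothing
          return (1 - lam) * ((1 - mu) * p_doc + mu * p_rel) + lam * p_coll

      doc = {"automobile": 3, "engine": 2}
      collection = {"car": 120, "automobile": 80, "engine": 200}
      print(p_concept("car", doc, collection))         # nonzero despite no literal "car" in the document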
  5. Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances (2018) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4035) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4035,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4035)
      0.2 = coord(1/5)
    
    Abstract
    Information retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the web and elsewhere. They are complex systems, constituted by many different components interacting together, and evaluation is crucial to both tune and improve them. Nevertheless, in the current evaluation methodology there is still no way to determine how much each component contributes to the overall performance and how the components interact. This hampers a deep understanding of IR system behavior and, in turn, prevents us from knowing in advance which components are best suited to work together for a specific search task. In this paper, we move the evaluation methodology one step forward by overcoming these barriers and beginning to devise an "anatomy" of IR systems and their internals. In particular, we propose a methodology based on the General Linear Mixed Model (GLMM) and analysis of variance (ANOVA) to develop statistical models able to isolate system variance and component effects, as well as their interaction, by relying on a grid of points (GoP) containing all the combinations of the analyzed components. We apply the proposed methodology to the analysis of two relevant search tasks, news search and web search, using standard TREC collections. We analyze the basic set of components typically part of an IR system, namely stop lists, stemmers, n-grams, and IR models. In this way, we derive insights about English text retrieval.
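    A rough sketch of the grid-of-points idea (component names, the fake effectiveness scores, and the plain fixed-effects ANOVA below are illustrative assumptions; the paper fits a GLMM over real TREC runs): every combination of components is evaluated per topic, and the scores are then decomposed into component effects.

      import random
      from itertools import product

      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.formula.api import ols

      random.seed(0)
      stop_lists, stemmers, models = ["none", "smart"], ["none", "porter"], ["bm25", "lm"]

      rows = []
      for topic in range(1, 11):                        # hypothetical topics
          for stop, stem, model in product(stop_lists, stemmers, models):
              ap = (0.20 + 0.05 * (stem == "porter") + 0.03 * (model == "bm25")
                    + random.gauss(0, 0.02))            # fake average-precision score
              rows.append({"topic": topic, "stop": stop, "stem": stem, "model": model, "ap": ap})

      df = pd.DataFrame(rows)
      fit = ols("ap ~ C(topic) + C(stop) + C(stem) + C(model)", data=df).fit()
      print(sm.stats.anova_lm(fit, typ=2))              # variance attributed to each component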
  6. Juola, P.; Mikros, G.K.; Vinsick, S.: A comparative assessment of the difficulty of authorship attribution in Greek and in English (2019) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 4676) [ClassicSimilarity], result of:
          0.17451802 = score(doc=4676,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 4676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4676)
      0.2 = coord(1/5)
    
    Abstract
    Authorship attribution is an important problem in text classification, with many applications and a substantial body of research activity. Among the research findings is that many different methods work, including a number of methods that are superficially language-independent (such as an analysis of the most common "words" or "character n-grams" in a document). Since all languages have words (and all written languages have characters), these methods could in theory work on any language. However, it is not clear that the methods that work best on, for example, English would also work best on other languages. It is not even clear that the same level of performance is achievable in different languages, even under identical conditions. Unfortunately, it is very difficult to achieve "identical conditions" in practice. A new corpus, developed by George Mikros, provides very tight controls not only for author but also for topic, thus enabling a direct comparison of performance levels between the two languages, Greek and English. We compare a number of different methods head-to-head on this corpus and show that, overall, performance on English is higher than performance on Greek, often highly significantly so.
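    A minimal sketch of the "character n-grams" idea mentioned in the abstract (the profile-plus-cosine comparison is a common simple baseline, not necessarily one of the methods benchmarked in the paper; the sample texts are invented):

      import math
      from collections import Counter

      def char_ngram_profile(text, n=3):
          text = text.lower()
          return Counter(text[i:i + n] for i in range(len(text) - n + 1))

      def cosine(a, b):
          dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
          norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      known = char_ngram_profile("the quick brown fox jumps over the lazy dog")
      disputed = char_ngram_profile("a quick brown dog jumps over the lazy fox")
      print(cosine(known, disputed))    # higher similarity suggests the same author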
  7. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: A deep network model for paraphrase detection in short text messages (2018) 0.03
    0.034903605 = product of:
      0.17451802 = sum of:
        0.17451802 = weight(_text_:grams in 5043) [ClassicSimilarity], result of:
          0.17451802 = score(doc=5043,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.44521773 = fieldWeight in 5043, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5043)
      0.2 = coord(1/5)
    
    Abstract
    This paper is concerned with paraphrase detection, i.e., identifying sentences that are semantically identical. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication, and question answering. Recognizing this importance, we study in particular how to address the challenges of detecting paraphrases in user-generated short texts, such as tweets, which often contain language irregularity and noise and do not necessarily carry as much semantic information as longer, clean texts. We propose a novel deep neural network-based approach that relies on coarse-grained sentence modelling using a convolutional neural network (CNN) and a recurrent neural network (RNN), combined with a specific fine-grained word-level similarity matching model. More specifically, we develop a new architecture, called DeepParaphrase, which creates an informative semantic representation of each sentence by (1) using the CNN to extract local region information in the form of important n-grams from the sentence, and (2) applying the RNN to capture long-term dependency information. In addition, we perform a comparative study of state-of-the-art approaches to paraphrase detection. An important insight from this study is that existing paraphrase approaches perform well when applied to clean texts, but they do not necessarily deliver good performance on noisy texts, and vice versa. In contrast, our evaluation shows that the proposed DeepParaphrase approach achieves good results on both types of text, making it more robust and generic than the existing approaches.
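    A very small PyTorch sketch of the coarse-grained sentence modelling described above (layer sizes, the module name, and the random inputs are illustrative, not the DeepParaphrase configuration): a 1-D convolution extracts local n-gram features, an LSTM captures longer-range dependencies, and the resulting sentence vectors of two texts can then be compared.

      import torch
      import torch.nn as nn

      class SentenceEncoder(nn.Module):
          def __init__(self, vocab=10000, emb=100, conv=64, rnn=64):
              super().__init__()
              self.emb = nn.Embedding(vocab, emb)
              self.conv = nn.Conv1d(emb, conv, kernel_size=3, padding=1)  # local n-gram features
              self.rnn = nn.LSTM(conv, rnn, batch_first=True)             # long-term dependencies

          def forward(self, token_ids):                       # (batch, seq_len)
              x = self.emb(token_ids)                         # (batch, seq_len, emb)
              x = torch.relu(self.conv(x.transpose(1, 2)))    # (batch, conv, seq_len)
              _, (h, _) = self.rnn(x.transpose(1, 2))         # final hidden state
              return h[-1]                                    # (batch, rnn) sentence vector

      enc = SentenceEncoder()
      s1, s2 = torch.randint(0, 10000, (1, 12)), torch.randint(0, 10000, (1, 12))
      print(torch.cosine_similarity(enc(s1), enc(s2)).item())  # higher = more paraphrase-like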
  8. Stojanovic, N.: Ontology-based Information Retrieval : methods and tools for cooperative query answering (2005) 0.03
    0.030899638 = product of:
      0.15449819 = sum of:
        0.15449819 = weight(_text_:3a in 701) [ClassicSimilarity], result of:
          0.15449819 = score(doc=701,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.3746787 = fieldWeight in 701, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03125 = fieldNorm(doc=701)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/1627.
  9. Xiong, C.: Knowledge based text representations for information retrieval (2016) 0.03
    0.030899638 = product of:
      0.15449819 = sum of:
        0.15449819 = weight(_text_:3a in 5820) [ClassicSimilarity], result of:
          0.15449819 = score(doc=5820,freq=2.0), product of:
            0.41234848 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04863741 = queryNorm
            0.3746787 = fieldWeight in 5820, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03125 = fieldNorm(doc=5820)
      0.2 = coord(1/5)
    
    Content
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. Cf.: https://www.cs.cmu.edu/~cx/papers/knowledge_based_text_representation.pdf.
  10. Jacsó, P.: Searching for images by similarity online (1998) 0.03
    0.029821565 = product of:
      0.14910783 = sum of:
        0.14910783 = weight(_text_:22 in 393) [ClassicSimilarity], result of:
          0.14910783 = score(doc=393,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.8754574 = fieldWeight in 393, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=393)
      0.2 = coord(1/5)
    
    Date
    29.11.2004 13:03:22
    Source
    Online. 22(1998) no.6, S.99-102
  11. Hawking, D.; Robertson, S.: On collection size and retrieval effectiveness (2003) 0.03
    0.029821565 = product of:
      0.14910783 = sum of:
        0.14910783 = weight(_text_:22 in 4109) [ClassicSimilarity], result of:
          0.14910783 = score(doc=4109,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.8754574 = fieldWeight in 4109, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=4109)
      0.2 = coord(1/5)
    
    Date
    14. 8.2005 14:22:22
  12. Multilingual information management : current levels and future abilities. A report Commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.03
    0.027922884 = product of:
      0.13961442 = sum of:
        0.13961442 = weight(_text_:grams in 6068) [ClassicSimilarity], result of:
          0.13961442 = score(doc=6068,freq=2.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.35617417 = fieldWeight in 6068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
      0.2 = coord(1/5)
    
    Abstract
    Over the past 50 years, a variety of language-related capabilities has been developed in machine translation, information retrieval, speech recognition, text summarization, and so on. These applications rest upon a set of core techniques such as language modeling, information extraction, parsing, generation, and multimedia planning and integration; and they involve methods using statistics, rules, grammars, lexicons, ontologies, training techniques, and so on. It is a puzzling fact that although all of this work deals with language in some form or other, the major applications have each developed a separate research field. For example, there is no reason why speech recognition techniques involving n-grams and hidden Markov models could not have been used in machine translation 15 years earlier than they were, or why some of the lexical and semantic insights from the subarea called Computational Linguistics are still not used in information retrieval.
  13. Buzydlowski, J.W.; White, H.D.; Lin, X.: Term Co-occurrence Analysis as an Interface for Digital Libraries (2002) 0.03
    0.027392859 = product of:
      0.13696429 = sum of:
        0.13696429 = weight(_text_:22 in 1339) [ClassicSimilarity], result of:
          0.13696429 = score(doc=1339,freq=6.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.804159 = fieldWeight in 1339, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1339)
      0.2 = coord(1/5)
    
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:16:22
  14. Dahlberg, I.: Conceptual definitions for INTERCONCEPT (1981) 0.03
    0.026358789 = product of:
      0.13179395 = sum of:
        0.13179395 = weight(_text_:22 in 1630) [ClassicSimilarity], result of:
          0.13179395 = score(doc=1630,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.77380234 = fieldWeight in 1630, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.15625 = fieldNorm(doc=1630)
      0.2 = coord(1/5)
    
    Source
    International classification. 8(1981), S.16-22
  15. Pietris, M.K.D.: LCSH update (1988) 0.03
    0.026358789 = product of:
      0.13179395 = sum of:
        0.13179395 = weight(_text_:22 in 2798) [ClassicSimilarity], result of:
          0.13179395 = score(doc=2798,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.77380234 = fieldWeight in 2798, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.15625 = fieldNorm(doc=2798)
      0.2 = coord(1/5)
    
    Source
    Cataloguing Australia. 13(1988), S.19-22
  16. Serial cataloguing : modern perspectives and international developments (1992) 0.03
    0.026358789 = product of:
      0.13179395 = sum of:
        0.13179395 = weight(_text_:22 in 3704) [ClassicSimilarity], result of:
          0.13179395 = score(doc=3704,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.77380234 = fieldWeight in 3704, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.15625 = fieldNorm(doc=3704)
      0.2 = coord(1/5)
    
    Source
    Serials librarian. 22(1992), nos.3/4
  17. Woods, W.A.: What's important about knowledge representation? (1983) 0.03
    0.026358789 = product of:
      0.13179395 = sum of:
        0.13179395 = weight(_text_:22 in 6143) [ClassicSimilarity], result of:
          0.13179395 = score(doc=6143,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.77380234 = fieldWeight in 6143, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.15625 = fieldNorm(doc=6143)
      0.2 = coord(1/5)
    
    Source
    Computer. 16(1983) no.10, S.22-27
  18. Smith, G.: Newspapers on CD-ROM (1992) 0.03
    0.026358789 = product of:
      0.13179395 = sum of:
        0.13179395 = weight(_text_:22 in 6396) [ClassicSimilarity], result of:
          0.13179395 = score(doc=6396,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.77380234 = fieldWeight in 6396, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.15625 = fieldNorm(doc=6396)
      0.2 = coord(1/5)
    
    Source
    Serials. 5(1992) no.3, S.17-22
  19. Panizzi, A.K.C.B.: Passages in my official life (1871) 0.03
    0.02609387 = product of:
      0.13046935 = sum of:
        0.13046935 = weight(_text_:22 in 935) [ClassicSimilarity], result of:
          0.13046935 = score(doc=935,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.76602525 = fieldWeight in 935, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=935)
      0.2 = coord(1/5)
    
    Date
    22. 7.2007 12:05:26
    22. 7.2007 12:08:24
  20. Nanfito, N.: The indexed Web : engineering tools for cataloging, storing and delivering Web based documents (1999) 0.03
    0.02609387 = product of:
      0.13046935 = sum of:
        0.13046935 = weight(_text_:22 in 8727) [ClassicSimilarity], result of:
          0.13046935 = score(doc=8727,freq=4.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.76602525 = fieldWeight in 8727, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=8727)
      0.2 = coord(1/5)
    
    Date
    5. 8.2001 12:22:47
    Source
    Information outlook. 3(1999) no.2, S.18-22

Types

  • a 1963
  • m 151
  • s 97
  • el 72
  • b 31
  • r 11
  • x 8
  • i 3
  • n 2
  • p 2
  • h 1

Themes

Subjects

Classifications