Search (130 results, page 7 of 7)

  • × theme_ss:"Suchmaschinen"
  • × year_i:[2010 TO 2020}
  1. Li, Z.: A domain specific search engine with explicit document relations (2013) 0.00
    2.548789E-4 = product of:
      0.005097578 = sum of:
        0.005097578 = weight(_text_:in in 1210) [ClassicSimilarity], result of:
          0.005097578 = score(doc=1210,freq=6.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.1301535 = fieldWeight in 1210, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1210)
      0.05 = coord(1/20)
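    The indented tree above is Lucene's explain() output for a TF-IDF score under ClassicSimilarity. As a check on the arithmetic, the short Python sketch below recomputes the final ranking score from the factors listed in the tree (variable names mirror Lucene's terms; this is an illustration, not Lucene code):

      import math

      # Factors copied verbatim from the explain() tree for doc 1210.
      freq       = 6.0         # termFreq of "in" in the field
      idf        = 1.3602545   # idf(docFreq=30841, maxDocs=44218)
      query_norm = 0.02879306  # queryNorm
      field_norm = 0.0390625   # fieldNorm(doc=1210)
      coord      = 1 / 20      # coord(1/20): 1 of 20 query clauses matched

      tf           = math.sqrt(freq)        # 2.4494898 = tf(freq=6.0)
      query_weight = idf * query_norm       # 0.039165888 = queryWeight
      field_weight = tf * idf * field_norm  # 0.1301535 = fieldWeight
      score        = query_weight * field_weight * coord

      print(f"{score:.6E}")  # 2.548789E-04, the ranking score shown above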
    
    Abstract
    The current web consists of documents that are highly heterogeneous and hard for machines to understand. The Semantic Web is a progressive development of the World Wide Web, aiming to convert the current web of unstructured documents into a web of data. In the Semantic Web, web documents are annotated with metadata using a standardized ontology language. These annotated documents are directly processable by machines, which greatly improves their usability and usefulness. Similar problems occur at Ericsson, where massive numbers of documents with well-defined structures are created. Although these documents cover domain-specific knowledge and can have rich relations, they are currently managed by a traditional search engine, which ignores the rich domain-specific information and presents little of it to users. Motivated by the Semantic Web, we aim to find standard ways to process these documents, extract rich domain-specific information, and attach this information to the documents in formal markup languages. We propose this project to develop a domain-specific search engine that processes different documents and builds explicit relations between them. The research project has three main focuses: examining different domain-specific documents and finding ways to extract their metadata; integrating a text search engine with an ontology server; and exploring novel ways to build relations between documents. We implement this system and demonstrate its functions. As a prototype, the system provides the required features and will be extended in the future.
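    As an illustration of the kind of annotation the abstract describes, the sketch below attaches ontology-based metadata to a document as RDF using Python's rdflib; the namespace, property names, and document URIs are hypothetical stand-ins, not taken from the thesis:

      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import DC, RDF

      # Hypothetical domain ontology namespace (illustrative only).
      EX = Namespace("http://example.org/ontology/")

      g = Graph()
      doc = URIRef("http://example.org/docs/spec-1210")

      # Annotate the document with typed, machine-processable metadata.
      g.add((doc, RDF.type, EX.TechnicalSpecification))
      g.add((doc, DC.title, Literal("Radio interface specification")))
      # An explicit, machine-readable relation to another document.
      g.add((doc, EX.relatesTo, URIRef("http://example.org/docs/spec-42")))

      print(g.serialize(format="turtle"))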
  2. Lewandowski, D.: A framework for evaluating the retrieval effectiveness of search engines (2012) 0.00
    2.497293E-4 = product of:
      0.0049945856 = sum of:
        0.0049945856 = weight(_text_:in in 106) [ClassicSimilarity], result of:
          0.0049945856 = score(doc=106,freq=4.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.12752387 = fieldWeight in 106, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=106)
      0.05 = coord(1/20)
    
    Abstract
    This chapter presents a theoretical framework for evaluating next-generation search engines. The author focuses on search engines whose results presentation is enriched with additional information rather than the usual list of "10 blue links," that is, ten links to results, each accompanied by a short description. While Web search is used as the example here, the framework can easily be applied to search engines in any other area. The framework addresses not only the results presentation but also an extension of the general design of retrieval effectiveness tests. The chapter examines the ways in which this design might influence the results of such studies and how a reliable test is best designed.
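    As a toy illustration of the kind of retrieval effectiveness test the framework extends, the Python sketch below computes precision at k for one query's judged results; the binary relevance judgments are invented for the example:

      def precision_at_k(judgments, k=10):
          """Fraction of the top-k results judged relevant (1 vs. 0)."""
          top_k = judgments[:k]
          return sum(top_k) / len(top_k)

      # Hypothetical judgments for the top 10 results of one query.
      judged = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
      print(precision_at_k(judged))  # 0.5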
  3. Jindal, V.; Bawa, S.; Batra, S.: A review of ranking approaches for semantic search on Web (2014) 0.00
    2.497293E-4 = product of:
      0.0049945856 = sum of:
        0.0049945856 = weight(_text_:in in 2799) [ClassicSimilarity], result of:
          0.0049945856 = score(doc=2799,freq=4.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.12752387 = fieldWeight in 2799, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2799)
      0.05 = coord(1/20)
    
    Abstract
    With ever-increasing information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, even the most renowned search engines commonly return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, whose basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different semantics-based relevance ranking approaches considered appropriate for retrieving relevant information. Various pilot projects and their outcomes are investigated with respect to the methodologies adopted and their most distinctive ranking characteristics. An overview of selected approaches and a comparison based on classification criteria are presented. This comparison identifies common concepts and outstanding features.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  4. Bressan, M.; Peserico, E.: Choose the damping, choose the ranking? (2010) 0.00
    2.35447E-4 = product of:
      0.00470894 = sum of:
        0.00470894 = weight(_text_:in in 2563) [ClassicSimilarity], result of:
          0.00470894 = score(doc=2563,freq=8.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.120230645 = fieldWeight in 2563, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=2563)
      0.05 = coord(1/20)
    
    Abstract
    To what extent can changes in PageRank's damping factor affect node ranking? We prove that, at least on some graphs, the top k nodes assume all possible k! orderings as the damping factor varies, even if it varies within an arbitrarily small interval (e.g. [0.84999, 0.85001]). Thus, the rank of a node for a given (finite set of discrete) damping factor(s) provides very little information about the rank of that node as the damping factor varies over a continuous interval. We bypass this problem by introducing lineage analysis and proving that there is a simple condition, with a "natural" interpretation independent of PageRank, that allows one to verify "in one shot" whether a node outperforms another simultaneously for all damping factors and all damping variables (informally, time-variant damping factors). The novel notions of the strong rank and weak rank of a node provide a measure of the fuzziness of that node's rank, of the objective orderability of a graph's nodes, and of the quality of results returned by different ranking algorithms based on the random-surfer model. We deploy our analytical tools on a 41M-node snapshot of the .it Web domain and on a 0.7M-node snapshot of the CiteSeer citation graph. Among other findings, we show that rank is indeed relatively stable in both graphs; that "classic" PageRank (d=0.85) marginally outperforms Weighted In-degree (d->0), mainly due to its ability to ferret out "niche" items; and that, for both the Web and CiteSeer, the ideal damping factor appears to be 0.8-0.9 to obtain those items of high importance to at least one (model of a randomly surfing) user, but only 0.5-0.6 to obtain those items important to every (model of a randomly surfing) user.
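    A minimal power-iteration PageRank sketch, parameterized by the damping factor d, makes the abstract's experiment easy to reproduce in miniature: recompute the ranking at different values of d and compare the orderings. The toy graph is invented for illustration:

      def pagerank(links, d=0.85, iters=100):
          """Power-iteration PageRank over a dict {node: [outlinks]}."""
          n = len(links)
          rank = {u: 1.0 / n for u in links}
          for _ in range(iters):
              new = {u: (1.0 - d) / n for u in links}
              for u, outs in links.items():
                  for v in outs:
                      new[v] += d * rank[u] / len(outs)
              rank = new
          return rank

      # Invented toy graph; the induced ordering may change with d.
      g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "e": ["a", "c"]}
      for d in (0.5, 0.85):
          r = pagerank(g, d)
          print(d, sorted(g, key=lambda u: -r[u]))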
  5. Shapira, B.; Zabar, B.: Personalized search : integrating collaboration and social networks (2011) 0.00
    2.0810771E-4 = product of:
      0.004162154 = sum of:
        0.004162154 = weight(_text_:in in 4140) [ClassicSimilarity], result of:
          0.004162154 = score(doc=4140,freq=4.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.10626988 = fieldWeight in 4140, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4140)
      0.05 = coord(1/20)
    
    Abstract
    Despite improvements in their capabilities, search engines still fail to provide users with only relevant results. One reason is that most search engines implement a "one size fits all" approach that ignores personal preferences when retrieving the results of a user's query. Recent studies (Smyth, 2010) have elaborated the importance of personalizing search results and have proposed integrating recommender-system methods to enhance results using contextual and extrinsic information that might indicate the user's actual needs. In this article, we review recommender-system methods used for personalizing and improving search results and examine the effect of two such methods merged for this purpose. One method is based on collaborative users' knowledge; the second integrates information from the user's social network. We propose new methods for collaborative- and social-based search and demonstrate that each method, applied separately, produces more accurate search results than a purely keyword-based search engine (referred to as a "standard search engine"), with the social search engine being more accurate than the collaborative one. However, applied separately, these methods do not produce a sufficient number of results (low coverage). Nevertheless, merging these methods with those implemented by standard search engines overcomes the low-coverage problem and produces personalized results that are significantly more accurate than those of standard search engines while also providing sufficient coverage. The improvement, however, is significant only for topics for which the diversity of terms used in users' queries is low.
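    As a hedged sketch of the merging idea, the snippet below prefers hits from a sparse personalized result list and falls back to the standard keyword-based list to preserve coverage; the function and data are illustrative, not the authors' algorithm:

      def merge_results(personalized, standard, k=10):
          """Prefer personalized hits, then fill with standard results."""
          merged, seen = [], set()
          for url in personalized + standard:
              if url not in seen and len(merged) < k:
                  merged.append(url)
                  seen.add(url)
          return merged

      # Illustrative data: personalization yields few (but accurate) hits.
      personal = ["p1.example", "p2.example"]
      standard = ["s1.example", "p1.example", "s2.example", "s3.example"]
      print(merge_results(personal, standard, k=4))
      # -> ['p1.example', 'p2.example', 's1.example', 's2.example']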
  6. Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.00
    2.0810771E-4 = product of:
      0.004162154 = sum of:
        0.004162154 = weight(_text_:in in 3188) [ClassicSimilarity], result of:
          0.004162154 = score(doc=3188,freq=4.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.10626988 = fieldWeight in 3188, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3188)
      0.05 = coord(1/20)
    
    Abstract
    Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows over time, these information needs, manifested through user search queries, also become more complex. However, there has been no systematic study quantifying the structural complexity of Web search queries. In this research, we attempt to understand and characterize the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language-modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries when presented alongside model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries thus seem to represent an intermediate stage between syntactic and non-syntactic communication.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
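    A minimal sketch of the perplexity comparison described in the abstract above, using an add-one-smoothed bigram model over tokenized queries; the training and test data are stand-ins, not the paper's query log:

      import math
      from collections import Counter

      def bigram_perplexity(train, test):
          """Perplexity of `test` under an add-one-smoothed bigram model."""
          unigrams = Counter(w for s in train for w in s)
          bigrams = Counter(b for s in train for b in zip(s, s[1:]))
          vocab = len(unigrams)
          log_p, n = 0.0, 0
          for s in test:
              for u, v in zip(s, s[1:]):
                  p = (bigrams[(u, v)] + 1) / (unigrams[u] + vocab)
                  log_p += math.log(p)
                  n += 1
          return math.exp(-log_p / n)

      # Stand-in corpora of tokenized queries (illustrative only).
      train = [["cheap", "flights", "london"], ["flights", "to", "london"]]
      test = [["cheap", "flights", "to", "london"]]
      print(bigram_perplexity(train, test))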
  7. Unkel, J.; Haas, A.: The effects of credibility cues on the selection of search engine results (2017) 0.00
    2.0810771E-4 = product of:
      0.004162154 = sum of:
        0.004162154 = weight(_text_:in in 3752) [ClassicSimilarity], result of:
          0.004162154 = score(doc=3752,freq=4.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.10626988 = fieldWeight in 3752, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3752)
      0.05 = coord(1/20)
    
    Abstract
    Web search engines act as gatekeepers when people search for information online. Research has shown that search engine users seem to trust the search engines' ranking uncritically and mostly select top-ranked results. This study further examines search engine users' selection behavior. Drawing from the credibility and information research literature, we test whether the presence or absence of certain credibility cues influences the selection probability of search engine results. In an observational study, participants (N = 247) completed two information research tasks on preset search engine results pages, on which three credibility cues (source reputation, message neutrality, and social recommendations) as well as the search result ranking were systematically varied. The results of our study confirm the significance of the ranking. Of the three credibility cues, only reputation had an additional effect on selection probabilities. Personal characteristics (prior knowledge about the researched issues, search engine usage patterns, etc.) did not influence the preference for search results linked with certain credibility cues. These findings are discussed in light of situational and contextual characteristics (e.g., involvement, low-cost scenarios).
  8. Ortiz-Cordova, A.; Jansen, B.J.: Classifying web search queries to identify high revenue generating customers (2012) 0.00
    1.7658525E-4 = product of:
      0.003531705 = sum of:
        0.003531705 = weight(_text_:in in 279) [ClassicSimilarity], result of:
          0.003531705 = score(doc=279,freq=2.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.09017298 = fieldWeight in 279, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=279)
      0.05 = coord(1/20)
    
    Abstract
    Traffic from search engines is important for most online businesses, with the majority of visitors to many websites being referred by search engines. An understanding of this search engine traffic is therefore critical to the success of these websites. Understanding search engine traffic means understanding the underlying intent of the query terms and the corresponding behaviors of searchers submitting those keywords. In this research, using 712,643 query keywords from a popular Spanish music website that relies on contextual advertising as its business model, we use a k-means clustering algorithm to categorize referral keywords with similar characteristics of onsite customer behavior, including attributes such as clickthrough rate and revenue. We identified six clusters of consumer keywords, ranging from a large number of low-impact users to a small number of high-impact users. We demonstrate how online businesses can leverage this segmentation approach to provide a more tailored consumer experience. The implication is that businesses can effectively segment customers to develop better business models and increase advertising conversion rates.
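    A hedged sketch of the clustering step: k-means over per-keyword behavioral features such as clickthrough rate and revenue, as the abstract describes. The feature values are invented, scikit-learn is assumed to be available, and the paper's actual feature set may differ:

      from sklearn.cluster import KMeans
      from sklearn.preprocessing import StandardScaler

      # Invented per-keyword features: [clickthrough_rate, revenue_per_visit].
      features = [
          [0.02, 0.01],  # low-impact keywords
          [0.03, 0.02],
          [0.10, 0.40],  # mid-range keywords
          [0.11, 0.35],
          [0.25, 1.50],  # high-impact keywords
          [0.22, 1.10],
      ]

      # Standardize features, then assign each keyword to one of 3 clusters.
      X = StandardScaler().fit_transform(features)
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
      print(labels)  # cluster assignment for each keyword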
  9. Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.00
    1.4715438E-4 = product of:
      0.0029430876 = sum of:
        0.0029430876 = weight(_text_:in in 3471) [ClassicSimilarity], result of:
          0.0029430876 = score(doc=3471,freq=2.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.07514416 = fieldWeight in 3471, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
      0.05 = coord(1/20)
    
    Abstract
    The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
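    A minimal sketch of the incremental-update idea from the abstract: revisit pages and re-collect only those whose content hash has changed since the last crawl. The fetch function, URLs, and hash store are placeholders, not the authors' system:

      import hashlib

      def crawl_incrementally(urls, fetch, seen_hashes):
          """Yield only pages whose content changed since the last crawl."""
          for url in urls:
              body = fetch(url)  # placeholder fetcher
              digest = hashlib.sha256(body.encode()).hexdigest()
              if seen_hashes.get(url) != digest:
                  seen_hashes[url] = digest
                  yield url, body  # new or updated forum content

      # Illustrative use with a fake fetcher and an empty hash store.
      store = {}
      fake_fetch = lambda u: "post content for " + u
      for url, _ in crawl_incrementally(["forum/t1", "forum/t2"], fake_fetch, store):
          print("updated:", url)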
  10. Souza, J.; Carvalho, A.; Cristo, M.; Moura, E.; Calado, P.; Chirita, P.-A.; Nejdl, W.: Using site-level connections to estimate link confidence (2012) 0.00
    1.4715438E-4 = product of:
      0.0029430876 = sum of:
        0.0029430876 = weight(_text_:in in 498) [ClassicSimilarity], result of:
          0.0029430876 = score(doc=498,freq=2.0), product of:
            0.039165888 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02879306 = queryNorm
            0.07514416 = fieldWeight in 498, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=498)
      0.05 = coord(1/20)
    
    Abstract
    Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.

Languages

  • e 70
  • d 58

Types

  • a 103
  • el 26
  • m 14
  • s 5
  • r 2
  • x 2
