Search (1868 results, page 1 of 94)

  • year_i:[2000 TO 2010}
  1. Nicholson, S.; Sierra, T.; Eseryel, U.Y.; Park, J.-H.; Barkow, P.; Pozo, E.J.; Ward, J.: How much of it is real? : analysis of paid placement in Web search engine results (2006) 0.19
    0.1896001 = product of:
      0.28440014 = sum of:
        0.09994029 = weight(_text_:query in 5278) [ClassicSimilarity], result of:
          0.09994029 = score(doc=5278,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 5278, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=5278)
        0.18445984 = sum of:
          0.14434065 = weight(_text_:page in 5278) [ClassicSimilarity], result of:
            0.14434065 = score(doc=5278,freq=4.0), product of:
              0.27565226 = queryWeight, product of:
                5.5854197 = idf(docFreq=450, maxDocs=44218)
                0.049352113 = queryNorm
              0.5236331 = fieldWeight in 5278, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5854197 = idf(docFreq=450, maxDocs=44218)
                0.046875 = fieldNorm(doc=5278)
          0.040119182 = weight(_text_:22 in 5278) [ClassicSimilarity], result of:
            0.040119182 = score(doc=5278,freq=2.0), product of:
              0.1728227 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049352113 = queryNorm
              0.23214069 = fieldWeight in 5278, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=5278)
      0.6666667 = coord(2/3)
    
    Abstract
    Most Web search tools integrate sponsored results with results from their internal editorial database when providing results to users. The goal of this research is to determine how much of the screen real estate displays real editorial results as compared to sponsored results. On average, 40% of all results presented on the first screen are editorial results, and when the entire first Web page is considered, 67% of the results are nonsponsored. For general search tools such as Google, 56% of the first screen and 82% of the first Web page contain nonsponsored results. The study also finds that query structure makes a significant difference in the percentage of nonsponsored results returned by a search; similarly, the topic of the query can have a significant effect on the percentage of sponsored results displayed by most Web search tools.
    Date
    22. 7.2006 16:32:57
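
    The score breakdowns attached to each hit are Lucene "explain" trees for the ClassicSimilarity TF-IDF formula: each matching term contributes queryWeight * fieldWeight = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm), and the coord factor scales for the fraction of query clauses matched. A minimal Python sketch, using only constants read off the first tree (the function and variable names are ours), reproduces the 0.1896001 score of hit 1:

        import math

        def term_score(raw_tf, idf, query_norm, field_norm):
            """One term's contribution in ClassicSimilarity:
            queryWeight * fieldWeight
            = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm)."""
            return (idf * query_norm) * (math.sqrt(raw_tf) * idf * field_norm)

        # Constants taken verbatim from the explain tree of hit 1 (doc 5278).
        QN, FN = 0.049352113, 0.046875
        s_query = term_score(4.0, 4.6476326, QN, FN)     # 0.09994029
        s_page  = term_score(4.0, 5.5854197, QN, FN)     # 0.14434065
        s_22    = term_score(2.0, 3.5018296, QN, FN)     # 0.040119182
        score = (s_query + s_page + s_22) * (2.0 / 3.0)  # coord(2/3): 2 of 3 clauses matched
        print(round(score, 7))                           # 0.1896001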
  2. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.16
    0.16141582 = product of:
      0.24212372 = sum of:
        0.09994029 = weight(_text_:query in 4436) [ClassicSimilarity], result of:
          0.09994029 = score(doc=4436,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 4436, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
        0.14218344 = sum of:
          0.10206425 = weight(_text_:page in 4436) [ClassicSimilarity], result of:
            0.10206425 = score(doc=4436,freq=2.0), product of:
              0.27565226 = queryWeight, product of:
                5.5854197 = idf(docFreq=450, maxDocs=44218)
                0.049352113 = queryNorm
              0.37026453 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5854197 = idf(docFreq=450, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.040119182 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.040119182 = score(doc=4436,freq=2.0), product of:
              0.1728227 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049352113 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.6666667 = coord(2/3)
    
    Abstract
    The language barrier is the major problem people face when searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper-name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between speed performance and translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
  3. Seo, H.-C.; Kim, S.-B.; Rim, H.-C.; Myaeng, S.-H.: Improving query translation in English-Korean cross-language information retrieval (2005) 0.15
    0.15470998 = product of:
      0.23206496 = sum of:
        0.21200538 = weight(_text_:query in 1023) [ClassicSimilarity], result of:
          0.21200538 = score(doc=1023,freq=18.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.92429227 = fieldWeight in 1023, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=1023)
        0.020059591 = product of:
          0.040119182 = sum of:
            0.040119182 = weight(_text_:22 in 1023) [ClassicSimilarity], result of:
              0.040119182 = score(doc=1023,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.23214069 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English-Korean CLIR test collection. The results show some improvements regardless of the association measures we used.
    Date
    26.12.2007 20:22:38
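
    The combination-based disambiguation described in the abstract above can be sketched in a few lines of Python. Here `assoc` stands for a hypothetical association function (e.g., mutual information estimated from a target-language corpus; the paper evaluates several measures), and the exhaustive enumeration mirrors the abstract's "all combinations" formulation:

        from itertools import product

        def best_translation_combination(candidates, assoc):
            """Score every combination of one translation per source term by
            the sum of pairwise association values, and keep the best.
            `candidates` is a list of per-source-term translation lists;
            `assoc(t1, t2)` returns an association score for two target terms."""
            best, best_score = None, float("-inf")
            for combo in product(*candidates):
                score = sum(assoc(a, b)
                            for i, a in enumerate(combo)
                            for b in combo[i + 1:])
                if score > best_score:
                    best, best_score = combo, score
            return best, best_score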
  4. Stojanovic, N.: Ontology-based Information Retrieval : methods and tools for cooperative query answering (2005) 0.13
    0.12906206 = product of:
      0.19359308 = sum of:
        0.052256163 = product of:
          0.15676849 = sum of:
            0.15676849 = weight(_text_:3a in 701) [ClassicSimilarity], result of:
              0.15676849 = score(doc=701,freq=2.0), product of:
                0.41840777 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049352113 = queryNorm
                0.3746787 = fieldWeight in 701, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03125 = fieldNorm(doc=701)
          0.33333334 = coord(1/3)
        0.14133692 = weight(_text_:query in 701) [ClassicSimilarity], result of:
          0.14133692 = score(doc=701,freq=18.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 701, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=701)
      0.6666667 = coord(2/3)
    
    Abstract
    With the explosion of possibilities for ubiquitous content production, the information overload problem has reached a level of complexity that can no longer be managed by traditional modelling approaches. Due to their purely syntactic nature, traditional information retrieval approaches have not succeeded in treating content itself (i.e. its meaning, not its representation). This leads to results of very low usefulness for a user's task at hand. In the last ten years, ontologies have evolved from an interesting conceptualisation paradigm into a very promising (semantic) modelling technology, especially in the context of the Semantic Web. From the information retrieval point of view, ontologies enable a machine-understandable form of content description, such that the retrieval process can be driven by the meaning of the content. However, the retrieval process is inherently ambiguous: a user, unfamiliar with the underlying repository and/or query syntax, merely approximates his information need in a query. This implies the need to involve the user more actively in the retrieval process in order to close the gap between the meaning of the content and the meaning of the user's query (i.e. his information need). This thesis lays the foundation for such an ontology-based interactive retrieval process, in which the retrieval system interacts with the user in order to conceptually interpret the meaning of his query, while the underlying domain ontology drives the conceptualisation process. In that way the retrieval process evolves from query evaluation into a highly interactive cooperation between the user and the retrieval system, in which the system tries to anticipate the user's information need and to deliver the relevant content proactively. Moreover, the notion of content relevance for a user's query evolves from a content-dependent artefact into a multidimensional, context-dependent structure, strongly influenced by the user's preferences. This cooperation process is realized as the so-called Librarian Agent Query Refinement Process. In order to clarify the impact of an ontology on the retrieval process (regarding its complexity and quality), a set of methods and tools for different levels of content and query formalisation is developed, ranging from pure ontology-based inferencing to keyword-based querying in which semantics automatically emerges from the results. Our evaluation studies have shown that the ability to conceptualize a user's information need in the right manner and to interpret the retrieval results accordingly is the key issue in realizing much more meaningful information retrieval systems.
    Content
    Cf.: http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/1627
  5. White, R.W.; Jose, J.M.; Ruthven, I.: ¬A task-oriented study on the influencing effects of query-biased summarisation in web searching (2003) 0.13
    0.12788323 = product of:
      0.19182482 = sum of:
        0.13168289 = weight(_text_:query in 1081) [ClassicSimilarity], result of:
          0.13168289 = score(doc=1081,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5741056 = fieldWeight in 1081, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1081)
        0.060141932 = product of:
          0.120283864 = sum of:
            0.120283864 = weight(_text_:page in 1081) [ClassicSimilarity], result of:
              0.120283864 = score(doc=1081,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.4363609 = fieldWeight in 1081, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1081)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The aim of the work described in this paper is to evaluate the influencing effects of query-biased summaries in web searching. For this purpose, a summarisation system has been developed, and a summary tailored to the user's query is generated automatically for each document retrieved. The system aims to provide a better means of assessing document relevance than the titles or abstracts typical of many web search result lists. By visiting each result page at retrieval time, the system provides the user with an idea of the current page content and thus deals with the dynamic nature of the web. To examine the effectiveness of this approach, a task-oriented, comparative evaluation between four different web retrieval systems was performed: two that use query-biased summarisation, and two that use the standard ranked titles/abstracts approach. The results from the evaluation indicate that query-biased summarisation techniques appear to be more useful and effective in helping users gauge document relevance than the traditional ranked titles/abstracts approach. The same methodology was used to compare the effectiveness of two of the web's major search engines: AltaVista and Google.
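
    To illustrate the idea of a query-biased summary, here is a deliberately naive Python sketch that ranks a page's sentences by word overlap with the query and keeps the top k. The actual system described above combines further evidence at retrieval time; this reduction only conveys the principle:

        def query_biased_summary(sentences, query_terms, k=3):
            """Rank a document's sentences by word overlap with the query
            and return the top k as the summary. A toy stand-in for the
            study's summariser, not its implementation."""
            q = {t.lower() for t in query_terms}
            def overlap(sentence):
                return len(q & set(sentence.lower().split()))
            return sorted(sentences, key=overlap, reverse=True)[:k]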
  6. Bar-Ilan, J.: Web links and search engine ranking : the case of Google and the query "Jew" (2006) 0.12
    0.124703124 = product of:
      0.18705468 = sum of:
        0.10200114 = weight(_text_:query in 6104) [ClassicSimilarity], result of:
          0.10200114 = score(doc=6104,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 6104, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6104)
        0.08505354 = product of:
          0.17010708 = sum of:
            0.17010708 = weight(_text_:page in 6104) [ClassicSimilarity], result of:
              0.17010708 = score(doc=6104,freq=8.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.6171075 = fieldWeight in 6104, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6104)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The World Wide Web has become one of our more important information sources, and commercial search engines are the major tools for locating information; however, it is not enough for a Web page to be indexed by the search engines; it must also rank high on relevant queries. One of the parameters involved in ranking is the number and quality of links pointing to the page, based on the assumption that links convey appreciation for a page. This article presents the results of a content analysis of the links to two top pages retrieved by Google for the query "jew" as of July 2004: the "jew" entry on the free online encyclopedia Wikipedia, and the home page of "Jew Watch," a highly anti-Semitic site. The top results for the query "jew" gained public attention in April 2004, when it was noticed that the "Jew Watch" homepage ranked number 1. From this point on, both sides engaged in "Googlebombing" (i.e., increasing the number of links pointing to these pages). The results of the study show that most of the links to these pages come from blogs and discussion lists, and that the number of links pointing to these pages in appreciation of their content is extremely small. These findings have implications for ranking algorithms based on link counts, and emphasize the huge difference between Web links and citations in the scientific community.
  7. Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.11
    0.10731181 = product of:
      0.16096771 = sum of:
        0.14425138 = weight(_text_:query in 1022) [ClassicSimilarity], result of:
          0.14425138 = score(doc=1022,freq=12.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.6289012 = fieldWeight in 1022, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
              0.03343265 = score(doc=1022,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 1022, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1022)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query-net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
    Date
    26.12.2007 20:22:11
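
    The structured-query (synonym operator) side of the comparison can be illustrated with a toy sketch: all candidate translations of one source-language term are pooled as a single pseudo-term, so their within-document frequencies add before term weighting. This is a hypothetical illustration of the #syn idea, not INQUERY's implementation:

        def syn_tf(doc_tokens, translations):
            """Pooled term frequency for one source term: the frequencies of
            all its candidate translations add up, as with INQUERY's #syn
            operator in structured query translation."""
            return sum(doc_tokens.count(t) for t in translations)

        # e.g. scoring a Spanish document for the English source term "bank"
        # (hypothetical candidates): syn_tf(doc_tokens, ["banco", "orilla"])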
  8. Haslhofer, B.: Uniform SPARQL access to interlinked (digital library) sources (2007) 0.11
    0.106666565 = product of:
      0.15999985 = sum of:
        0.13325372 = weight(_text_:query in 541) [ClassicSimilarity], result of:
          0.13325372 = score(doc=541,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5809541 = fieldWeight in 541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0625 = fieldNorm(doc=541)
        0.026746122 = product of:
          0.053492244 = sum of:
            0.053492244 = weight(_text_:22 in 541) [ClassicSimilarity], result of:
              0.053492244 = score(doc=541,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.30952093 = fieldWeight in 541, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=541)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this presentation, we therefore focus on a solution for providing uniform access to Digital Libraries and other online services. In order to enable uniform query access to heterogeneous sources, we must provide metadata interoperability in a way that allows a query language, in this case SPARQL, to cope with the incompatibility of the metadata in various sources without changing their already existing information models.
    Date
    26.12.2011 13:22:46
  9. Park, E.-K.; Ra, D.-Y.; Jang, M.-G.: Techniques for improving web retrieval effectiveness (2005) 0.10
    0.100648284 = product of:
      0.15097243 = sum of:
        0.09994029 = weight(_text_:query in 1060) [ClassicSimilarity], result of:
          0.09994029 = score(doc=1060,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 1060, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=1060)
        0.051032126 = product of:
          0.10206425 = sum of:
            0.10206425 = weight(_text_:page in 1060) [ClassicSimilarity], result of:
              0.10206425 = score(doc=1060,freq=2.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.37026453 = fieldWeight in 1060, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1060)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper discusses several schemes for improving retrieval effectiveness that can be used in the named-page-finding tasks of web information retrieval (Overview of the TREC-2002 web track. In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002, NIST Special Publication #500-251, 2003). These methods were applied on top of the basic information retrieval model as additional mechanisms to upgrade the system. Use of the titles of web pages was found to be effective. It was confirmed that the anchor texts of incoming links were beneficial, as suggested in other works. Sentence-query similarity is a new type of information proposed by us and was identified as the most advantageous information to exploit: stratifying and re-ranking the retrieval list based on the maximum count of index terms in common between a sentence and a query resulted in a significant improvement in performance. To demonstrate these facts, a large-scale web information retrieval system was developed and used for experimentation.
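
    A minimal sketch of the sentence-query similarity device described above: documents are re-ranked by the maximum count of query terms that any single sentence shares with the query. Whole-word overlap stands in for the system's index-term matching, which is an assumption on our part:

        def rerank_by_best_sentence(ranked_docs, query_terms):
            """Re-rank by each document's best sentence-query overlap.
            `ranked_docs` is a list of (doc_id, sentences) pairs in rank
            order; Python's stable sort keeps the original order within
            ties, giving the stratification described in the abstract."""
            q = {t.lower() for t in query_terms}
            def best(sentences):
                return max((len(q & set(s.lower().split())) for s in sentences),
                           default=0)
            return sorted(ranked_docs, key=lambda ds: best(ds[1]), reverse=True)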
  10. Friman, J.; Kangaspunta, J.; Leppäniemi, S.; Rasi, P.; Virrankoski, A.: Query performance analyser : a tool for teaching information retrieval skills through an educational game (2005) 0.10
    0.09893282 = product of:
      0.14839922 = sum of:
        0.13168289 = weight(_text_:query in 3010) [ClassicSimilarity], result of:
          0.13168289 = score(doc=3010,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5741056 = fieldWeight in 3010, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3010)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 3010) [ClassicSimilarity], result of:
              0.03343265 = score(doc=3010,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 3010, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3010)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The role of the modern librarian has become more and more demanding in the information age. One of the new challenges for information specialists is what is usually called "the teaching librarian": the librarian or information specialist should be able to teach at least basic practical searching skills to patrons in need of relevant information. Query Performance Analyser (QPA) is a tool for analysing and comparing the performance of individual queries. It has been developed in the Department of Information Studies at the University of Tampere. It can be used in user training to demonstrate the characteristics of IR systems and of different searching strategies. Usually users cannot get any feedback about the effectiveness of their queries and therefore may have difficulty perceiving the actual effectiveness of a query they have formulated, or the effect of changes between queries. QPA provides instant visual feedback about the performance of a given query and gives the user the possibility to compare the effectiveness of multiple queries and the performance of different query formulation strategies. QPA is based on predefined search topics, each with a corpus of documents that are relevant to the given topic. The purpose of this paper is to give a brief insight into the infrastructure of QPA, the basic functionality of the QPA-based game, and its implementation in IR education.
    Date
    22. 7.2009 11:03:43
  11. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.10
    0.096351944 = product of:
      0.14452791 = sum of:
        0.10200114 = weight(_text_:query in 607) [ClassicSimilarity], result of:
          0.10200114 = score(doc=607,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 607, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
        0.04252677 = product of:
          0.08505354 = sum of:
            0.08505354 = weight(_text_:page in 607) [ClassicSimilarity], result of:
              0.08505354 = score(doc=607,freq=2.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.30855376 = fieldWeight in 607, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=607)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous work used hyperlink analysis algorithms to solve this problem. However, little research has focused on query-independent Web data cleansing for Web IR. In this paper, we first provide an analysis of the differences between retrieval target pages and ordinary ones, based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and that retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with only a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of the pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
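
    The learning-based cleansing step can be sketched as a classifier over query-independent features. The features named in the comments (in-link count, document length, URL depth) are illustrative guesses, not the paper's actual feature set, and the threshold is arbitrary:

        from sklearn.linear_model import LogisticRegression

        def train_cleanser(feature_matrix, is_target_page):
            """Fit a classifier that predicts whether a page could ever be a
            retrieval target from query-independent features (hypothetical
            examples: in-link count, document length, URL depth)."""
            model = LogisticRegression(max_iter=1000)
            model.fit(feature_matrix, is_target_page)
            return model

        def cleanse(model, pages, feature_matrix, keep_threshold=0.1):
            """Drop pages the classifier considers very unlikely targets."""
            probs = model.predict_proba(feature_matrix)[:, 1]
            return [p for p, pr in zip(pages, probs) if pr >= keep_threshold]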
  12. Lorigo, L.; Pan, B.; Hembrooke, H.; Joachims, T.; Granka, L.; Gay, G.: ¬The influence of task and gender on search and evaluation behavior using Google (2006) 0.10
    0.096351944 = product of:
      0.14452791 = sum of:
        0.10200114 = weight(_text_:query in 978) [ClassicSimilarity], result of:
          0.10200114 = score(doc=978,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 978, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
        0.04252677 = product of:
          0.08505354 = sum of:
            0.08505354 = weight(_text_:page in 978) [ClassicSimilarity], result of:
              0.08505354 = score(doc=978,freq=2.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.30855376 = fieldWeight in 978, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=978)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    To improve search engine effectiveness, we have observed an increased interest in gathering additional feedback about users' information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate the query results returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and that only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be greater predictors of behavior. We discover that gender and task significantly influence different kinds of the search behaviors discussed here. The results suggest improvements to query-based search interface designs with respect to both their use of space and their workflow.
  13. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.09
    0.09497397 = product of:
      0.14246094 = sum of:
        0.12240136 = weight(_text_:query in 825) [ClassicSimilarity], result of:
          0.12240136 = score(doc=825,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5336404 = fieldWeight in 825, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=825)
        0.020059591 = product of:
          0.040119182 = sum of:
            0.040119182 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.040119182 = score(doc=825,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Relevance feedback consists of automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network, and a new inference process (probability propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested in preliminary experiments with different standard document collections.
    Date
    22. 3.2003 19:30:19
  14. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Using genetic algorithms to evolve a population of topical queries (2008) 0.09
    0.08966473 = product of:
      0.13449709 = sum of:
        0.11778076 = weight(_text_:query in 2443) [ClassicSimilarity], result of:
          0.11778076 = score(doc=2443,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5134957 = fieldWeight in 2443, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2443)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 2443) [ClassicSimilarity], result of:
              0.03343265 = score(doc=2443,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 2443, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2443)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query in retrieving relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve "good query terms" in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query space, and discuss the use of an especially developed fitness function that favors the construction of queries containing novel but related terms.
    Date
    22.11.2008 12:49:22
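
    A minimal genetic-algorithm sketch in the spirit of the abstract above. The `fitness` callable, which should score a query's effectiveness at retrieving topic-related material, is left abstract (the paper's especially developed fitness function is not reproduced here), and the selection and crossover operators are generic choices of ours:

        import random

        def evolve_queries(seed_terms, mutation_pool, fitness, generations=50,
                           pop_size=20, query_len=4, mutation_rate=0.2):
            """Evolve a population of topical queries. Individuals are term
            lists; crossover merges two parents' vocabularies and resamples;
            mutation draws fresh terms from `mutation_pool` so queries can
            acquire vocabulary the seed lacks."""
            population = [random.sample(seed_terms, min(query_len, len(seed_terms)))
                          for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(population, key=fitness, reverse=True)
                parents = ranked[:pop_size // 2]          # truncation selection
                children = []
                while len(parents) + len(children) < pop_size:
                    a, b = random.sample(parents, 2)
                    pool = list(set(a) | set(b))          # crossover
                    child = random.sample(pool, min(query_len, len(pool)))
                    if random.random() < mutation_rate:   # mutation-pool step
                        child[random.randrange(len(child))] = random.choice(mutation_pool)
                    children.append(child)
                population = parents + children
            return max(population, key=fitness)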
  15. Zhu, J.; Song, D.; Rüger, S.: Integrating multiple windows and document features for expert finding (2009) 0.09
    0.08966473 = product of:
      0.13449709 = sum of:
        0.11778076 = weight(_text_:query in 2755) [ClassicSimilarity], result of:
          0.11778076 = score(doc=2755,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5134957 = fieldWeight in 2755, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2755)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 2755) [ClassicSimilarity], result of:
              0.03343265 = score(doc=2755,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 2755, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2755)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Expert finding is a key task in enterprise search and has recently attracted considerable attention from both the research and industry communities. Given a search topic, a prominent existing approach is to apply an information retrieval (IR) system to retrieve top-ranking documents, which are then used to derive associations between experts and the search topic based on cooccurrences. However, we argue that expert finding is sensitive to multiple levels of associations and document features that current expert finding systems insufficiently address, including (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates the above-mentioned three aspects as well as a query expansion technique in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. The experimental results show that query expansion can dramatically improve expert finding performance with statistical significance. For three well-known IR models, with or without query expansion, document internal structures help improve a single-window-based approach but without statistical significance, while our novel multiple-window-based approach can significantly improve the performance of a single-window-based approach both with and without document internal structures.
    Date
    22. 3.2009 18:55:47
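
    The baseline cooccurrence approach that the paper builds on can be sketched as follows: an expert's score for a topic aggregates the retrieval scores of the top-ranked documents that mention that expert. The multiple windows, document structure, authority, and query expansion layers the paper adds are omitted, and all names here are ours:

        def expert_scores(top_docs, doc_scores, doc_experts):
            """Baseline cooccurrence expert finding.
            top_docs    - doc ids in rank order,
            doc_scores  - doc id -> retrieval score for the topic,
            doc_experts - doc id -> experts mentioned in the document."""
            scores = {}
            for doc in top_docs:
                for expert in doc_experts.get(doc, []):
                    scores[expert] = scores.get(expert, 0.0) + doc_scores[doc]
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)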
  16. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.09
    0.08696495 = product of:
      0.13044742 = sum of:
        0.047112305 = weight(_text_:query in 8) [ClassicSimilarity], result of:
          0.047112305 = score(doc=8,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.20539828 = fieldWeight in 8, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.08333511 = product of:
          0.16667022 = sum of:
            0.16667022 = weight(_text_:page in 8) [ClassicSimilarity], result of:
              0.16667022 = score(doc=8,freq=12.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.6046394 = fieldWeight in 8, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.03125 = fieldNorm(doc=8)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithms for these systems were usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone (that is, a hyperlink to Web page B contained in Web page A) is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
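
    As one concrete example of the technique class surveyed above, here is a generic PageRank power-iteration sketch. The article does not disclose any engine's actual algorithm, so this is illustrative only:

        import numpy as np

        def pagerank(adjacency, damping=0.85, iterations=100):
            """Generic PageRank power iteration over a link matrix where
            A[i, j] = 1 if page i links to page j. Dangling pages simply
            leak rank mass in this simplification; only the teleport term
            redistributes it."""
            A = np.asarray(adjacency, dtype=float)
            n = A.shape[0]
            out_deg = A.sum(axis=1, keepdims=True)
            out_deg[out_deg == 0] = 1.0              # avoid division by zero
            transition = A / out_deg
            rank = np.full(n, 1.0 / n)
            for _ in range(iterations):
                rank = (1 - damping) / n + damping * (transition.T @ rank)
            return rank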
  17. Spink, A.; Wolfram, D.; Jansen, B.J.; Saracevic, T.: Searching the Web : the public and their queries (2001) 0.09
    0.08647631 = product of:
      0.12971446 = sum of:
        0.08160091 = weight(_text_:query in 6980) [ClassicSimilarity], result of:
          0.08160091 = score(doc=6980,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.35576028 = fieldWeight in 6980, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=6980)
        0.048113547 = product of:
          0.096227095 = sum of:
            0.096227095 = weight(_text_:page in 6980) [ClassicSimilarity], result of:
              0.096227095 = score(doc=6980,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.34908873 = fieldWeight in 6980, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.03125 = fieldNorm(doc=6980)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In previous articles, we reported the state of Web searching in 1997 (Jansen, Spink, & Saracevic, 2000) and in 1999 (Spink, Wolfram, Jansen, & Saracevic, 2001). Such snapshot studies and statistics on Web use appear regularly (OCLC, 1999), but provide little information about Web searching trends. In this article, we compare and contrast results from our two previous studies of Excite queries' data sets, each containing over 1 million queries submitted by over 200,000 Excite users, collected on 16 September 1997 and 20 December 1999. We examine how public Web searching changed during that 2-year period. As Table 1 shows, the overall structure of Web queries in some areas did not change, while in others we see change from 1997 to 1999. Our comparison shows how Web searching changed incrementally and also dramatically. We see some moves toward greater simplicity, including shorter queries (i.e., fewer terms) and shorter sessions (i.e., fewer queries per user), with little modification (addition or deletion) of terms in subsequent queries. The trend toward shorter queries suggests that Web information content should target specific terms in order to reach Web users. Another trend was to view fewer pages of results per query. Most Excite users examined only one page of results per query, since an Excite results page contains ten ranked Web sites. Were users satisfied with the results and did not need to view more pages? It appears that the public continues to have a low tolerance for wading through retrieved sites. This decline in interactivity levels is a disturbing finding for the future of Web searching. Queries that included Boolean operators were in the minority, but the percentage increased between the two time periods. Most Boolean use involved the AND operator, with many mistakes. The use of relevance feedback almost doubled from 1997 to 1999, but overall use was still small. An unusually large number of terms were used with low frequency, such as personal names, spelling errors, non-English words, and Web-specific terms, such as URLs. Web query vocabulary contains more words than found in large English texts in general. The public language of Web queries has its own unique characteristics. How did Web searching topics change from 1997 to 1999? We classified a random sample of 2,414 queries from 1997 and 2,539 queries from 1999 into 11 categories (Table 2). From 1997 to 1999, Web searching shifted from entertainment, recreation, and sex and pornography preferences to e-commerce-related topics under commerce, travel, employment, and economy. This shift coincided with changes in information distribution on the publicly indexed Web.
  18. Haveliwala, T.: Context-Sensitive Web search (2005) 0.09
    0.08647631 = product of:
      0.12971446 = sum of:
        0.08160091 = weight(_text_:query in 2567) [ClassicSimilarity], result of:
          0.08160091 = score(doc=2567,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.35576028 = fieldWeight in 2567, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=2567)
        0.048113547 = product of:
          0.096227095 = sum of:
            0.096227095 = weight(_text_:page in 2567) [ClassicSimilarity], result of:
              0.096227095 = score(doc=2567,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.34908873 = fieldWeight in 2567, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2567)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    As the Web continues to grow and encompass broader and more diverse sources of information, providing effective search facilities to users becomes an increasingly challenging problem. To help users deal with the deluge of Web-accessible information, we propose a search system which makes use of context to improve search results in a scalable way. By context, we mean any sources of information, in addition to any search query, that provide clues about the user's true information need. For instance, a user's bookmarks and search history can be considered a part of the search context. We consider two types of context-based search. The first type of functionality we consider is "similarity search." In this case, as the user is browsing Web pages, URLs for pages similar to the current page are retrieved and displayed in a side panel. No query is explicitly issued; context alone (i.e., the page currently being viewed) is used to provide the user with useful related information. The second type of functionality involves taking search context into account when ranking results to standard search queries. Web search differs from traditional information retrieval tasks in several major ways, making effective context-sensitive Web search challenging. First, scalability is of critical importance. With billions of publicly accessible documents, the Web is much larger than traditional datasets. Similarly, with millions of search queries issued each day, the query load is much higher than for traditional information retrieval systems. Second, there are no guarantees on the quality of Web pages, with Web authors taking an adversarial, rather than cooperative, approach in attempts to inflate the rankings of their pages. Third, there is a significant amount of metadata embodied in the link structure corresponding to the hyperlinks between Web pages that can be exploited during the retrieval process. In this thesis, we design a search system, using the Stanford WebBase platform, that exploits the link structure of the Web to provide scalable, context-sensitive search.
  19. White, R.W.; Jose, J.M.; Ruthven, I.: Using top-ranking sentences to facilitate effective information access (2005) 0.08
    0.08387356 = product of:
      0.12581034 = sum of:
        0.08328357 = weight(_text_:query in 3881) [ClassicSimilarity], result of:
          0.08328357 = score(doc=3881,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 3881, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3881)
        0.04252677 = product of:
          0.08505354 = sum of:
            0.08505354 = weight(_text_:page in 3881) [ClassicSimilarity], result of:
              0.08505354 = score(doc=3881,freq=2.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.30855376 = fieldWeight in 3881, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3881)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Web searchers typically fail to view search results beyond the first page, nor do they fully examine those results that are presented to them. In this article we describe an approach that encourages a deeper examination of the contents of the document set retrieved in response to a searcher's query. The approach shifts the focus of perusal and interaction away from potentially uninformative document surrogates (such as titles, sentence fragments, and URLs) to actual document content, and uses this content to drive the information-seeking process. Current search interfaces assume searchers examine results document by document. In contrast, our approach extracts, ranks, and presents the contents of the top-ranked document set. We use query-relevant top-ranking sentences extracted from the top documents at retrieval time as fine-grained representations of top-ranked document content and, when combined in a ranked list, as an overview of these documents. The interaction of the searcher provides implicit evidence that is used to reorder the sentences where appropriate. We evaluate our approach in three separate user studies, each applying these sentences in a different way. The findings of these studies show that top-ranking sentences can facilitate effective information access.
  20. Agosti, M.; Pretto, L.: ¬A theoretical study of a generalized version of kleinberg's HITS algorithm (2005) 0.08
    0.08387356 = product of:
      0.12581034 = sum of:
        0.08328357 = weight(_text_:query in 4) [ClassicSimilarity], result of:
          0.08328357 = score(doc=4,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 4, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
        0.04252677 = product of:
          0.08505354 = sum of:
            0.08505354 = weight(_text_:page in 4) [ClassicSimilarity], result of:
              0.08505354 = score(doc=4,freq=2.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.30855376 = fieldWeight in 4, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph obtained by considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We analyse the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. We then study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence of the order of execution of some basic operations on the initial vectors. Finally, we expound some interesting consequences of our theoretical results.
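
    For reference, the classic HITS iteration that the paper generalizes can be written in a few lines of NumPy. This is Kleinberg's original scheme, not the paper's revised version:

        import numpy as np

        def hits(adjacency, iterations=50):
            """Classic HITS: with A[i, j] = 1 if page i links to page j,
            iterate authorities a = A^T h and hubs h = A a, normalizing each
            step; this converges to the principal eigenvectors of A^T A and
            A A^T respectively."""
            A = np.asarray(adjacency, dtype=float)
            hubs = np.ones(A.shape[0])
            auths = np.ones(A.shape[0])
            for _ in range(iterations):
                auths = A.T @ hubs
                auths /= np.linalg.norm(auths)
                hubs = A @ auths
                hubs /= np.linalg.norm(hubs)
            return hubs, auths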

Types

  • a 1570
  • m 191
  • el 107
  • s 66
  • b 27
  • x 20
  • i 9
  • r 3
  • n 2
