Search (124 results, page 2 of 7)

  • theme_ss:"Retrievalalgorithmen"
  1. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.00
    0.0046117427 = product of:
      0.027670456 = sum of:
        0.027670456 = product of:
          0.055340912 = sum of:
            0.055340912 = weight(_text_:web in 4457) [ClassicSimilarity], result of:
              0.055340912 = score(doc=4457,freq=10.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.48375595 = fieldWeight in 4457, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4457)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
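    The indented breakdown above, repeated for each hit below, is Lucene explain() output for the ClassicSimilarity tf-idf model. A minimal Python sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))), reproduces the numbers for this entry:

```python
import math

# Lucene ClassicSimilarity factors, reproducing the explain() tree above.
# All constants are copied from the breakdown for entry 1 (doc 4457).
freq = 10.0            # occurrences of "web" in the field
doc_freq = 4597        # documents containing "web"
max_docs = 44218       # documents in the index
query_norm = 0.03505379
field_norm = 0.046875  # encodes field length (shorter fields score higher)

tf = math.sqrt(freq)                             # 3.1622777
idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.2635105

query_weight = idf * query_norm                  # 0.11439841
field_weight = tf * idf * field_norm             # 0.48375595
score = query_weight * field_weight              # 0.055340912

# The coord() factors down-weight partial matches: here 1 of 2 clauses
# and 1 of 6 query terms matched.
final = score * (1 / 2) * (1 / 6)                # 0.0046117427
print(final)
```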
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  2. Henzinger, M.R.: Link analysis in Web information retrieval (2000) 0.00
    0.0043479926 = product of:
      0.026087955 = sum of:
        0.026087955 = product of:
          0.05217591 = sum of:
            0.05217591 = weight(_text_:web in 801) [ClassicSimilarity], result of:
              0.05217591 = score(doc=801,freq=20.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.45608947 = fieldWeight in 801, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=801)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the state of the art of the field.
    Content
    The goal of information retrieval is to find all documents relevant for a user query in a collection of documents. Decades of research in information retrieval have been successful in developing and refining techniques that are solely word-based (see e.g., [2]). With the advent of the web, new sources of information became available, one of them being the hyperlinks between documents and records of user behavior. To be precise, hypertexts (i.e., collections of documents connected by hyperlinks) have existed and have been studied for a long time. What was new was the large number of hyperlinks created by independent individuals. Hyperlinks provide a valuable source of information for web information retrieval, as we will show in this article. This area of information retrieval is commonly called link analysis. Why would one expect hyperlinks to be useful? A hyperlink is a reference to a web page B that is contained in a web page A. When the hyperlink is clicked on in a web browser, the browser displays page B. This functionality alone is not helpful for web information retrieval. However, the way hyperlinks are typically used by authors of web pages can give them valuable information content. Typically, authors create links because they think they will be useful for the readers of the pages. Thus, links are usually either navigational aids that, for example, bring the reader back to the homepage of the site, or links that point to pages whose content augments the content of the current page. The second kind of link tends to point to high-quality pages that might be on the same topic as the page containing the link.
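    The two algorithms the survey refers to are presumably PageRank and Kleinberg's HITS. As an illustration of the hubs-and-authorities idea just described (good links point to high-quality pages), here is a minimal HITS sketch; the link graph is made up for the example:

```python
# Minimal HITS (hubs and authorities) sketch on a toy directed graph.
# The graph is hypothetical; each node maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
nodes = list(links)
hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}

for _ in range(50):
    # Authority score: endorsed by good hubs.
    auth = {n: sum(hub[m] for m in nodes if n in links[m]) for n in nodes}
    norm = sum(v * v for v in auth.values()) ** 0.5
    auth = {n: v / norm for n, v in auth.items()}
    # Hub score: points at good authorities.
    hub = {n: sum(auth[m] for m in links[n]) for n in nodes}
    norm = sum(v * v for v in hub.values()) ** 0.5
    hub = {n: v / norm for n, v in hub.items()}

print(sorted(auth.items(), key=lambda kv: -kv[1]))  # C is the top authority
```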
  3. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    0.004221604 = product of:
      0.025329625 = sum of:
        0.025329625 = product of:
          0.075988874 = sum of:
            0.075988874 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.075988874 = score(doc=402,freq=2.0), product of:
                0.1227524 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03505379 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  4. Jacsó, P.: Mapping algorithms to translate natural language questions into search queries for Web databases (1997) 0.00
    0.0041248677 = product of:
      0.024749206 = sum of:
        0.024749206 = product of:
          0.049498413 = sum of:
            0.049498413 = weight(_text_:web in 314) [ClassicSimilarity], result of:
              0.049498413 = score(doc=314,freq=2.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.43268442 = fieldWeight in 314, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.09375 = fieldNorm(doc=314)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
  5. Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005) 0.00
    0.0041248677 = product of:
      0.024749206 = sum of:
        0.024749206 = product of:
          0.049498413 = sum of:
            0.049498413 = weight(_text_:web in 3268) [ClassicSimilarity], result of:
              0.049498413 = score(doc=3268,freq=8.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.43268442 = fieldWeight in 3268, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3268)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (I²R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based on dynamic systems. In the present paper, a different interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; and thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
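    The equivalence of views (a) and (b) can be checked numerically: on a toy graph, the random-surfer steady state reached by power iteration coincides with the eigenvector of the Google matrix for eigenvalue 1. A sketch with a made-up 3-page link matrix (NumPy assumed):

```python
import numpy as np

# Column-stochastic link matrix of a made-up 3-page web: column j
# distributes page j's rank evenly over its outlinks.
L = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
d = 0.85                  # damping factor
n = L.shape[0]
G = d * L + (1 - d) / n   # Google matrix (still column-stochastic)

# (a) Stochastic view: steady state of the Markov chain by power iteration.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = G @ r

# (b) Algebraic view: eigenvector of G for eigenvalue 1.
vals, vecs = np.linalg.eig(G)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v = v / v.sum()

print(np.allclose(r, v))  # True: both views give the same ranking
```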
  6. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.00
    0.0041248677 = product of:
      0.024749206 = sum of:
        0.024749206 = product of:
          0.049498413 = sum of:
            0.049498413 = weight(_text_:web in 5777) [ClassicSimilarity], result of:
              0.049498413 = score(doc=5777,freq=8.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.43268442 = fieldWeight in 5777, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5777)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    LCSH
    Web search engines
    RSWK
    World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
    Subject
    World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
    Web search engines
  7. Archuby, C.G.: Interfaces de recuperación para catálogos en línea con salidas ordenadas por probable relevancia (2000) 0.00
    0.0037652773 = product of:
      0.022591664 = sum of:
        0.022591664 = product of:
          0.06777499 = sum of:
            0.06777499 = weight(_text_:29 in 5727) [ClassicSimilarity], result of:
              0.06777499 = score(doc=5727,freq=4.0), product of:
                0.12330827 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03505379 = queryNorm
                0.5496386 = fieldWeight in 5727, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5727)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 1.1996 18:23:13
    Source
    Ciencia da informacao. 29(2000) no.3, S.5-13
  8. Crestani, F.: Combination of similarity measures for effective spoken document retrieval (2003) 0.00
    0.0037274342 = product of:
      0.022364605 = sum of:
        0.022364605 = product of:
          0.06709381 = sum of:
            0.06709381 = weight(_text_:29 in 4690) [ClassicSimilarity], result of:
              0.06709381 = score(doc=4690,freq=2.0), product of:
                0.12330827 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03505379 = queryNorm
                0.5441145 = fieldWeight in 4690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4690)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Source
    Journal of information science. 29(2003) no.2, S.87-96
  9. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    0.0036939036 = product of:
      0.02216342 = sum of:
        0.02216342 = product of:
          0.06649026 = sum of:
            0.06649026 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.06649026 = score(doc=2134,freq=2.0), product of:
                0.1227524 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03505379 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    30. 3.2001 13:32:22
  10. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    0.0036939036 = product of:
      0.02216342 = sum of:
        0.02216342 = product of:
          0.06649026 = sum of:
            0.06649026 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.06649026 = score(doc=3445,freq=2.0), product of:
                0.1227524 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03505379 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    25. 8.2005 17:42:22
  11. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.00
    0.0035722405 = product of:
      0.021433443 = sum of:
        0.021433443 = product of:
          0.042866886 = sum of:
            0.042866886 = weight(_text_:web in 3455) [ClassicSimilarity], result of:
              0.042866886 = score(doc=3455,freq=6.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.37471575 = fieldWeight in 3455, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3455)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
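    Total reciprocal document rank (TRDR), the figure quoted above, rewards rankings that place answer-bearing documents near the top by summing their reciprocal ranks. A minimal sketch of the metric as commonly defined; the ranking and relevance judgments are hypothetical:

```python
def total_reciprocal_document_rank(ranked_docs, answer_bearing):
    """Sum 1/rank over the returned documents that contain the answer."""
    return sum(1.0 / rank
               for rank, doc in enumerate(ranked_docs, start=1)
               if doc in answer_bearing)

# Hypothetical ranking: answer-bearing documents appear at ranks 2 and 5.
trdr = total_reciprocal_document_rank(
    ["d7", "d3", "d9", "d1", "d4"], answer_bearing={"d3", "d4"})
print(trdr)  # 0.5 + 0.2 = 0.7; averaged over queries in an evaluation
```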
  12. Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.00
    0.0035722405 = product of:
      0.021433443 = sum of:
        0.021433443 = product of:
          0.042866886 = sum of:
            0.042866886 = weight(_text_:web in 3310) [ClassicSimilarity], result of:
              0.042866886 = score(doc=3310,freq=6.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.37471575 = fieldWeight in 3310, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3310)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic.
  13. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.00
    0.0035722405 = product of:
      0.021433443 = sum of:
        0.021433443 = product of:
          0.042866886 = sum of:
            0.042866886 = weight(_text_:web in 4119) [ClassicSimilarity], result of:
              0.042866886 = score(doc=4119,freq=6.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.37471575 = fieldWeight in 4119, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4119)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results suggest that our block-weighting ranking method is superior to both baselines across all collections we used, with average gains in precision ranging from 5 to 20%.
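    One way to read the block-weighting idea: per-block term frequencies are scaled by a block-weight function and summed, and the weighted frequency feeds an otherwise standard BM25 formula. A sketch under that reading; the block types, weights, and parameters are illustrative, not the paper's nine functions:

```python
import math

def bm25_block(term_blocks, block_weight, df, n_docs,
               doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 with term frequency aggregated over weighted page blocks.

    term_blocks: list of (block_type, freq_in_block) for one term/document.
    block_weight: maps a block type to its weight.
    """
    tf = sum(block_weight(kind) * f for kind, f in term_blocks)
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# Illustrative weights: title blocks count more than footers.
weights = {"title": 2.0, "main": 1.0, "footer": 0.2}
score = bm25_block([("title", 1), ("main", 3), ("footer", 2)],
                   weights.get, df=4597, n_docs=44218,
                   doc_len=120, avg_len=180)
print(round(score, 3))
```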
  14. Walz, J.: Analyse der Übertragbarkeit allgemeiner Rankingfaktoren von Web-Suchmaschinen auf Discovery-Systeme (2018) 0.00
    0.0035722405 = product of:
      0.021433443 = sum of:
        0.021433443 = product of:
          0.042866886 = sum of:
            0.042866886 = weight(_text_:web in 5744) [ClassicSimilarity], result of:
              0.042866886 = score(doc=5744,freq=6.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.37471575 = fieldWeight in 5744, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5744)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Objective: The goal of this bachelor's thesis was to analyze whether the general ranking factors used by Web search engines can be transferred to discovery systems. This could improve library ranking, which has so far been based mainly on the textual match between the search query and documents. Method: Factors from the groups popularity, recency, locality, and technical factors, as well as personalized ranking, were discussed. The ranking factors were selected according to how frequently they occur in the analyzed literature and the importance derived from that. Result: Of the 23 ranking factors examined, 14 (61%) are directly transferable from Web search engine ranking to discovery system ranking. These include, among others, click behavior, creation date, user location, and language. Six (26%) of the examined factors are not transferable (e.g., update frequency and page load speed). Link topology, usage frequency, and update frequency are transferable with appropriate modifications.
  15. Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 0.00
    0.00343739 = product of:
      0.02062434 = sum of:
        0.02062434 = product of:
          0.04124868 = sum of:
            0.04124868 = weight(_text_:web in 3726) [ClassicSimilarity], result of:
              0.04124868 = score(doc=3726,freq=2.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.36057037 = fieldWeight in 3726, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3726)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
  16. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.00
    0.00343739 = product of:
      0.02062434 = sum of:
        0.02062434 = product of:
          0.04124868 = sum of:
            0.04124868 = weight(_text_:web in 1615) [ClassicSimilarity], result of:
              0.04124868 = score(doc=1615,freq=8.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.36057037 = fieldWeight in 1615, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1615)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
    Footnote
    Teil eines Themenheftes: "Web retrieval and mining: A machine learning perspective"
  17. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.00
    0.00343739 = product of:
      0.02062434 = sum of:
        0.02062434 = product of:
          0.04124868 = sum of:
            0.04124868 = weight(_text_:web in 1694) [ClassicSimilarity], result of:
              0.04124868 = score(doc=1694,freq=2.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.36057037 = fieldWeight in 1694, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1694)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
  18. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.00
    0.0034028427 = product of:
      0.020417055 = sum of:
        0.020417055 = product of:
          0.04083411 = sum of:
            0.04083411 = weight(_text_:web in 93) [ClassicSimilarity], result of:
              0.04083411 = score(doc=93,freq=16.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.35694647 = fieldWeight in 93, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
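    The recursive definition sketched above (a page is important if important pages link to it, with a damping factor modeling the surfer's occasional random jump) can be computed by simple fixed-point iteration. A minimal sketch on a made-up four-page graph:

```python
# Minimal PageRank by repeated application of the per-page update rule
#   PR(p) = (1 - d)/N + d * sum(PR(q) / outdeg(q) for q linking to p).
# The link graph is made up for the example.
links = {
    "home":   ["news", "about"],
    "news":   ["home"],
    "about":  ["home", "news"],
    "orphan": ["home"],
}
d = 0.85
pages = list(links)
n = len(pages)
pr = {p: 1.0 / n for p in pages}

for _ in range(100):
    pr = {p: (1 - d) / n
             + d * sum(pr[q] / len(links[q])
                       for q in pages if p in links[q])
          for p in pages}

for p, r in sorted(pr.items(), key=lambda kv: -kv[1]):
    print(f"{p:7s} {r:.3f}")  # "home" accumulates the most rank
```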
  19. Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.00
    0.0034028427 = product of:
      0.020417055 = sum of:
        0.020417055 = product of:
          0.04083411 = sum of:
            0.04083411 = weight(_text_:web in 4213) [ClassicSimilarity], result of:
              0.04083411 = score(doc=4213,freq=4.0), product of:
                0.11439841 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03505379 = queryNorm
                0.35694647 = fieldWeight in 4213, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4213)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because any proper subgraph cannot be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing possible solutions with respect to effectiveness and efficiency.
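    A common way to approximate such minimal connected answers is a Steiner-tree-style heuristic: BFS from one keyword node, then splice in the paths to the remaining keyword nodes. The sketch below illustrates the idea on a hypothetical graph; it is not the paper's approximation algorithm:

```python
from collections import deque

def bfs_tree(graph, source):
    """Parent pointers from a BFS over an undirected graph."""
    parent = {source: None}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return parent

def connect(graph, keyword_nodes):
    """Approximate a small connected subgraph covering all keyword nodes
    by joining each keyword node to the first one via BFS-tree paths."""
    root, *rest = keyword_nodes
    answer = {root}
    parent = bfs_tree(graph, root)
    for node in rest:
        while node is not None and node not in answer:
            answer.add(node)
            node = parent[node]
    return answer

# Hypothetical RDF-ish graph; nodes n2 and n5 match the query keywords.
g = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1", "n4"],
     "n4": ["n3", "n5"], "n5": ["n4"]}
print(connect(g, ["n2", "n5"]))  # connected subgraph covering both keywords
```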
  20. Okada, M.; Ando, K.; Lee, S.S.; Hayashi, Y.; Aoe, J.I.: ¬An efficient substring search method by using delayed keyword extraction (2001) 0.00
    0.0031949438 = product of:
      0.019169662 = sum of:
        0.019169662 = product of:
          0.057508986 = sum of:
            0.057508986 = weight(_text_:29 in 6415) [ClassicSimilarity], result of:
              0.057508986 = score(doc=6415,freq=2.0), product of:
                0.12330827 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03505379 = queryNorm
                0.46638384 = fieldWeight in 6415, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6415)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 3.2002 17:24:03

Languages

  • e 105
  • d 17
  • m 1
  • pt 1

Types

  • a 112
  • m 6
  • el 3
  • x 3
  • s 2
  • r 1