Search (64 results, page 1 of 4)

  • × language_ss:"e"
  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2000 TO 2010}
  1. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.05
    0.0541602 = product of:
      0.0812403 = sum of:
        0.06519419 = weight(_text_:resources in 2069) [ClassicSimilarity], result of:
          0.06519419 = score(doc=2069,freq=6.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.349276 = fieldWeight in 2069, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 2069) [ClassicSimilarity], result of:
              0.032092217 = score(doc=2069,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 2069, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2069)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Unlike the current web, the semantic web contains not only resources but also the heterogeneous relationships among them. As the semantic web grows, specialized search techniques become increasingly important. In this paper, we present RSS, a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with the entities most semantically related to the query, so that users receive properly ordered semantic search results produced by combining global ranking values with the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. RSS also differs from many current methods of accessing semantic web data in that it applies novel ranking strategies to avoid returning search results in disorder. The experimental results show that the framework is feasible and produces a better ordering of semantic search results than directly applying the standard PageRank algorithm to the semantic web.
    Source
    Information processing and management. 44(2008) no.2, S.893-909
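    Sketch
    The numeric breakdown above (and in the records that follow) is a Lucene ClassicSimilarity explain tree. A minimal sketch of one weight(...) clause, assuming Lucene's classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), score = queryWeight x fieldWeight); the constants are taken from the "resources" clause of record 1, and the record total then applies the sum and coord factors shown in the tree:

```python
import math

def classic_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Reproduce one weight(_text_:term) clause of a ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                             # 2.4494898 for freq=6.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.650338 for docFreq=3122
    query_weight = idf * query_norm                  # 0.18665522
    field_weight = tf * idf * field_norm             # 0.349276
    return query_weight * field_weight               # 0.06519419

# Constants from record 1 ("resources" in doc 2069):
w = classic_weight(6.0, 3122, 44218, 0.051133685, 0.0390625)
print(round(w, 8))
# Record total: (w + 0.016046109 for the "management" clause) * coord(2/3) = 0.0541602
```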
  2. Quiroga, L.M.; Mostafa, J.: An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.04
    0.035790663 = product of:
      0.053685993 = sum of:
        0.037639882 = weight(_text_:resources in 2579) [ClassicSimilarity], result of:
          0.037639882 = score(doc=2579,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.20165458 = fieldWeight in 2579, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2579)
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 2579) [ClassicSimilarity], result of:
              0.032092217 = score(doc=2579,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 2579, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2579)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different (α = 0.05 and p = 0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of thinking aloud protocols showed dimensions that were highly situational, establishing the importance context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.
    Source
    Information processing and management. 38(2002) no.5, S.671-694
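    Sketch
    The implicit and combined modes above refine a term-weight profile from relevance feedback. The paper does not spell out the update rule, so this is a generic Rocchio-style sketch under that assumption; alpha, beta, and gamma are illustrative parameters, not values from the study:

```python
def rocchio_update(profile, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move a {term: weight} profile toward relevant documents and away
    from non-relevant ones; documents are also {term: weight} dicts."""
    terms = set(profile).union(*relevant, *nonrelevant)
    updated = {}
    for t in terms:
        pos = sum(d.get(t, 0.0) for d in relevant) / max(len(relevant), 1)
        neg = sum(d.get(t, 0.0) for d in nonrelevant) / max(len(nonrelevant), 1)
        updated[t] = alpha * profile.get(t, 0.0) + beta * pos - gamma * neg
    return updated

# Combined mode: start from a user-specified profile, then refine each feedback round.
profile = {"health": 0.8, "nutrition": 0.5}
profile = rocchio_update(profile, relevant=[{"health": 0.6, "diet": 0.4}],
                         nonrelevant=[{"insurance": 0.7}])
```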
  3. Zhang, D.; Dong, Y.: An effective algorithm to rank Web resources (2000) 0.04
    0.035130557 = product of:
      0.10539167 = sum of:
        0.10539167 = weight(_text_:resources in 3662) [ClassicSimilarity], result of:
          0.10539167 = score(doc=3662,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.56463283 = fieldWeight in 3662, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.109375 = fieldNorm(doc=3662)
      0.33333334 = coord(1/3)
    
  4. Witschel, H.F.: Global term weights in distributed environments (2008) 0.03
    0.026692703 = product of:
      0.08007811 = sum of:
        0.08007811 = sum of:
          0.03851066 = weight(_text_:management in 2096) [ClassicSimilarity], result of:
            0.03851066 = score(doc=2096,freq=2.0), product of:
              0.17235184 = queryWeight, product of:
                3.3706124 = idf(docFreq=4130, maxDocs=44218)
                0.051133685 = queryNorm
              0.22344214 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3706124 = idf(docFreq=4130, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
          0.04156745 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
            0.04156745 = score(doc=2096,freq=2.0), product of:
              0.17906146 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051133685 = queryNorm
              0.23214069 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
      0.33333334 = coord(1/3)
    
    Date
    1. 8.2008 9:44:22
    Source
    Information processing and management. 44(2008) no.3, S.1049-1061
  5. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.02
    0.017565278 = product of:
      0.052695833 = sum of:
        0.052695833 = weight(_text_:resources in 1678) [ClassicSimilarity], result of:
          0.052695833 = score(doc=1678,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.28231642 = fieldWeight in 1678, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1678)
      0.33333334 = coord(1/3)
    
    Abstract
    Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
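    Sketch
    A minimal sketch of the single-pass idea described above: accumulate in-memory postings until a memory budget is reached, flush each full segment as a sorted run, and merge the runs afterwards. The budget here counts postings rather than bytes, and the on-disk format and k-way merge are elided; all names are illustrative:

```python
from collections import defaultdict

def single_pass_index(docs, memory_limit=100_000):
    """Index (doc_id, text) pairs within a posting budget; returns the
    sorted segment runs that a k-way merge would combine into one index."""
    segments, postings, used = [], defaultdict(list), 0
    for doc_id, text in docs:
        for position, term in enumerate(text.split()):
            postings[term].append((doc_id, position))
            used += 1
        if used >= memory_limit:                  # budget spent: flush a run
            segments.append(sorted(postings.items()))
            postings, used = defaultdict(list), 0
    if postings:                                  # final partial run
        segments.append(sorted(postings.items()))
    return segments
```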
  6. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.01616512 = product of:
      0.04849536 = sum of:
        0.04849536 = product of:
          0.09699072 = sum of:
            0.09699072 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.09699072 = score(doc=3445,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    25. 8.2005 17:42:22
  7. Lin, J.; Katz, B.: Building a reusable test collection for question answering (2006) 0.02
    0.015055953 = product of:
      0.045167856 = sum of:
        0.045167856 = weight(_text_:resources in 5045) [ClassicSimilarity], result of:
          0.045167856 = score(doc=5045,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.2419855 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.046875 = fieldNorm(doc=5045)
      0.33333334 = coord(1/3)
    
    Abstract
    In contrast to traditional information retrieval systems, which return ranked lists of documents that users must manually browse through, a question answering system attempts to directly answer natural language questions posed by the user. Although such systems possess language-processing capabilities, they still rely on traditional document retrieval techniques to generate an initial candidate set of documents. In this article, the authors argue that document retrieval for question answering represents a task different from retrieving documents in response to more general retrospective information needs. Thus, to guide future system development, specialized question answering test collections must be constructed. They show that the current evaluation resources have major shortcomings; to remedy the situation, they have manually created a small, reusable question answering test collection for research purposes. In this article they describe their methodology for building this test collection and discuss issues they encountered regarding the notion of "answer correctness."
  8. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.02
    0.015055953 = product of:
      0.045167856 = sum of:
        0.045167856 = weight(_text_:resources in 6) [ClassicSimilarity], result of:
          0.045167856 = score(doc=6,freq=8.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.2419855 = fieldWeight in 6, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
      0.33333334 = coord(1/3)
    
    Abstract
    Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB code samples and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes many illustrative examples and entertaining asides, MATLAB code, an accessible and informal style, and a complete, self-contained mathematics review section.
    Content
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods
    Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions
    Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity
    Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow
    Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion
    Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study
    Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
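    Sketch
    Chapter 11 covers the HITS method; a minimal power-iteration sketch of the algorithm, with hub and authority scores re-normalized on every step (the link graph in the usage line is a toy illustration):

```python
import math

def hits(links, iterations=50):
    """links: {page: [pages it links to]}. Returns (authority, hub) scores."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority(p) = sum of hub scores of the pages linking to p
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # hub(p) = sum of authority scores of the pages p links to
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub

authority, hubs = hits({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```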
  9. Daniłowicz, C.; Baliński, J.: Document ranking based upon Markov chains (2001) 0.01
    0.014976369 = product of:
      0.044929106 = sum of:
        0.044929106 = product of:
          0.08985821 = sum of:
            0.08985821 = weight(_text_:management in 5388) [ClassicSimilarity], result of:
              0.08985821 = score(doc=5388,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.521365 = fieldWeight in 5388, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.109375 = fieldNorm(doc=5388)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 37(2001) no.4, S.623-637
  10. Horng, J.T.; Yeh, C.C.: Applying genetic algorithms to query optimization in document retrieval (2000) 0.01
    0.014976369 = product of:
      0.044929106 = sum of:
        0.044929106 = product of:
          0.08985821 = sum of:
            0.08985821 = weight(_text_:management in 3045) [ClassicSimilarity], result of:
              0.08985821 = score(doc=3045,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.521365 = fieldWeight in 3045, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3045)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 36(2000) no.5, S.737-759
  11. Niemi, T.; Junkkari, M.; Järvelin, K.; Viita, S.: Advanced query language for manipulating complex entities (2004) 0.01
    0.014976369 = product of:
      0.044929106 = sum of:
        0.044929106 = product of:
          0.08985821 = sum of:
            0.08985821 = weight(_text_:management in 4218) [ClassicSimilarity], result of:
              0.08985821 = score(doc=4218,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.521365 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 40(2004) no.6, S.869-
  12. Clarke, C.L.A.; Cormack, G.V.; Tudhope, E.A.: Relevance ranking for one to three term queries (2000) 0.01
    0.014976369 = product of:
      0.044929106 = sum of:
        0.044929106 = product of:
          0.08985821 = sum of:
            0.08985821 = weight(_text_:management in 437) [ClassicSimilarity], result of:
              0.08985821 = score(doc=437,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.521365 = fieldWeight in 437, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.109375 = fieldNorm(doc=437)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 36(2000) no.2, S.291-311
  13. Chung, Y.M.; Lee, J.Y.: Optimization of some factors affecting the performance of query expansion (2004) 0.01
    0.014976369 = product of:
      0.044929106 = sum of:
        0.044929106 = product of:
          0.08985821 = sum of:
            0.08985821 = weight(_text_:management in 2537) [ClassicSimilarity], result of:
              0.08985821 = score(doc=2537,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.521365 = fieldWeight in 2537, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2537)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 40(2004) no.6, S.891-
  14. Okada, M.; Ando, K.; Lee, S.S.; Hayashi, Y.; Aoe, J.I.: An efficient substring search method by using delayed keyword extraction (2001) 0.01
    0.0128368875 = product of:
      0.03851066 = sum of:
        0.03851066 = product of:
          0.07702132 = sum of:
            0.07702132 = weight(_text_:management in 6415) [ClassicSimilarity], result of:
              0.07702132 = score(doc=6415,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.44688427 = fieldWeight in 6415, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6415)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 37(2001) no.5, S.741-761
  15. Silveira, M.; Ribeiro-Neto, B.: Concept-based ranking : a case study in the juridical domain (2004) 0.01
    0.0128368875 = product of:
      0.03851066 = sum of:
        0.03851066 = product of:
          0.07702132 = sum of:
            0.07702132 = weight(_text_:management in 2339) [ClassicSimilarity], result of:
              0.07702132 = score(doc=2339,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.44688427 = fieldWeight in 2339, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2339)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 40(2004) no.5, S.791-806
  16. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.009237211 = product of:
      0.027711634 = sum of:
        0.027711634 = product of:
          0.055423267 = sum of:
            0.055423267 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.055423267 = score(doc=5108,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    20. 1.2007 18:30:22
  17. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.009237211 = product of:
      0.027711634 = sum of:
        0.027711634 = product of:
          0.055423267 = sum of:
            0.055423267 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.055423267 = score(doc=1422,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2003 19:27:23
  18. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.01
    0.008782639 = product of:
      0.026347917 = sum of:
        0.026347917 = weight(_text_:resources in 93) [ClassicSimilarity], result of:
          0.026347917 = score(doc=93,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.14115821 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.33333334 = coord(1/3)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb.

    Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list.

    One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
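    Sketch
    A minimal power-iteration sketch of the PageRank computation the article goes on to derive; the damping factor d = 0.85 is the conventional choice, and dangling pages redistribute their rank uniformly here:

```python
def pagerank(links, d=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns ranks that sum to 1."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}   # random-jump (teleport) term
        for p in pages:
            targets = links.get(p, [])
            if targets:                            # share rank along outlinks
                share = d * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:                                  # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Three pages in a cycle each converge to rank 1/3.
print(pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}))
```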
  19. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    0.008557925 = product of:
      0.025673775 = sum of:
        0.025673775 = product of:
          0.05134755 = sum of:
            0.05134755 = weight(_text_:management in 2564) [ClassicSimilarity], result of:
              0.05134755 = score(doc=2564,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.29792285 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 38(2002) no.1, S.79-89
  20. Aizawa, A.: An information-theoretic perspective of tf-idf measures (2003) 0.01
    0.008557925 = product of:
      0.025673775 = sum of:
        0.025673775 = product of:
          0.05134755 = sum of:
            0.05134755 = weight(_text_:management in 4155) [ClassicSimilarity], result of:
              0.05134755 = score(doc=4155,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.29792285 = fieldWeight in 4155, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4155)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 39(2003) no.1, S.45-65

Types

  • a 60
  • m 3
  • el 1