Search (8 results, page 1 of 1)

  • theme_ss:"Suchmaschinen"
  • type_ss:"el"
  • year_i:[2000 TO 2010}
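The three active filters above are Solr field queries; note the mixed brackets in the year range, where "[" is inclusive and "}" is exclusive, so the filter matches 2000-2009. A minimal sketch of how the same result view could be requested over Solr's HTTP API (the host and core name are assumptions for illustration, not taken from this page):

```python
import requests

params = {
    "q": "*:*",
    "fq": [                                   # one fq per active filter chip
        'theme_ss:"Suchmaschinen"',
        'type_ss:"el"',
        "year_i:[2000 TO 2010}",              # inclusive lower, exclusive upper bound
    ],
    "wt": "json",
    "debugQuery": "true",                     # returns score explanations like those below
}
# Hypothetical endpoint; substitute the real host and core.
response = requests.get("http://localhost:8983/solr/catalog/select", params=params)
print(response.json()["response"]["numFound"])  # 8 for this view
```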
  1. Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms (2006) 0.02
    0.020863095 = product of:
      0.04172619 = sum of:
        0.02586502 = weight(_text_:data in 2565) [ClassicSimilarity], result of:
          0.02586502 = score(doc=2565,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
              0.03172234 = score(doc=2565,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 2565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2565)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
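The indented block above is Lucene's ClassicSimilarity (TF-IDF) "explain" output. As a sanity check, the top-level score can be recomputed from the leaf values; a minimal sketch, with the constants copied from the tree above:

```python
import math

query_norm = 0.046827413

def clause_weight(freq, idf, field_norm):
    # ClassicSimilarity: weight = queryWeight * fieldWeight, where
    #   queryWeight = idf * queryNorm
    #   fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

w_data = clause_weight(2.0, 3.1620505, 0.0390625)        # ~ 0.02586502
w_22   = clause_weight(2.0, 3.5018296, 0.0390625) * 0.5  # inner coord(1/2)
score  = (w_data + w_22) * 0.5                           # outer coord(2/4)
print(round(score, 9))                                   # ~ 0.020863095, as shown above
```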
    
    Abstract
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function affect rank quality and convergence speed. Even though our results suggest that PageRank can be approximated with other, simpler forms of ranking that may be computed more efficiently, our focus is of a more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, with linear, exponential, and hyperbolic decay in the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters on real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering almost identical to PageRank's using a fixed, small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
    Date
    16.01.2016 10:22:28
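The abstract above describes ranks of the general form r = Σ_t damping(t) · v·P^t, where P is the normalized link matrix and the damping function controls how endorsement decays with path length. A minimal sketch under the assumption of a row-stochastic P and a uniform start vector (all names are mine, not the paper's); the exponential damping reproduces PageRank, the other two decay linearly and hyperbolically:

```python
import numpy as np

def functional_rank(P, damping, t_max=100):
    """r = sum_t damping(t) * v @ P^t, truncated at t_max steps."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)      # uniform start vector
    r = np.zeros(n)
    walk = v.copy()
    for t in range(t_max):
        r += damping(t) * walk   # mass arriving over paths of length t
        walk = walk @ P          # one more step along the links
    return r

alpha = 0.85
exponential = lambda t: (1 - alpha) * alpha**t                # = PageRank
linear      = lambda t, L=10: 2 * max(L - t, 0) / (L * (L + 1))
hyperbolic  = lambda t: 1.0 / ((t + 1) * (t + 2))             # telescopes to 1

# Toy 3-page graph: 0 -> 1, 1 -> {0, 2}, 2 -> 0
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
print(functional_rank(P, exponential))
```

Each damping function sums to 1 over all path lengths, so the three rank vectors are directly comparable, which is what makes the paper's Kendall's-tau comparison meaningful.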
  2. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.01
    0.008959906 = product of:
      0.035839625 = sum of:
        0.035839625 = weight(_text_:data in 4709) [ClassicSimilarity], result of:
          0.035839625 = score(doc=4709,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24204408 = fieldWeight in 4709, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
      0.25 = coord(1/4)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
  3. Bladow, N.; Dorey, C.; Frederickson, L.; Grover, P.; Knudtson, Y.; Krishnamurthy, S.; Lazarou, V.: What's the Buzz about? : An empirical examination of Search on Yahoo! (2005) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 3072) [ClassicSimilarity], result of:
          0.031038022 = score(doc=3072,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 3072, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3072)
      0.25 = coord(1/4)
    
    Abstract
    We present an analysis of the Yahoo Buzz Index over a period of 45 weeks. Our key findings are that: (1) It is most common for a search term to show up on the index for one week, followed by two weeks, three weeks, etc. Only two terms persist for all 45 weeks studied - Britney Spears and Jennifer Lopez. Search term longevity follows a power-law distribution or a winner-take-all structure; (2) Most search terms focus on entertainment. Search terms related to serious topics are found less often. The Buzz Index does not necessarily follow the "news cycle"; and, (3) We provide two ways to determine "star power" of various search terms - one that emphasizes staying power on the Index and another that emphasizes rank. In general, the methods lead to dramatically different results. Britney Spears performs well in both methods. We conclude that the data available on the Index is symptomatic of a celebrity-crazed, entertainment-centered culture.
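A power-law claim like the one in finding (1) is commonly checked on a log-log plot, where the distribution becomes a straight line whose slope is the exponent. A minimal sketch with made-up longevity counts (illustrative numbers, not the paper's data):

```python
import numpy as np

# Hypothetical histogram: counts[k] = number of terms that stayed on the
# Buzz Index for exactly `weeks[k]` weeks.
weeks  = np.array([1, 2, 3, 4, 5, 10, 20, 45])
counts = np.array([400, 150, 70, 40, 25, 8, 3, 2])

# counts ~ weeks**(-a) is linear in log-log space; a least-squares fit
# of log(counts) against log(weeks) estimates the exponent a.
slope, intercept = np.polyfit(np.log(weeks), np.log(counts), 1)
print(f"estimated power-law exponent: {-slope:.2f}")
```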
  4. Khare, R.; Cutting, D.; Sitaker, K.; Rifkin, A.: Nutch: a flexible and scalable open-source Web search engine (2004) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 852) [ClassicSimilarity], result of:
          0.031038022 = score(doc=852,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 852, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=852)
      0.25 = coord(1/4)
    
    Abstract
    Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; and on a personal scale to index a user's files, email, and web-surfing history. We also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
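To make the "explain its result rankings" idea concrete, here is a toy tf-idf scorer that reports how each query term contributes to a document's score. This is a sketch of the concept only; Nutch itself delegates scoring to Lucene, whose explain output looks like the score trees shown on this page:

```python
import math
from collections import Counter, defaultdict

docs = {
    "d1": "open source web search engine",
    "d2": "personal email and file search",
}
index = defaultdict(dict)                      # term -> {doc_id: term frequency}
for doc_id, text in docs.items():
    for term, freq in Counter(text.split()).items():
        index[term][doc_id] = freq

def explain(query, doc_id):
    """Return (score, per-term contribution strings) for one document."""
    parts, score = [], 0.0
    for term in query.split():
        postings = index.get(term, {})
        if doc_id in postings:
            idf = math.log(len(docs) / len(postings)) + 1
            contribution = math.sqrt(postings[doc_id]) * idf
            parts.append(f"{contribution:.3f} = tf*idf for '{term}'")
            score += contribution
    return score, parts

print(explain("web search", "d1"))
```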
  5. Schomburg, S.; Prante, J.: Search Engine Federation in Libraries - Suchmaschinenföderation in Bibliotheken (2009) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 2809) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2809,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2809, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2809)
      0.25 = coord(1/4)
    
    Abstract
    The hbz (Academic Library Center, Cologne) has a strong focus on search engine applications: beyond the projected integration of the respective technologies into the new release of the Digital Library portal solution (DigiBib6), vascoda background services also apply and take advantage of search engine technology. Experience since 2003 has shown that building and updating search engine indexes consumes a vast amount of resources. The use of search engine federations, however, promises major improvements: the total number of data records held in linked indexes can be almost unlimited, while still allowing a joint output of all hits retrieved. A federation also comes with excellent response times, and hits retrieved can refer to or link into the original system's layout. Nonetheless, the major challenge these days lies in the differences between search engine technologies, e.g. Lucene and FAST: variations in ranking, and the implementation or non-implementation of so-called drill-downs. The lecture gives a brief insight into the hbz search engine workshop, with an introduction to the current state of the project.
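The federation pattern described above boils down to fanning a query out to several independent indexes in parallel and merging the hits into one joint result list. A minimal sketch; the backends and their score scales are hypothetical, and the naive max-score normalization stands in for the harder cross-engine ranking problem (e.g. between Lucene and FAST) the abstract mentions:

```python
from concurrent.futures import ThreadPoolExecutor

def search_backend(backend, query):
    # Normalize scores into [0, 1] so hits from different engines are comparable.
    return [(hit["id"], hit["score"] / backend["max_score"])
            for hit in backend["search"](query)]

def federated_search(backends, query):
    with ThreadPoolExecutor() as pool:         # parallel fan-out keeps response times low
        result_lists = pool.map(lambda b: search_backend(b, query), backends)
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

# Stub backends standing in for, say, a Lucene and a FAST index.
backends = [
    {"max_score": 10.0,  "search": lambda q: [{"id": "lucene:1", "score": 8.0}]},
    {"max_score": 100.0, "search": lambda q: [{"id": "fast:7",   "score": 90.0}]},
]
print(federated_search(backends, "suchmaschinen"))
```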
  6. Boldi, P.; Santini, M.; Vigna, S.: PageRank as a function of the damping factor (2005) 0.00
    0.0039652926 = product of:
      0.01586117 = sum of:
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 2564) [ClassicSimilarity], result of:
              0.03172234 = score(doc=2564,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    16.01.2016 10:22:28
  7. Bates, M.E.: Quick answers to odd questions (2004) 0.00
    0.0038797527 = product of:
      0.015519011 = sum of:
        0.015519011 = weight(_text_:data in 3071) [ClassicSimilarity], result of:
          0.015519011 = score(doc=3071,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.10480815 = fieldWeight in 3071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3071)
      0.25 = coord(1/4)
    
    Content
    "One of the things I enjoyed the most when I was a reference librarian was the wide range of questions my clients sent my way. What was the original title of the first Godzilla movie? (Gojira, released in 1954) Who said 'I'm as pure as the driven slush'? (Tallulah Bankhead) What percentage of adults have gone to a jazz performance in the last year? (11%) I have found that librarians, speech writers and journalists have one thing in common - we all need to find information on all kinds of topics, and we usually need the answers right now. The following are a few of my favorite sites for finding answers to those there-must-be-an-answer-out-there questions. - For the electronic equivalent to the "ready reference" shelf of resources that most librarians keep hidden behind their desks, check out RefDesk . It is particularly good for answering factual questions - Where do I get the new Windows XP Service Pack? Where is the 386 area code? How do I contact my member of Congress? - Another resource for lots of those quick-fact questions is InfoPlease, the publishers of the Information Please almanac .- Right now, it's full of Olympics data, but it also has links to facts and factoids that you would look up in an almanac, atlas, or encyclopedia. - If you want numbers, start with the Statistical Abstract of the US. This source, produced by the U.S. Census Bureau, gives you everything from the divorce rate by state to airline cost indexes going back to 1980. It is many librarians' secret weapon for pulling numbers together quickly. - My favorite question is "how does that work?" Haven't you ever wondered how they get that Olympic torch to continue to burn while it is being carried by runners from one city to the next? Or how solar sails manage to propel a spacecraft? For answers, check out the appropriately-named How Stuff Works. - For questions about movies, my first resource is the Internet Movie Database. It is easy to search, is such a popular site that mistakes are corrected quickly, and is a fun place to catch trailers of both upcoming movies and those dating back to the 30s. - When I need to figure out who said what, I still tend to rely on the print sources such as Bartlett's Familiar Quotations . No, the current edition is not available on the web, but - and this is the librarian in me - I really appreciate the fact that I not only get the attribution but I also see the source of the quote. There are far too many quotes being attributed to a celebrity, but with no indication of the publication in which the quote appeared. Take, for example, the much-cited quote of Margaret Meade, "Never doubt that a small group of thoughtful committed people can change the world; indeed, it's the only thing that ever has!" Then see the page on the Institute for Intercultural Studies site, founded by Meade, and read its statement that it has never been able to verify this alleged quote from Meade. While there are lots of web-based sources of quotes (see QuotationsPage.com and Bartleby, for example), unless the site provides the original source for the quotation, I wouldn't rely on the citation. Of course, if you have a hunch as to the source of a quote, and it was published prior to 1923, head over to Project Gutenberg , which includes the full text of over 12,000 books that are in the public domain. When I needed to confirm a quotation of the Red Queen in "Through the Looking Glass", this is where I started. 
- And if you are stumped as to where to go to find information, instead of Googling it, try the Librarians' Index to the Internet. While it is somewhat US-centric, it is a great directory of web resources."
  8. Maurer, H.; Balke, T.; Kappe, F.; Kulathuramaiyer, N.; Weber, S.; Zaka, B.: Report on dangers and opportunities posed by large search engines, particularly Google (2007) 0.00
    0.0038797527 = product of:
      0.015519011 = sum of:
        0.015519011 = weight(_text_:data in 754) [ClassicSimilarity], result of:
          0.015519011 = score(doc=754,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.10480815 = fieldWeight in 754, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0234375 = fieldNorm(doc=754)
      0.25 = coord(1/4)
    
    Abstract
    The preliminary intended and approved list was:
    Section 1: To concentrate on Google as a virtual monopoly, and on Google's reported support of Wikipedia; to find experimental evidence of this support or show that the reports are no more than rumours.
    Section 2: To address the copy-paste syndrome and the socio-cultural consequences associated with it.
    Section 3: To deal with plagiarism and IPR violations as two intertwined topics: how they affect various players (teachers and pupils in school; academia; corporations; governmental studies, etc.); to establish that not enough is done concerning these issues, partially due to plain ignorance; we will propose some ways to alleviate the problem.
    Section 4: To discuss the usual tools to fight plagiarism and their shortcomings.
    Section 5: To propose ways to overcome most of the above problems according to proposals by Maurer/Zaka; to give examples, while making it clear that doing this more seriously requires a pilot project beyond this particular study.
    Section 6: To briefly analyze various views of plagiarism, as it is quite different in different fields (journalism, engineering, architecture, painting, etc.), and to present a concept that avoids plagiarism from the very beginning.
    Section 7: To point out the many other dangers of Google or Google-like undertakings: opportunistic ranking, and the analysis of data as a window into the commercial future.
    Section 8: To outline the need for new international laws.
    Section 9: To mention the feeble European attempts to fight Google, despite Google's growing power.
    Section 10: To argue that there is no way to catch up with Google in a frontal attack.