Search (38 results, page 1 of 2)

  • × type_ss:"el"
  • × theme_ss:"Suchmaschinen"
  1. Franke-Maier, M.; Rüter, C.: Discover Sacherschließung! : Was machen suchmaschinenbasierte Systeme mit unseren inhaltlichen Metadaten? (2015) 0.04
    0.040945116 = product of:
      0.10236279 = sum of:
        0.08879929 = weight(_text_:index in 1706) [ClassicSimilarity], result of:
          0.08879929 = score(doc=1706,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.4779429 = fieldWeight in 1706, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1706)
        0.013563501 = product of:
          0.0406905 = sum of:
            0.0406905 = weight(_text_:29 in 1706) [ClassicSimilarity], result of:
              0.0406905 = score(doc=1706,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.27205724 = fieldWeight in 1706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1706)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Date
    2. 3.2015 10:29:44
    Source
    http://opus4.kobv.de/opus4-hsog/frontdoor/index/index/docId/1124
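The score breakdowns shown with each entry are Lucene explain trees for classic TF-IDF ranking (ClassicSimilarity). As a worked check, the sketch below re-derives entry 1's score of 0.040945116 from the quantities in the tree above: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the coord factors (the fraction of query clauses that matched). The queryNorm value is copied straight from the explain output; this is only an illustration in Python, not the database's own code.

```python
import math

def classic_sim_term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    """Per-term score as in Lucene's ClassicSimilarity explain output:
    score = queryWeight * fieldWeight
          = (idf * queryNorm) * (tf * idf * fieldNorm)."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.04251826   # taken verbatim from the explain tree
MAX_DOCS = 44218
FIELD_NORM = 0.0546875    # fieldNorm(doc=1706)

# term "index": freq=4, docFreq=1520
w_index = classic_sim_term_score(4.0, 1520, MAX_DOCS, FIELD_NORM, QUERY_NORM)
# term "29": freq=2, docFreq=3565, scaled by the inner coord(1/3)
w_29 = classic_sim_term_score(2.0, 3565, MAX_DOCS, FIELD_NORM, QUERY_NORM) * (1.0 / 3.0)

# top-level coord(2/5): only 2 of 5 query clauses matched document 1706
score = (w_index + w_29) * (2.0 / 5.0)
print(round(w_index, 8))   # ~0.08879929
print(round(w_29, 8))      # ~0.01356350
print(round(score, 9))     # ~0.040945116
```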
  2. Brin, S.; Page, L.: ¬The anatomy of a large-scale hypertextual Web search engine (1998) 0.03
    0.03469106 = product of:
      0.08672766 = sum of:
        0.06342807 = weight(_text_:index in 947) [ClassicSimilarity], result of:
          0.06342807 = score(doc=947,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.3413878 = fieldWeight in 947, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
        0.023299592 = weight(_text_:system in 947) [ClassicSimilarity], result of:
          0.023299592 = score(doc=947,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.17398985 = fieldWeight in 947, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
      0.4 = coord(2/5)
    
    Abstract
    In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
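The abstract's core workflow - crawl the Web, index the words of each page, then answer queries against that index - is the inverted-index pattern. A toy version along those lines (made-up page texts; nothing here reflects the paper's actual data structures):

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each term to the set of page IDs containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def search(index, query):
    """Conjunctive query: pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# hypothetical crawled pages
pages = {
    "p1": "anatomy of a large scale hypertextual web search engine",
    "p2": "search engines index millions of web pages",
    "p3": "hypertext structure improves search results",
}
index = build_inverted_index(pages)
print(search(index, "search engine"))   # {'p1'}
print(search(index, "web search"))      # {'p1', 'p2'}
```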
  3. Khare, R.; Cutting, D.; Sitaker, K.; Rifkin, A.: Nutch: a flexible and scalable open-source Web search engine (2004) 0.03
    0.032712005 = product of:
      0.08178001 = sum of:
        0.0538205 = weight(_text_:index in 852) [ClassicSimilarity], result of:
          0.0538205 = score(doc=852,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.28967714 = fieldWeight in 852, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=852)
        0.027959513 = weight(_text_:system in 852) [ClassicSimilarity], result of:
          0.027959513 = score(doc=852,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 852, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=852)
      0.4 = coord(2/5)
    
    Abstract
    Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; on a personal scale to index a user's files, email, and web-surfing history; and we also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
  4. Bladow, N.; Dorey, C.; Frederickson, L.; Grover, P.; Knudtson, Y.; Krishnamurthy, S.; Lazarou, V.: What's the Buzz about? : An empirical examination of Search on Yahoo! (2005) 0.02
    0.024069263 = product of:
      0.12034631 = sum of:
        0.12034631 = weight(_text_:index in 3072) [ClassicSimilarity], result of:
          0.12034631 = score(doc=3072,freq=10.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.64773786 = fieldWeight in 3072, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=3072)
      0.2 = coord(1/5)
    
    Abstract
    We present an analysis of the Yahoo Buzz Index over a period of 45 weeks. Our key findings are that: (1) It is most common for a search term to show up on the index for one week, followed by two weeks, three weeks, etc. Only two terms persist for all 45 weeks studied - Britney Spears and Jennifer Lopez. Search term longevity follows a power-law distribution or a winner-take-all structure; (2) Most search terms focus on entertainment. Search terms related to serious topics are found less often. The Buzz Index does not necessarily follow the "news cycle"; and, (3) We provide two ways to determine "star power" of various search terms - one that emphasizes staying power on the Index and another that emphasizes rank. In general, the methods lead to dramatically different results. Britney Spears performs well in both methods. We conclude that the data available on the Index is symptomatic of a celebrity-crazed, entertainment-centered culture.
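The power-law ("winner-take-all") claim about search-term longevity can be sanity-checked with a simple log-log fit: if the number of terms persisting for exactly k weeks is roughly proportional to k^(-alpha), the points (log k, log count) fall on a line with slope -alpha. The histogram below is invented purely to show the calculation; the real counts from the 45-week study are not reproduced here.

```python
import math

# hypothetical histogram: weeks on the index -> number of search terms
longevity_counts = {1: 120, 2: 55, 3: 33, 4: 24, 5: 18, 10: 8, 20: 4, 45: 2}

# least-squares fit of log(count) = c - alpha * log(weeks)
xs = [math.log(k) for k in longevity_counts]
ys = [math.log(v) for v in longevity_counts.values()]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
alpha = -num / den
print(f"estimated power-law exponent alpha ~ {alpha:.2f}")
```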
  5. Leighton, H.V.: Performance of four World Wide Web (WWW) index services : Infoseek, Lycos, WebCrawler and WWWWorm (1995) 0.02
    0.0215282 = product of:
      0.107641 = sum of:
        0.107641 = weight(_text_:index in 3168) [ClassicSimilarity], result of:
          0.107641 = score(doc=3168,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.5793543 = fieldWeight in 3168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=3168)
      0.2 = coord(1/5)
    
  6. Koch, T.: Searching the Web : systematic overview over indexes (1995) 0.02
    0.0215282 = product of:
      0.107641 = sum of:
        0.107641 = weight(_text_:index in 3169) [ClassicSimilarity], result of:
          0.107641 = score(doc=3169,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.5793543 = fieldWeight in 3169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=3169)
      0.2 = coord(1/5)
    
    Object
    Nordic Web Index
  7. Matrix of WWW indices : a comparison of Internet indexing tools (1995) 0.02
    0.017940167 = product of:
      0.08970083 = sum of:
        0.08970083 = weight(_text_:index in 3165) [ClassicSimilarity], result of:
          0.08970083 = score(doc=3165,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.48279524 = fieldWeight in 3165, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=3165)
      0.2 = coord(1/5)
    
    Object
    GNA Meta-Index
  8. Internet search tool details (1996) 0.02
    0.017940167 = product of:
      0.08970083 = sum of:
        0.08970083 = weight(_text_:index in 5677) [ClassicSimilarity], result of:
          0.08970083 = score(doc=5677,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.48279524 = fieldWeight in 5677, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=5677)
      0.2 = coord(1/5)
    
    Abstract
    Summaries of the popular engines extracted from the search sites. Summaries are from: AltaVista, Excite, HotBot, InfoSeek, Ultra, Lycos, OpenText Web Index, and Yahoo. Information covered includes Contents, Searching tips, Results, and Update frequency
  9. Niemann, J.: "Ich cuil das mal" : Neue Suchmaschine fordert Google heraus (2008) 0.02
    0.015270816 = product of:
      0.03817704 = sum of:
        0.03139529 = weight(_text_:index in 2049) [ClassicSimilarity], result of:
          0.03139529 = score(doc=2049,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.16897833 = fieldWeight in 2049, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2049)
        0.0067817504 = product of:
          0.02034525 = sum of:
            0.02034525 = weight(_text_:29 in 2049) [ClassicSimilarity], result of:
              0.02034525 = score(doc=2049,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.13602862 = fieldWeight in 2049, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2049)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Content
    "Daran, dass der Suchmaschinen-Gigant Google immer und in allem der Größte und Beste sein muss, haben sich Internet-Nutzer aus aller Welt längst gewöhnt. Und als das Unternehmen am Wochenende in seinem offiziellen Blog damit angab, nun den Meilenstein von eine Billion gefundener eigenständiger URLs erreicht zu haben, war das eigentlich kaum noch ein Grund aufzuhorchen. Zumal bisher der Google-Index auf 30 bis 50 Milliarden geschätzt wurde und unklar ist, ob die angeblichen Billionen Links auch indexiert sind und nicht zu großen Teilen auch zu den selben Seiten führen. Wenn nun aber plötzlich eine andere, völlig neue Suchmaschine namens "Cuil" - gesprochen "Cool"- am Start ist und behauptet, 121 Milliarden Seiten zu durchsuchen und dabei überhaupt keine Nutzerdaten speichert, ist das hingegen schon ein Anlass zum Aufhorchen. Schließlich ist man angesichts der "Daten-Kraken"-Meldungen über Google und seine Speichermethoden dankbar für jede Alternative. Gegründet wurde Cuil im Jahre 2006 von dem Ehepaar Tom Costello, ein früherer IBM-Manager und Stanford-Professor und Anna Patterson, ehemalige Google-Mitarbeiterin, in Menlo Park in Kalifornien mit einem Startkapital von 33 Millionen Dollar und startete am Wochenende offiziell den Suchbetrieb. Der ist allerdings noch stark verbesserungsfähig. Während Cuil zu dem Begriff "Schwangerschaft" angeblich 6.768.056 Treffer aufweisen kann, die allerdings in ihrer Priorisierung von Medikamenten, Blogs und Büchern eher unbrauchbar sind, stehen dem englischsprachigen User unter dem Begriff Pregnancy immerhin 241.127.157 auf den ersten Blick sehr präzise Treffer zur Verfügung. Da erscheint die Aussage Costellos, man wolle "Suchenden content-basierte Ergebnisse präsentieren und nicht nur populäre" weniger absurd. Google hat beim selben deutschen Suchbegriff über acht Millionen Treffer, zu Pregnancy über 111 Millionen. Im englischen steht Cuil Google also nicht nach, während es im deutschsprachigen Bereich allerdings auch bei Namen, Orten und Wikipedia-Einträgen noch recht bescheiden aussieht.
    Date
    29. 7.2008 18:17:14
  10. Option für Metager als Standardsuchmaschine, Suchmaschine nach dem Peer-to-Peer-Prinzip (2021) 0.01
    0.010148491 = product of:
      0.050742455 = sum of:
        0.050742455 = weight(_text_:index in 431) [ClassicSimilarity], result of:
          0.050742455 = score(doc=431,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.27311024 = fieldWeight in 431, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=431)
      0.2 = coord(1/5)
    
    Content
    YaCy: a search engine based on the peer-to-peer principle. YaCy is a decentralised, free search engine. Its special feature: the engine does not run on the central servers of a single operator but works on the peer-to-peer (P2P) principle. This relies on YaCy users indexing the web pages they visit locally on their own computers. Each user thereby "crawls" a small index of their own, which they can share by communicating with other YaCy peers. The software ensures that the small, decentralised crawlers of individual users eventually combine into a global overall index. The more users take part in this decentralised search, the larger the shared index each individual user can then access. YaCy has recently joined the set of search engines we query, so we are now also part of the engine's index.
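The P2P scheme described here boils down to merging many small, locally crawled inverted indexes into one shared index. A minimal sketch of that merge step, with hypothetical peer data rather than YaCy's actual data structures or protocol:

```python
from collections import defaultdict

def merge_peer_indexes(peer_indexes):
    """Union the posting lists of several per-peer inverted indexes
    into one global term -> set-of-URLs index."""
    global_index = defaultdict(set)
    for peer in peer_indexes:
        for term, urls in peer.items():
            global_index[term] |= urls
    return global_index

# hypothetical partial indexes crawled by two peers
peer_a = {"suchmaschine": {"https://example.org/a"}, "index": {"https://example.org/a"}}
peer_b = {"index": {"https://example.org/b"}, "p2p": {"https://example.org/b"}}

global_index = merge_peer_indexes([peer_a, peer_b])
print(sorted(global_index["index"]))   # both peers contribute postings for "index"
```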
  11. Dambeck, H.: Wie Google mit Milliarden Unbekannten rechnet : Teil 2: Ausgerechnet: Der Page Rank für ein Mini-Web aus drei Seiten (2009) 0.01
    0.009319837 = product of:
      0.046599183 = sum of:
        0.046599183 = weight(_text_:system in 3080) [ClassicSimilarity], result of:
          0.046599183 = score(doc=3080,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.3479797 = fieldWeight in 3080, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.078125 = fieldNorm(doc=3080)
      0.2 = coord(1/5)
    
    Abstract
    A simple example of a mini-internet consisting of three web pages illustrates how this ranking system works in practice.
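A worked version of such a three-page example: with the usual damping factor d = 0.85, PageRank is the fixed point of PR(p) = (1 - d)/N + d * sum over inbound links q->p of PR(q)/outdegree(q), which power iteration reaches quickly. The link graph below is made up for illustration and need not match the article's own example graph.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power iteration for PageRank. links maps page -> list of outbound links."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_pr = {p: (1.0 - d) / n for p in pages}
        for p, outlinks in links.items():
            share = pr[p] / len(outlinks) if outlinks else 0.0
            for q in outlinks:
                new_pr[q] += d * share
        pr = new_pr
    return pr

# hypothetical mini-web of three pages
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, rank in sorted(pagerank(links).items()):
    print(page, round(rank, 3))
```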
  12. Sietmann, R.: Suchmaschine für das akademische Internet (2004) 0.01
    0.008970084 = product of:
      0.044850416 = sum of:
        0.044850416 = weight(_text_:index in 5742) [ClassicSimilarity], result of:
          0.044850416 = score(doc=5742,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.24139762 = fieldWeight in 5742, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5742)
      0.2 = coord(1/5)
    
    Abstract
    In cooperation with the Norwegian search-technology company Fast Search & Transfer, Bielefeld University Library has developed the prototype of a search engine for academic libraries. With public access to selected digitised collections of the project partners, it now demonstrates the new possibilities of academic retrieval. Whereas commercial search engines such as Google or Yahoo are not guided by academic criteria, the Bielefeld Academic Search Engine (BASE) restricts itself to content indexed and curated by academic libraries. This includes theses, preprints, electronic journals and digital collections such as the "Internet Library of Early Journals" of the Oxford University Library Service and Bielefeld University Library's "Wissenschaftliche Rezensionsorgane und Literaturzeitschriften des 18. und 19. Jahrhunderts aus dem deutschen Sprachraum".
    Anyone who enters the keywords "Immanuel Kant" +Frieden into Google quickly reaches the original text of the essay "Zum ewigen Frieden", but has a hard time searching further in a targeted way among the mixed bag of more than 11,000 hits. The BASE model, by contrast, provides the user with a wide range of navigation aids and metadata for this purpose. Among other things, refining the search by publication year makes it easier to reach the contemporary discussion of the famous work by the Königsberg philosopher. At present the BASE prototype enables retrieval across 15 different archive sources, among them the journals of the Enlightenment, the electronic dissertations of the University of Bochum, the electronic journal Documenta Mathematica and the mathematical full texts of Springer-Verlag. The planned expansion is to rest on a distributed architecture in which indexes created locally by individual libraries jointly contribute to a virtual master index. This would allow the user to navigate seamlessly through the distributed holdings."
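Refining a result list by publication year, as described for BASE, is essentially faceting: count the hits per year, then filter on the chosen value. A minimal sketch with hypothetical records (illustrative only, not the FAST-based implementation BASE actually uses):

```python
from collections import Counter

records = [  # hypothetical hits for the query "Immanuel Kant" +Frieden
    {"title": "Zum ewigen Frieden", "year": 1795},
    {"title": "Rezension zu 'Zum ewigen Frieden'", "year": 1796},
    {"title": "Kant und der Friede", "year": 1796},
    {"title": "Kant-Studien zum Friedensbegriff", "year": 1968},
]

def facet_by_year(hits):
    """Count hits per publication year, as shown in a refine-search sidebar."""
    return Counter(hit["year"] for hit in hits)

def refine(hits, year):
    """Narrow the result list to one facet value."""
    return [hit for hit in hits if hit["year"] == year]

print(sorted(facet_by_year(records).items()))   # [(1795, 1), (1796, 2), (1968, 1)]
print([h["title"] for h in refine(records, 1796)])
```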
  13. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.01
    0.008879929 = product of:
      0.044399645 = sum of:
        0.044399645 = weight(_text_:index in 93) [ClassicSimilarity], result of:
          0.044399645 = score(doc=93,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.23897146 = fieldWeight in 93, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.2 = coord(1/5)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  14. Ding, L.; Finin, T.; Joshi, A.; Peng, Y.; Cost, R.S.; Sachs, J.; Pan, R.; Reddivari, P.; Doshi, V.: Swoogle : a Semantic Web search and metadata engine (2004) 0.01
    0.007908144 = product of:
      0.03954072 = sum of:
        0.03954072 = weight(_text_:system in 4704) [ClassicSimilarity], result of:
          0.03954072 = score(doc=4704,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.29527056 = fieldWeight in 4704, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=4704)
      0.2 = coord(1/5)
    
    Abstract
    Swoogle is a crawler-based indexing and retrieval system for the Semantic Web, i.e., for Web documents in RDF or OWL. It extracts metadata for each discovered document, and computes relations between documents. Discovered documents are also indexed by an information retrieval system which can use either character N-Gram or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents. One of the interesting properties we compute is rank, a measure of the importance of a Semantic Web document.
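The character-N-gram indexing mentioned here (useful for URIrefs and other tokens that have no natural word boundaries) can be illustrated in a few lines; the choice of n = 4 and the example URIref below are arbitrary, not taken from Swoogle's implementation.

```python
def char_ngrams(text, n=4):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# a URIref has no word boundaries, but its n-grams still make useful index keys
uriref = "http://example.org/ontology#SearchEngine"
print(char_ngrams(uriref, 4)[:6])   # ['http', 'ttp:', 'tp:/', 'p://', '://e', '//ex']
```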
  15. Summann, F.; Lossau, N.: Search engine technology and digital libraries : moving from theory to practice (2004) 0.01
    0.00745587 = product of:
      0.03727935 = sum of:
        0.03727935 = weight(_text_:system in 1196) [ClassicSimilarity], result of:
          0.03727935 = score(doc=1196,freq=8.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.27838376 = fieldWeight in 1196, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
      0.2 = coord(1/5)
    
    Abstract
    This article describes the journey from the conception of and vision for a modern search-engine-based search environment to its technological realisation. In doing so, it takes up the thread of an earlier article on this subject, this time from a technical viewpoint. As well as presenting the conceptual considerations of the initial stages, this article will principally elucidate the technological aspects of this journey. The starting point for the deliberations about development of an academic search engine was the experience we gained through the generally successful project "Digital Library NRW", in which from 1998 to 2000-with Bielefeld University Library in overall charge-we designed a system model for an Internet-based library portal with an improved academic search environment at its core. At the heart of this system was a metasearch with an availability function, to which we added a user interface integrating all relevant source material for study and research. The deficiencies of this approach were felt soon after the system was launched in June 2001. There were problems with the stability and performance of the database retrieval system, with the integration of full-text documents and Internet pages, and with acceptance by users, because users are increasingly performing the searches themselves using search engines rather than going to the library for help in doing searches. Since a long list of problems are also encountered using commercial search engines for academic use (in particular the retrieval of academic information and long-term availability), the idea was born for a search engine configured specifically for academic use. We also hoped that with one single access point founded on improved search engine technology, we could access the heterogeneous academic resources of subject-based bibliographic databases, catalogues, electronic newspapers, document servers and academic web pages.
  16. Powell, J.; Fox, E.A.: Multilingual federated searching across heterogeneous collections (1998) 0.01
    0.00745587 = product of:
      0.03727935 = sum of:
        0.03727935 = weight(_text_:system in 1250) [ClassicSimilarity], result of:
          0.03727935 = score(doc=1250,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.27838376 = fieldWeight in 1250, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0625 = fieldNorm(doc=1250)
      0.2 = coord(1/5)
    
    Abstract
    This article describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. It details a markup language for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages.
  17. Körber, S.: Suchmuster erfahrener und unerfahrener Suchmaschinennutzer im deutschsprachigen World Wide Web (2000) 0.01
    0.0071760663 = product of:
      0.03588033 = sum of:
        0.03588033 = weight(_text_:index in 5938) [ClassicSimilarity], result of:
          0.03588033 = score(doc=5938,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.1931181 = fieldWeight in 5938, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=5938)
      0.2 = coord(1/5)
    
    Abstract
    In a laboratory experiment, a total of eighteen students were confronted with two open-ended web research tasks. While they worked on these tasks with a search engine, they were covertly observed via proxy logfile recording. They provided demographic information and details of their web-usage habits, rated task, performance and search-engine characteristics in questionnaires, and took a multiple-choice test on their knowledge of search engines. The participants were specifically recruited and divided into an experienced and an inexperienced subgroup of nine participants each. The study is based on a comparison of the two groups, focusing on the bookmarks they saved as solutions, their assessments from the questionnaires, their search phrases, and the patterns of their search-engine interaction and navigation within target pages. These sequential action patterns obtained from the logfiles were comparatively visualised, counted and interpreted.
    The thesis first describes the World Wide Web as a structurally and substantively complex information space. The author then examines the general tasks and types of meta-media applications, as well as the components of index-based search engines. After that, the perspective shifts from the structural and media side to aspects of use. The author describes the use of meta-media applications as co-selection between user and search engine on the basis of decisions, and develops a simple, dynamic phase model; the influence of different kinds of knowledge on the selection process is considered here.
    Building on this, the next step formulates general research questions and hypotheses for the experiment. The experimental design is the subsequent topic, centring on logfile analysis as the observation instrument, the choice of search service, the formulation of the tasks, the preparation of the questionnaires, and the procedure. The author then presents the results under three headings: first, performance, which allows the hypotheses to be tested; second, the participants' ratings, comments and search phrases; and third, the visual and computational analysis of the search patterns. The latter provide insight into the participants' search behaviour. Summarising interpretations and an outlook conclude the thesis.
  18. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.01
    0.0071760663 = product of:
      0.03588033 = sum of:
        0.03588033 = weight(_text_:index in 4709) [ClassicSimilarity], result of:
          0.03588033 = score(doc=4709,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.1931181 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
      0.2 = coord(1/5)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
  19. Hogan, A.; Harth, A.; Umbrich, J.; Kinsella, S.; Polleres, A.; Decker, S.: Searching and browsing Linked Data with SWSE : the Semantic Web Search Engine (2011) 0.01
    0.0065901205 = product of:
      0.032950602 = sum of:
        0.032950602 = weight(_text_:system in 438) [ClassicSimilarity], result of:
          0.032950602 = score(doc=438,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.24605882 = fieldWeight in 438, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
  20. Li, Z.: ¬A domain specific search engine with explicit document relations (2013) 0.01
    0.0065901205 = product of:
      0.032950602 = sum of:
        0.032950602 = weight(_text_:system in 1210) [ClassicSimilarity], result of:
          0.032950602 = score(doc=1210,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.24605882 = fieldWeight in 1210, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1210)
      0.2 = coord(1/5)
    
    Abstract
    The current web consists of documents that are highly heterogeneous and hard for machines to understand. The Semantic Web is a progressive movement of the World Wide Web, aiming at converting the current web of unstructured documents to the web of data. In the Semantic Web, web documents are annotated with metadata using a standardized ontology language. These annotated documents are directly processable by machines, which greatly improves their usability and usefulness. At Ericsson, similar problems occur: large numbers of documents with well-defined structures are being created. Though these documents concern domain-specific knowledge and can have rich relations, they are currently managed by a traditional search engine, which ignores the rich domain-specific information and presents little of this data to users. Motivated by the Semantic Web, we aim to find standard ways to process these documents, extract rich domain-specific information, and attach these data to documents using formal markup languages. We propose this project to develop a domain-specific search engine for processing different documents and building explicit relations for them. This research project consists of three main focuses: examining different domain-specific documents and finding ways to extract their metadata; integrating a text search engine with an ontology server; and exploring novel ways to build relations for documents. We implement this system and demonstrate its functions. As a prototype, the system provides the required features and will be extended in the future.
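Annotating a document with machine-readable metadata and an explicit relation to another document, as the abstract proposes, can be done with standard RDF tooling. The sketch below uses rdflib with made-up document URIs and a generic "references" property; it is not the Ericsson ontology or markup described in the thesis.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

DOC = Namespace("http://example.com/docs/")   # hypothetical document namespace
EX = Namespace("http://example.com/schema#")  # hypothetical domain ontology

g = Graph()
g.bind("ex", EX)

spec = DOC["interface-spec-001"]
guide = DOC["user-guide-042"]

g.add((spec, RDF.type, EX.Specification))
g.add((spec, EX.title, Literal("Interface Specification 001")))
g.add((spec, EX.references, guide))           # an explicit relation between documents

print(g.serialize(format="turtle"))
```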