Search (95 results, page 1 of 5)

  • type_ss:"el"
  • year_i:[2000 TO 2010}
  1. Cohen, D.J.: From Babel to knowledge : data mining large digital collections (2006) 0.06
    0.057399243 = product of:
      0.114798486 = sum of:
        0.072267525 = weight(_text_:engines in 1178) [ClassicSimilarity], result of:
          0.072267525 = score(doc=1178,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.31755137 = fieldWeight in 1178, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=1178)
        0.042530965 = product of:
          0.08506193 = sum of:
            0.08506193 = weight(_text_:programming in 1178) [ClassicSimilarity], result of:
              0.08506193 = score(doc=1178,freq=2.0), product of:
                0.29361802 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.04479146 = queryNorm
                0.28970268 = fieldWeight in 1178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1178)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
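    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown: for each matching term, tf = sqrt(termFreq), queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, the term's contribution is queryWeight x fieldWeight, and coord(m/n) scales a clause by the fraction of its sub-clauses that matched. A minimal Python sketch, using the values printed in the tree, that reproduces the score of this first result:

      import math

      def term_score(freq, idf, query_norm, field_norm):
          # ClassicSimilarity: tf = sqrt(freq); queryWeight = idf * queryNorm;
          # fieldWeight = tf * idf * fieldNorm; term contribution = queryWeight * fieldWeight
          tf = math.sqrt(freq)
          return (idf * query_norm) * (tf * idf * field_norm)

      QUERY_NORM = 0.04479146
      engines = term_score(freq=4.0, idf=5.080822, query_norm=QUERY_NORM, field_norm=0.03125)
      programming = term_score(freq=2.0, idf=6.5552235, query_norm=QUERY_NORM, field_norm=0.03125)

      # "programming" sits in a clause where 1 of 2 sub-clauses matched: coord(1/2) = 0.5;
      # the outer query has 4 clauses of which 2 matched: coord(2/4) = 0.5
      total = (engines + programming * 0.5) * 0.5
      print(engines, programming * 0.5, total)  # approx. 0.0722675, 0.0425310, 0.0573992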
    
    Abstract
    In Jorge Luis Borges's curious short story The Library of Babel, the narrator describes an endless collection of books stored from floor to ceiling in a labyrinth of countless hexagonal rooms. The pages of the library's books seem to contain random sequences of letters and spaces; occasionally a few intelligible words emerge in the sea of paper and ink. Nevertheless, readers diligently, and exasperatingly, scan the shelves for coherent passages. The narrator himself has wandered numerous rooms in search of enlightenment, but with resignation he simply awaits his death and burial - which Borges explains (with signature dark humor) consists of being tossed unceremoniously over the library's banister. Borges's nightmare, of course, is a cursed vision of the research methods of disciplines such as literature, history, and philosophy, where the careful reading of books, one after the other, is supposed to lead inexorably to knowledge and understanding. Computer scientists would approach Borges's library far differently. Employing the information theory that forms the basis for search engines and other computerized techniques for assessing in one fell swoop large masses of documents, they would quickly realize the collection's incoherence through sampling and statistical methods - and wisely start looking for the library's exit. These computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade. For the past three years I have been experimenting with how to provide such end-user tools - that is, tools that harness the power of vast electronic collections while hiding much of their complicated technical plumbing. In particular, I have made extensive use of the application programming interfaces (APIs) the leading search engines provide for programmers to query their databases directly (from server to server without using their web interfaces). In addition, I have explored how one might extract information from large digital collections, from the well-curated lexicographic database WordNet to the democratic (and poorly curated) online reference work Wikipedia. While processing these digital corpuses is currently an imperfect science, even now useful tools can be created by combining various collections and methods for searching and analyzing them. And more importantly, these nascent services suggest a future in which information can be gleaned from, and sense can be made out of, even imperfect digital libraries of enormous scale. A brief examination of two approaches to data mining large digital collections hints at this future, while also providing some lessons about how to get there.
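    The server-to-server API querying described here can be illustrated with a minimal sketch; the endpoint, parameters and response fields below are hypothetical stand-ins, not the vendor APIs Cohen actually used:

      import json
      import urllib.parse
      import urllib.request

      def api_search(query, endpoint="https://api.example.com/search"):  # hypothetical endpoint
          # query the engine's database directly, bypassing its web interface
          url = endpoint + "?" + urllib.parse.urlencode({"q": query, "format": "json"})
          with urllib.request.urlopen(url) as resp:
              return json.load(resp)

      for hit in api_search('"Library of Babel" Borges').get("results", []):  # hypothetical fields
          print(hit.get("title"), hit.get("url"))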
  2. TASI: ¬A review of image search engines (2003) 0.06
    0.055318307 = product of:
      0.22127323 = sum of:
        0.22127323 = weight(_text_:engines in 6757) [ClassicSimilarity], result of:
          0.22127323 = score(doc=6757,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.9722986 = fieldWeight in 6757, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.078125 = fieldNorm(doc=6757)
      0.25 = coord(1/4)
    
    Abstract
    Replacing an earlier review, TASI's report outlines the different types of image search engines available and suggests the things to look out for when using one to find images. It includes TASI's own critical evaluation of the most popular engines.
  3. Gerhart, S.L.: Do Web search engines suppress controversy? : Simulating the exchange process (2004) 0.05
    0.051100858 = product of:
      0.20440343 = sum of:
        0.20440343 = weight(_text_:engines in 8164) [ClassicSimilarity], result of:
          0.20440343 = score(doc=8164,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.8981709 = fieldWeight in 8164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.125 = fieldNorm(doc=8164)
      0.25 = coord(1/4)
    
  4. Bradley, P.: ¬The relevance of underpants to searching the Web (2000) 0.04
    0.044713248 = product of:
      0.17885299 = sum of:
        0.17885299 = weight(_text_:engines in 3961) [ClassicSimilarity], result of:
          0.17885299 = score(doc=3961,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.7858995 = fieldWeight in 3961, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.109375 = fieldNorm(doc=3961)
      0.25 = coord(1/4)
    
    Footnote
    Also available at: http://www.ariadne.ac.uk/issue24/search-engines
  5. Entlich, R.: FAQ: Image Search Engines (2001) 0.04
    0.042849373 = product of:
      0.17139749 = sum of:
        0.17139749 = weight(_text_:engines in 155) [ClassicSimilarity], result of:
          0.17139749 = score(doc=155,freq=10.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.75313926 = fieldWeight in 155, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=155)
      0.25 = coord(1/4)
    
    Abstract
    Everyone loves images. The web wasn't anything until images came along; then it was an overnight success. So how does one find a specific image on the web? By using one of a burgeoning number of image-focused search engines. These search engines are simply optimized versions of typical web indexes, with crawlers that go around sucking down web content and indexing it. Image search engines, however, focus on images only, and on the web page text that may describe them. As information professionals, we know that this is a clumsy approach at best, but as the author puts it, until more sophisticated methods become available, the tools profiled here will "have to suffice." Seven search engines are thoroughly tested in this review article, with Google's Image Search (http://www.google.com/imghp?hl=en) being the highest rated.
  6. Sirapyan, N.: In Search of... (2001) 0.03
    0.03319098 = product of:
      0.13276392 = sum of:
        0.13276392 = weight(_text_:engines in 5661) [ClassicSimilarity], result of:
          0.13276392 = score(doc=5661,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.58337915 = fieldWeight in 5661, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=5661)
      0.25 = coord(1/4)
    
    Abstract
    In a series of capsule reviews of 20 search engines Sirapyan gives a good overview of the state of Internet search tools. She starts out with a clear discussion of the types of search tools available, the availability of advanced features such as Boolean queries, and the differences between directories, regular search engines and metasearch engines. It is unclear from the article whether the author and other testers used the same searches across all of the 20 tools, but each review clearly outlines perceived strengths and weaknesses, gives tips on the advanced features, if any, of the search tool in question and suggests the types of searches that are most successful. The tools which receive top honors are Google, Northern Light, HotBot and Oingo. Finally, there is an extra sidebar that discusses meta and specialized search tools such as Infozoid and FirstGov. I can't help thinking that the usefulness of this article is related to the fact that Sirapyan is PC Magazine's librarian and goes into greater depth on those features that are of interest to information professionals.
  7. Warnick, W.L.; Leberman, A.; Scott, R.L.; Spence, K.J.; Johnsom, L.A.; Allen, V.S.: Searching the deep Web : directed query engine applications at the Department of Energy (2001) 0.03
    0.03319098 = product of:
      0.13276392 = sum of:
        0.13276392 = weight(_text_:engines in 1215) [ClassicSimilarity], result of:
          0.13276392 = score(doc=1215,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.58337915 = fieldWeight in 1215, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=1215)
      0.25 = coord(1/4)
    
    Abstract
    Directed Query Engines, an emerging class of search engine specifically designed to access distributed resources on the deep web, offer the opportunity to create inexpensive digital libraries. Already, one such engine, Distributed Explorer, has been used to select and assemble high quality information resources and incorporate them into publicly available systems for the physical sciences. By nesting Directed Query Engines so that one query launches several other engines in a cascading fashion, enormous virtual collections may soon be assembled to form a comprehensive information infrastructure for the physical sciences. Once a Directed Query Engine has been configured for a set of information resources, distributed alerts tools can provide patrons with personalized, profile-based notices of recent additions to any of the selected resources. Due to the potentially enormous size and scope of Directed Query Engine applications, consideration must be given to issues surrounding the representation of large quantities of information from multiple, heterogeneous sources.
  8. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.03
    0.02856625 = product of:
      0.114265 = sum of:
        0.114265 = weight(_text_:engines in 4709) [ClassicSimilarity], result of:
          0.114265 = score(doc=4709,freq=10.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.50209284 = fieldWeight in 4709, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
      0.25 = coord(1/4)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
  9. Lewandowski, D.; Mayr, P.: Exploring the academic invisible Web (2006) 0.03
    0.027659154 = product of:
      0.110636614 = sum of:
        0.110636614 = weight(_text_:engines in 3752) [ClassicSimilarity], result of:
          0.110636614 = score(doc=3752,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.4861493 = fieldWeight in 3752, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3752)
      0.25 = coord(1/4)
    
    Abstract
    Purpose: To provide a critical review of Bergman's 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far. Design/methodology/approach: Discussion of measures and calculations, estimation based on informetric laws. Literature review on approaches for uncovering information from the Invisible Web. Findings: Bergman's size estimate of the Invisible Web is highly questionable. We demonstrate some major errors in the conceptual design of the Bergman paper. A new (raw) size estimate is given. Research limitations/implications: The precision of our estimate is limited due to a small sample size and lack of reliable data. Practical implications: We can show that no single library alone will be able to index the Academic Invisible Web. We suggest collaboration to accomplish this task. Originality/value: Provides library managers and those interested in developing academic search engines with data on the size and attributes of the Academic Invisible Web.
  10. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.03
    0.027659154 = product of:
      0.110636614 = sum of:
        0.110636614 = weight(_text_:engines in 2810) [ClassicSimilarity], result of:
          0.110636614 = score(doc=2810,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.4861493 = fieldWeight in 2810, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
      0.25 = coord(1/4)
    
    Abstract
    Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to the limited text in bibliographic data. The main assumption of the talk is that we are in a situation where the appropriate ranking factors for OPACs should be defined, while the implementation is no major problem. We must define what we want, and not focus so much on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas on how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in groups such as popularity, freshness, personalisation, etc. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the best-suited results at the top of the results list. How can this goal be achieved with the library catalogue, and also across the library's different collections and databases? The assumption is that ranking of such materials is a complex problem that is as yet nowhere near solved. Libraries should focus on ranking to improve the user experience.
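    A minimal sketch of the kind of factor combination argued for here; the factors, weights and sample records below are illustrative assumptions, not the talk's proposed set:

      import math
      from dataclasses import dataclass

      @dataclass
      class Record:
          text_score: float        # the OPAC's existing text-based relevance
          loans_last_year: int     # a popularity signal
          year_published: int      # a freshness signal

      WEIGHTS = {"text": 0.6, "popularity": 0.25, "freshness": 0.15}  # assumed weights

      def rank_score(r, current_year=2009):
          popularity = math.log1p(r.loans_last_year) / math.log1p(500)   # crude normalisation
          freshness = 1.0 / (1 + max(0, current_year - r.year_published))
          return (WEIGHTS["text"] * r.text_score
                  + WEIGHTS["popularity"] * popularity
                  + WEIGHTS["freshness"] * freshness)

      records = [Record(0.8, 3, 1998), Record(0.6, 120, 2008)]
      for r in sorted(records, key=rank_score, reverse=True):
          print(round(rank_score(r), 3), r)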
  11. Maurer, H.; Balke, T.; Kappe, F.; Kulathuramaiyer, N.; Weber, S.; Zaka, B.: Report on dangers and opportunities posed by large search engines, particularly Google (2007) 0.02
    0.023469567 = product of:
      0.09387827 = sum of:
        0.09387827 = weight(_text_:engines in 754) [ClassicSimilarity], result of:
          0.09387827 = score(doc=754,freq=12.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.41251132 = fieldWeight in 754, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0234375 = fieldNorm(doc=754)
      0.25 = coord(1/4)
    
    Abstract
    The aim of our investigation was to discuss exactly what is formulated in the title. This will of course constitute a main part of this write-up. However, in the process of investigations it also became clear that the focus has to be extended, not to just cover Google and search engines in an isolated fashion, but to also cover other Web 2.0 related phenomena, particularly Wikipedia, Blogs, and other related community efforts. It was the purpose of our investigation to demonstrate: - Plagiarism and IPR violation are serious concerns in academia and in the commercial world - Current techniques to fight both are rudimentary, yet could be improved by a concentrated initiative - One reason why the fight is difficult is the dominance of Google as THE major search engine and that Google is unwilling to cooperate - The monopolistic behaviour of Google is also threatening how we see the world, how we as individuals are seen (complete loss of privacy) and is threatening even world economy (!) In our proposal we did present a list of typical sections that would be covered at varying depth, with the possible replacement of one or the other by items that would emerge as still more important.
    Section 11: To argue that fighting large search engines and plagiarism slice-by-slice by using dedicated servers combined by one hub could eventually decrease the importance of other global search engines. Section 12: To argue that global search engines are an area that cannot be left to the free market, but require some government control or at least non-profit institutions. We will mention other areas where similar if not as glaring phenomena are visible. Section 13: We will mention in passing the potential role of virtual worlds, such as the currently overhyped system "second life". Section 14: To elaborate and try out a model for knowledge workers that does not require special search engines, with a description of a simple demonstrator. Section 15 (Not originally part of the proposal): To propose concrete actions and to describe an Austrian effort that could, with moderate support, minimize the role of Google for Austria. Section 16: References (Not originally part of the proposal) In what follows, we will stick to Sections 1 -14 plus the new Sections 15 and 16 as listed, plus a few Appendices.
  12. Zhang, L.; Liu, Q.L.; Zhang, J.; Wang, H.F.; Pan, Y.; Yu, Y.: Semplore: an IR approach to scalable hybrid query of Semantic Web data (2007) 0.02
    0.022583602 = product of:
      0.09033441 = sum of:
        0.09033441 = weight(_text_:engines in 231) [ClassicSimilarity], result of:
          0.09033441 = score(doc=231,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39693922 = fieldWeight in 231, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=231)
      0.25 = coord(1/4)
    
    Abstract
    As an extension to the current Web, the Semantic Web will not only contain structured data with machine-understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
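    A toy sketch of the hybrid idea: the keyword part and the structured (type) part are both answered as plain posting-set intersections, so ordinary IR index machinery can serve them together. All identifiers and postings are invented:

      keyword_index = {            # term -> document ids (hypothetical postings)
          "protein": {1, 2, 5},
          "folding": {2, 5, 7},
      }
      type_index = {               # rdf:type -> resource ids (hypothetical postings)
          "ex:Article": {2, 3, 5, 9},
      }

      def hybrid_query(keywords, rdf_type):
          # structural and textual constraints are both resolved by set intersection
          result = set(type_index.get(rdf_type, set()))
          for kw in keywords:
              result &= keyword_index.get(kw, set())
          return result

      print(hybrid_query(["protein", "folding"], "ex:Article"))  # {2, 5}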
  13. Krempl, S.: Google muss zerschlagen werden (2007) 0.02
    0.022356624 = product of:
      0.089426495 = sum of:
        0.089426495 = weight(_text_:engines in 753) [ClassicSimilarity], result of:
          0.089426495 = score(doc=753,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39294976 = fieldWeight in 753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0546875 = fieldNorm(doc=753)
      0.25 = coord(1/4)
    
    Content
    Cf. the study "Maurer, H. et al: Report on dangers and opportunities posed by large search engines, particularly Google" at: http://www.iicm.tugraz.at/iicm_papers/dangers_google.pdf.
  14. Godby, C.J.; Young, J.A.; Childress, E.: ¬A repository of metadata crosswalks (2004) 0.02
    0.022356624 = product of:
      0.089426495 = sum of:
        0.089426495 = weight(_text_:engines in 1155) [ClassicSimilarity], result of:
          0.089426495 = score(doc=1155,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39294976 = fieldWeight in 1155, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1155)
      0.25 = coord(1/4)
    
    Abstract
    This paper proposes a model for metadata crosswalks that associates three pieces of information: the crosswalk, the source metadata standard, and the target metadata standard, each of which may have a machine-readable encoding and human-readable description. The crosswalks are encoded as METS records that are made available to a repository for processing by search engines, OAI harvesters, and custom-designed Web services. The METS object brings together all of the information required to access and interpret crosswalks and represents a significant improvement over previously available formats. But it raises questions about how best to describe these complex objects and exposes gaps that must eventually be filled in by the digital library community.
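    The three-part model (crosswalk, source standard, target standard, each with a machine-readable encoding and a human-readable description) can be pictured with a small sketch; the classes and the mapping URL are illustrative, not the METS encoding itself:

      from dataclasses import dataclass

      @dataclass
      class Standard:
          name: str
          schema_url: str          # machine-readable encoding
          description: str         # human-readable description

      @dataclass
      class Crosswalk:
          source: Standard
          target: Standard
          mapping_url: str         # e.g. an XSLT performing the transformation (hypothetical URL below)

      marc = Standard("MARC21", "http://www.loc.gov/MARC21/slim", "MARC 21 bibliographic format")
      dc = Standard("Dublin Core", "http://purl.org/dc/elements/1.1/", "Simple Dublin Core elements")
      cw = Crosswalk(source=marc, target=dc, mapping_url="https://example.org/marc2dc.xsl")

      print(f"{cw.source.name} -> {cw.target.name} via {cw.mapping_url}")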
  15. Lossau, N.: Search engine technology and digital libraries : libraries need to discover the academic internet (2004) 0.02
    0.022356624 = product of:
      0.089426495 = sum of:
        0.089426495 = weight(_text_:engines in 1161) [ClassicSimilarity], result of:
          0.089426495 = score(doc=1161,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39294976 = fieldWeight in 1161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1161)
      0.25 = coord(1/4)
    
    Abstract
    With the development of the World Wide Web, the "information search" has grown to be a significant business sector of a global, competitive and commercial market. Powerful players have entered this market, such as commercial internet search engines, information portals, multinational publishers and online content integrators. Will Google, Yahoo or Microsoft be the only portals to global knowledge in 2010? If libraries do not want to become marginalized in a key area of their traditional services, they need to acknowledge the challenges that come with the globalisation of scholarly information, the existence and further growth of the academic internet
  16. Colomb, R.M.: Quality of ontologies in interoperating information systems (2002) 0.02
    0.018607298 = product of:
      0.07442919 = sum of:
        0.07442919 = product of:
          0.14885838 = sum of:
            0.14885838 = weight(_text_:programming in 7858) [ClassicSimilarity], result of:
              0.14885838 = score(doc=7858,freq=2.0), product of:
                0.29361802 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.04479146 = queryNorm
                0.5069797 = fieldWeight in 7858, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7858)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The focus of this paper is on the quality of ontologies as they relate to interoperating information systems. Quality is not a property of something but a judgment, so it must be relative to some purpose, and generally involves recognition of design tradeoffs. Ontologies used for information systems interoperability have much in common with classification systems in information science, knowledge-based systems, and programming languages, and inherit quality characteristics from each of these older areas. Factors peculiar to the new field lead to some additional characteristics relevant to quality, some of which are more profitably considered quality aspects not of the ontology as such, but of the environment through which the ontology is made available to its users. Suggestions are presented as to how to use these factors in producing quality ontologies.
  17. Summann, F.; Lossau, N.: Search engine technology and digital libraries : moving from theory to practice (2004) 0.02
    0.018066881 = product of:
      0.072267525 = sum of:
        0.072267525 = weight(_text_:engines in 1196) [ClassicSimilarity], result of:
          0.072267525 = score(doc=1196,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.31755137 = fieldWeight in 1196, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
      0.25 = coord(1/4)
    
    Abstract
    This article describes the journey from the conception of and vision for a modern search-engine-based search environment to its technological realisation. In doing so, it takes up the thread of an earlier article on this subject, this time from a technical viewpoint. As well as presenting the conceptual considerations of the initial stages, this article will principally elucidate the technological aspects of this journey. The starting point for the deliberations about the development of an academic search engine was the experience we gained through the generally successful project "Digital Library NRW", in which from 1998 to 2000 - with Bielefeld University Library in overall charge - we designed a system model for an Internet-based library portal with an improved academic search environment at its core. At the heart of this system was a metasearch with an availability function, to which we added a user interface integrating all relevant source material for study and research. The deficiencies of this approach were felt soon after the system was launched in June 2001. There were problems with the stability and performance of the database retrieval system, with the integration of full-text documents and Internet pages, and with acceptance by users, because users are increasingly performing searches themselves using search engines rather than going to the library for help. Since a long list of problems is also encountered when using commercial search engines for academic purposes (in particular the retrieval of academic information and long-term availability), the idea was born of a search engine configured specifically for academic use. We also hoped that with one single access point founded on improved search engine technology, we could access the heterogeneous academic resources of subject-based bibliographic databases, catalogues, electronic newspapers, document servers and academic web pages.
  18. Birmingham, W.; Pardo, B.; Meek, C.; Shifrin, J.: ¬The MusArt music-retrieval system (2002) 0.02
    0.018066881 = product of:
      0.072267525 = sum of:
        0.072267525 = weight(_text_:engines in 1205) [ClassicSimilarity], result of:
          0.072267525 = score(doc=1205,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.31755137 = fieldWeight in 1205, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=1205)
      0.25 = coord(1/4)
    
    Abstract
    Music websites are ubiquitous, and music downloads, such as MP3, are a major source of Web traffic. As the amount of musical content increases and the Web becomes an important mechanism for distributing music, we expect to see a rising demand for music search services. Many currently available music search engines rely on file names, song title, composer or performer as the indexing and retrieval mechanism. These systems do not make use of the musical content. We believe that a more natural, effective, and usable music-information retrieval (MIR) system should have audio input, where the user can query with musical content. We are developing a system called MusArt for audio-input MIR. With MusArt, as with other audio-input MIR systems, a user sings or plays a theme, hook, or riff from the desired piece of music. The system transcribes the query and searches for related themes in a database, returning the most similar themes, given some measure of similarity. We call this "retrieval by query." In this paper, we describe the architecture of MusArt. An important element of MusArt is metadata creation: we believe that it is essential to automatically abstract important musical elements, particularly themes. Theme extraction is performed by a subsystem called MME, which we describe later in this paper. Another important element of MusArt is its support for a variety of search engines, as we believe that MIR is too complex for a single approach to work for all queries. Currently, MusArt supports a dynamic time-warping search engine that has high recall, and a complementary stochastic search engine that searches over themes, emphasizing speed and relevancy. The stochastic search engine is discussed in this paper.
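    Dynamic time warping, named here as the basis of MusArt's high-recall engine, can be sketched in a few lines; the pitch contours are invented examples, not MusArt data:

      def dtw_distance(a, b):
          # classic dynamic time warping between two pitch sequences
          n, m = len(a), len(b)
          INF = float("inf")
          cost = [[INF] * (m + 1) for _ in range(n + 1)]
          cost[0][0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = abs(a[i - 1] - b[j - 1])
                  cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
          return cost[n][m]

      query = [60, 62, 64, 65, 64, 62]                      # a sung query (MIDI pitches)
      themes = {"theme_a": [60, 62, 64, 65, 67, 65, 64, 62],
                "theme_b": [55, 57, 55, 53, 52]}
      print(min(themes, key=lambda t: dtw_distance(query, themes[t])))  # theme_a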
  19. Schallier, W.: Why organize information if you can find it? : UDC and libraries in an Internet world (2007) 0.02
    0.015969018 = product of:
      0.06387607 = sum of:
        0.06387607 = weight(_text_:engines in 549) [ClassicSimilarity], result of:
          0.06387607 = score(doc=549,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.2806784 = fieldWeight in 549, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=549)
      0.25 = coord(1/4)
    
    Abstract
    The Belgians Otlet and LaFontaine created the Universal Decimal Classification in order to collect and organize the world's knowledge. This happened in an age when information was almost exclusively made available by libraries. Since the internet, the quantity of information outside libraries is enormous and keeps growing every day. The internet is accessible to anybody, it is fundamentally unorganized and its content changes constantly. Collecting and organizing the world's knowledge seem to have become an impossible ambition. Perhaps it is even unnecessary, since search engines make information retrievable now. And why would we organize information if we can find it? So what will be the role of UDC and libraries in this internet environment? Libraries can still play a role as a major information provider, if they adapt fully to the expectations of a modern end user. The design and the functionalities of online catalogues should allow maximal accessibility, usability and active participation of the end user in the internet environment. Metadata, like UDC, should maximize the visibility of information, enrich it and invite the end user to assign metadata himself.
  20. Hofmann-Apitius, M.: Direct use of information extraction from scientific text for modeling and simulation in the life sciences (2009) 0.02
    0.015969018 = product of:
      0.06387607 = sum of:
        0.06387607 = weight(_text_:engines in 2814) [ClassicSimilarity], result of:
          0.06387607 = score(doc=2814,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.2806784 = fieldWeight in 2814, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2814)
      0.25 = coord(1/4)
    
    Abstract
    Scientific biomedical publications are a rich source of information about diseases and the molecules that play a role in the molecular etiology of a disease. With the development of automated methods for the identification of named biomedical entities in scientific text ("text mining") we are now able to automatically screen millions of publications for genes, their relationships to other genes, their role in the development of a disease and their role as potential targets for therapeutic cures. In fact, modern advanced search engines are now able to extract various terms in scientific text that represent entities which can be directly used for modeling of diseases and simulation of disease-relevant molecular networks. In my presentation, I will demonstrate how scientific text can be analyzed using a combination of algorithmic approaches (dictionary- and rule-based as well as machine learning-based methods). I will furthermore demonstrate how scientific information extracted from text can be applied in disease modeling approaches that combine heterogeneous information types (protein-protein interactions, allelic variants of genes, clinical phenotype information) extracted from scientific publications. I will furthermore show how the analysis of scientific text can be used to construct "knowledge descriptors" that allow a completely new way of predicting the activity of small pharmaceutical molecules. Taken together, the talk will hopefully provide a clue as to how far we really are from using text analytics for direct modeling and simulation in the life sciences.
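    A toy illustration of the dictionary-based end of the spectrum described here; the gene list and sentence are invented, and real pipelines add rule-based and machine-learned components:

      import re

      GENE_DICT = {"brca1": "GENE", "tp53": "GENE", "parkin": "GENE"}  # toy dictionary

      def tag_entities(sentence):
          # look every token up in the dictionary; rule-based and ML taggers would refine this
          hits = []
          for token in re.findall(r"[A-Za-z0-9]+", sentence):
              label = GENE_DICT.get(token.lower())
              if label:
                  hits.append((token, label))
          return hits

      print(tag_entities("Mutations in BRCA1 and TP53 are implicated in tumour development."))
      # [('BRCA1', 'GENE'), ('TP53', 'GENE')]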

Languages

  • e 75
  • d 17
  • el 2

Types

  • a 29
  • i 6
  • m 1
  • n 1
  • r 1
  • s 1