Search (66 results, page 1 of 4)

Popper, K.R.: Three worlds : the Tanner lecture on human values. Delivered at the University of Michigan, April 7, 1978 (1978) 0.19

0.18970858 = product of:
  0.37941715 = sum of:
    0.09485429 = product of:
      0.28456286 = sum of:
        0.28456286 = weight(_text_:3a in 230) [ClassicSimilarity], result of:
          0.28456286 = score(doc=230,freq=2.0), product of:
            0.3797425 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04479146 = queryNorm
            0.7493574 = fieldWeight in 230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=230)
      0.33333334 = coord(1/3)
    0.28456286 = weight(_text_:2f in 230) [ClassicSimilarity], result of:
      0.28456286 = score(doc=230,freq=2.0), product of:
        0.3797425 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.04479146 = queryNorm
        0.7493574 = fieldWeight in 230, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.0625 = fieldNorm(doc=230)
  0.5 = coord(2/4)
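
The leaf values in the tree above can be recomputed by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function name `classic_term_score` is hypothetical, and `query_norm` and `field_norm` are copied from the tree rather than derived.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(...) leaf of a ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                             # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    return query_weight * field_weight

# Values for the term "2f" in doc 230, taken from the tree above.
score_2f = classic_term_score(freq=2.0, doc_freq=24, max_docs=44218,
                              query_norm=0.04479146, field_norm=0.0625)
print(f"{score_2f:.4f}")
```

The result should land close to the 0.28456286 shown in the tree; small differences come from the single-precision arithmetic used by the search engine.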

Source: https%3A%2F%2Ftannerlectures.utah.edu%2F_documents%2Fa-to-z%2Fp%2Fpopper80.pdf&usg=AOvVaw3f4QRTEH-OEBmoYr2J_c7H

Dunning, A.: Do we still need search engines? (1999) 0.11

0.11066668 = product of:
  0.22133335 = sum of:
    0.17885299 = weight(_text_:engines in 6021) [ClassicSimilarity], result of:
      0.17885299 = score(doc=6021,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.7858995 = fieldWeight in 6021, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.109375 = fieldNorm(doc=6021)
    0.042480372 = product of:
      0.084960744 = sum of:
        0.084960744 = weight(_text_:22 in 6021) [ClassicSimilarity], result of:
          0.084960744 = score(doc=6021,freq=2.0), product of:
            0.15685207 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04479146 = queryNorm
            0.5416616 = fieldWeight in 6021, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6021)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
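
How the leaf scores combine into the final 0.11 can be traced with a few lines. This is a minimal sketch that copies the two leaf scores from the tree above. The helper `coord` is a hypothetical name for the matched-clauses fraction that the tree prints as `coord(m/n)`.

```python
# Leaf scores copied from the explain tree for doc 6021.
engines_score = 0.17885299   # weight(_text_:engines in 6021)
term_22_score = 0.084960744  # weight(_text_:22 in 6021)

def coord(matched, total):
    """Fraction of query clauses that matched, as printed in the tree."""
    return matched / total

# The "22" clause sits in a nested group where 1 of 2 clauses matched.
nested = term_22_score * coord(1, 2)

# At the top level, 2 of 4 clauses matched.
doc_score = (engines_score + nested) * coord(2, 4)
print(f"{doc_score:.4f}")
```

The same pattern explains why a document with a higher leaf score can still rank lower: a `coord(1/4)` factor quarters the sum, while `coord(2/4)` only halves it.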

Source: Ariadne. 1999, no.22

Cohen, D.J.: From Babel to knowledge : data mining large digital collections (2006) 0.06
```
0.057399243 = product of:
  0.114798486 = sum of:
    0.072267525 = weight(_text_:engines in 1178) [ClassicSimilarity], result of:
      0.072267525 = score(doc=1178,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.31755137 = fieldWeight in 1178, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.03125 = fieldNorm(doc=1178)
    0.042530965 = product of:
      0.08506193 = sum of:
        0.08506193 = weight(_text_:programming in 1178) [ClassicSimilarity], result of:
          0.08506193 = score(doc=1178,freq=2.0), product of:
            0.29361802 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.04479146 = queryNorm
            0.28970268 = fieldWeight in 1178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.03125 = fieldNorm(doc=1178)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

In Jorge Luis Borges's curious short story The Library of Babel, the narrator describes an endless collection of books stored from floor to ceiling in a labyrinth of countless hexagonal rooms. The pages of the library's books seem to contain random sequences of letters and spaces; occasionally a few intelligible words emerge in the sea of paper and ink. Nevertheless, readers diligently, and exasperatingly, scan the shelves for coherent passages. The narrator himself has wandered numerous rooms in search of enlightenment, but with resignation he simply awaits his death and burial - which Borges explains (with signature dark humor) consists of being tossed unceremoniously over the library's banister. Borges's nightmare, of course, is a cursed vision of the research methods of disciplines such as literature, history, and philosophy, where the careful reading of books, one after the other, is supposed to lead inexorably to knowledge and understanding. Computer scientists would approach Borges's library far differently. Employing the information theory that forms the basis for search engines and other computerized techniques for assessing in one fell swoop large masses of documents, they would quickly realize the collection's incoherence through sampling and statistical methods - and wisely start looking for the library's exit. These computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade. For the past three years I have been experimenting with how to provide such end-user tools - that is, tools that harness the power of vast electronic collections while hiding much of their complicated technical plumbing.
In particular, I have made extensive use of the application programming interfaces (APIs) the leading search engines provide for programmers to query their databases directly (from server to server without using their web interfaces). In addition, I have explored how one might extract information from large digital collections, from the well-curated lexicographic database WordNet to the democratic (and poorly curated) online reference work Wikipedia. While processing these digital corpuses is currently an imperfect science, even now useful tools can be created by combining various collections and methods for searching and analyzing them. And more importantly, these nascent services suggest a future in which information can be gleaned from, and sense can be made out of, even imperfect digital libraries of enormous scale. A brief examination of two approaches to data mining large digital collections hints at this future, while also providing some lessons about how to get there.

Overton, R.: Search engines get faster and faster, but not always better (1996) 0.06

0.055318307 = product of:
  0.22127323 = sum of:
    0.22127323 = weight(_text_:engines in 5669) [ClassicSimilarity], result of:
      0.22127323 = score(doc=5669,freq=6.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.9722986 = fieldWeight in 5669, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.078125 = fieldNorm(doc=5669)
  0.25 = coord(1/4)

Abstract: Good article listing the pros and cons of the most popular search engines. Grades search engines and recommends which ones to use and not to use. Also provides a good table of features.

Stanley, T.: Alta Vista vs. Lycos (1996) 0.05

0.054200646 = product of:
  0.21680258 = sum of:
    0.21680258 = weight(_text_:engines in 3939) [ClassicSimilarity], result of:
      0.21680258 = score(doc=3939,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.9526541 = fieldWeight in 3939, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.09375 = fieldNorm(doc=3939)
  0.25 = coord(1/4)

Abstract: Very good review of what many people think are the top 2 rated search engines. Has extensive narrative and several tables.
Footnote: Also available at: http://ukoln.bath.ac.uk/ariadne/issue2/engines/

Page, A.: ¬The search is over : the search-engines secrets of the pros (1996) 0.05

0.045167204 = product of:
  0.18066882 = sum of:
    0.18066882 = weight(_text_:engines in 5670) [ClassicSimilarity], result of:
      0.18066882 = score(doc=5670,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.79387844 = fieldWeight in 5670, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.078125 = fieldNorm(doc=5670)
  0.25 = coord(1/4)

Abstract: Covers 8 of the most popular search engines. Gives a summary of each and has a nice table of features that also briefly lists the pros and cons. Includes a short explanation of Boolean operators too.

Bradley, P.: ¬The relevance of underpants to searching the Web (2000) 0.04

0.044713248 = product of:
  0.17885299 = sum of:
    0.17885299 = weight(_text_:engines in 3961) [ClassicSimilarity], result of:
      0.17885299 = score(doc=3961,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.7858995 = fieldWeight in 3961, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.109375 = fieldNorm(doc=3961)
  0.25 = coord(1/4)

Footnote: Also available at: http://www.ariadne.ac.uk/issue24/search-engines

Priss, U.: Description logic and faceted knowledge representation (1999) 0.04
```
0.04100116 = product of:
  0.16400464 = sum of:
    0.16400464 = sum of:
      0.12759289 = weight(_text_:programming in 2655) [ClassicSimilarity], result of:
        0.12759289 = score(doc=2655,freq=2.0), product of:
          0.29361802 = queryWeight, product of:
            6.5552235 = idf(docFreq=170, maxDocs=44218)
            0.04479146 = queryNorm
          0.43455404 = fieldWeight in 2655, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.5552235 = idf(docFreq=170, maxDocs=44218)
            0.046875 = fieldNorm(doc=2655)
      0.036411747 = weight(_text_:22 in 2655) [ClassicSimilarity], result of:
        0.036411747 = score(doc=2655,freq=2.0), product of:
          0.15685207 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.04479146 = queryNorm
          0.23214069 = fieldWeight in 2655, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2655)
  0.25 = coord(1/4)
```
Abstract

The term "facet" was introduced into the field of library classification systems by Ranganathan in the 1930's [Ranganathan, 1962]. A facet is a viewpoint or aspect. In contrast to traditional classification systems, faceted systems are modular in that a domain is analyzed in terms of baseline facets which are then synthesized. In this paper, the term "facet" is used in a broader meaning. Facets can describe different aspects on the same level of abstraction or the same aspect on different levels of abstraction. The notion of facets is related to database views, multicontexts and conceptual scaling in formal concept analysis [Ganter and Wille, 1999], polymorphism in object-oriented design, aspect-oriented programming, views and contexts in description logic and semantic networks. This paper presents a definition of facets in terms of faceted knowledge representation that incorporates the traditional narrower notion of facets and potentially facilitates translation between different knowledge representation formalisms. A goal of this approach is a modular, machine-aided knowledge base design mechanism. A possible application is faceted thesaurus construction for information retrieval and data mining. Reasoning complexity depends on the size of the modules (facets). A more general analysis of complexity will be left for future research.

Date

22. 1.2016 17:30:31
Wallis, R.; Isaac, A.; Charles, V.; Manguinhas, H.: Recommendations for the application of Schema.org to aggregated cultural heritage metadata to increase relevance and visibility to search engines : the case of Europeana (2017) 0.04
```
0.035707813 = product of:
  0.14283125 = sum of:
    0.14283125 = weight(_text_:engines in 3372) [ClassicSimilarity], result of:
      0.14283125 = score(doc=3372,freq=10.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.62761605 = fieldWeight in 3372, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3372)
  0.25 = coord(1/4)
```
Abstract

Europeana provides access to more than 54 million cultural heritage objects through its portal Europeana Collections. It is crucial for Europeana to be recognized by search engines as a trusted authoritative repository of cultural heritage objects. Indeed, even though its portal is the main entry point, most Europeana users come to it via search engines. Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper presents the research and consequent recommendations for publishing Europeana metadata using the Schema.org vocabulary and best practices. Schema.org HTML-embedded metadata is meant to be consumed by search engines to power rich services (such as Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains) backed by Google, Bing, Yahoo!, and Yandex, for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO. Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized. The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana's realities and user needs as well as those of the search engines and their users.
Warnick, W.L.; Leberman, A.; Scott, R.L.; Spence, K.J.; Johnsom, L.A.; Allen, V.S.: Searching the deep Web : directed query engine applications at the Department of Energy (2001) 0.03
```
0.03319098 = product of:
  0.13276392 = sum of:
    0.13276392 = weight(_text_:engines in 1215) [ClassicSimilarity], result of:
      0.13276392 = score(doc=1215,freq=6.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.58337915 = fieldWeight in 1215, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.046875 = fieldNorm(doc=1215)
  0.25 = coord(1/4)
```
Abstract

Directed Query Engines, an emerging class of search engine specifically designed to access distributed resources on the deep web, offer the opportunity to create inexpensive digital libraries. Already, one such engine, Distributed Explorer, has been used to select and assemble high quality information resources and incorporate them into publicly available systems for the physical sciences. By nesting Directed Query Engines so that one query launches several other engines in a cascading fashion, enormous virtual collections may soon be assembled to form a comprehensive information infrastructure for the physical sciences. Once a Directed Query Engine has been configured for a set of information resources, distributed alerts tools can provide patrons with personalized, profile-based notices of recent additions to any of the selected resources. Due to the potentially enormous size and scope of Directed Query Engine applications, consideration must be given to issues surrounding the representation of large quantities of information from multiple, heterogeneous sources.
Brin, S.; Page, L.: ¬The anatomy of a large-scale hypertextual Web search engine (1998) 0.02
```
0.022583602 = product of:
  0.09033441 = sum of:
    0.09033441 = weight(_text_:engines in 947) [ClassicSimilarity], result of:
      0.09033441 = score(doc=947,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39693922 = fieldWeight in 947, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0390625 = fieldNorm(doc=947)
  0.25 = coord(1/4)
```
Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Zhang, L.; Liu, Q.L.; Zhang, J.; Wang, H.F.; Pan, Y.; Yu, Y.: Semplore: an IR approach to scalable hybrid query of Semantic Web data (2007) 0.02
```
0.022583602 = product of:
  0.09033441 = sum of:
    0.09033441 = weight(_text_:engines in 231) [ClassicSimilarity], result of:
      0.09033441 = score(doc=231,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39693922 = fieldWeight in 231, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0390625 = fieldNorm(doc=231)
  0.25 = coord(1/4)
```
Abstract

As an extension to the current Web, the Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.02
```
0.022583602 = product of:
  0.09033441 = sum of:
    0.09033441 = weight(_text_:engines in 2861) [ClassicSimilarity], result of:
      0.09033441 = score(doc=2861,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39693922 = fieldWeight in 2861, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
  0.25 = coord(1/4)
```
Abstract

Today's conventional search engines hardly provide the essential content relevant to the user's search query. This is because the context and semantics of the request made by the user are not analyzed to the full extent. So here the need for a semantic web search (SWS) arises. SWS is an upcoming area of web search which combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses ontology as a knowledge base for the information retrieval process. It is not just a mere keyword search. It is one layer above what Google or any other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves the web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to the developers and researchers who work on the web. The Google results are re-ranked and optimized for providing the relevant links. For ranking, an algorithm has been applied which fetches more apt results for the user query.
Zhao, Y.; Ma, F.; Xia, X.: Evaluating the coverage of entities in knowledge graphs behind general web search engines : Poster (2017) 0.02
```
0.022583602 = product of:
  0.09033441 = sum of:
    0.09033441 = weight(_text_:engines in 3854) [ClassicSimilarity], result of:
      0.09033441 = score(doc=3854,freq=4.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39693922 = fieldWeight in 3854, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3854)
  0.25 = coord(1/4)
```
Abstract

Web search engines, such as Google and Bing, are constantly employing results from knowledge organization and various visualization features to improve their search services. Knowledge graph, a large repository of structured knowledge represented by formal languages such as RDF (Resource Description Framework), is used to support the entity search feature of Google and Bing (Demartini, 2016). When a user searches for an entity, such as a person, an organization, or a place in Google or Bing, it is likely that a knowledge card will be presented on the right side bar of the search engine result pages (SERPs). For example, when a user searches the entity Benedict Cumberbatch on Google, the knowledge card will show the basic structured information about this person, including his date of birth, height, spouse, parents, and his movies, etc. The knowledge card, which is used to present the result of entity search, is generated from knowledge graphs. Therefore, the quality of knowledge graphs is essential to the performance of entity search. However, studies on the quality of knowledge graphs from the angle of entity coverage are scant in the literature. This study aims to investigate the coverage of entities of knowledge graphs behind Google and Bing.

Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.02

0.02255545 = product of:
  0.0902218 = sum of:
    0.0902218 = product of:
      0.1804436 = sum of:
        0.1804436 = weight(_text_:programming in 2697) [ClassicSimilarity], result of:
          0.1804436 = score(doc=2697,freq=4.0), product of:
            0.29361802 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.04479146 = queryNorm
            0.6145522 = fieldWeight in 2697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Science of computer programming. In Press, 2016

Klic, L.; Miller, M.; Nelson, J.K.; Germann, J.E.: Approaching the largest 'API' : extracting information from the Internet with Python (2018) 0.02
```
0.02255545 = product of:
  0.0902218 = sum of:
    0.0902218 = product of:
      0.1804436 = sum of:
        0.1804436 = weight(_text_:programming in 4239) [ClassicSimilarity], result of:
          0.1804436 = score(doc=4239,freq=4.0), product of:
            0.29361802 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.04479146 = queryNorm
            0.6145522 = fieldWeight in 4239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.046875 = fieldNorm(doc=4239)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

This article explores the need for libraries to algorithmically access and manipulate the world's largest API: the Internet. The billions of pages on the 'Internet API' (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for all sized projects. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
Godby, C.J.; Young, J.A.; Childress, E.: ¬A repository of metadata crosswalks (2004) 0.02
```
0.022356624 = product of:
  0.089426495 = sum of:
    0.089426495 = weight(_text_:engines in 1155) [ClassicSimilarity], result of:
      0.089426495 = score(doc=1155,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39294976 = fieldWeight in 1155, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1155)
  0.25 = coord(1/4)
```
Abstract

This paper proposes a model for metadata crosswalks that associates three pieces of information: the crosswalk, the source metadata standard, and the target metadata standard, each of which may have a machine-readable encoding and human-readable description. The crosswalks are encoded as METS records that are made available to a repository for processing by search engines, OAI harvesters, and custom-designed Web services. The METS object brings together all of the information required to access and interpret crosswalks and represents a significant improvement over previously available formats. But it raises questions about how best to describe these complex objects and exposes gaps that must eventually be filled in by the digital library community.
Lossau, N.: Search engine technology and digital libraries : libraries need to discover the academic internet (2004) 0.02
```
0.022356624 = product of:
  0.089426495 = sum of:
    0.089426495 = weight(_text_:engines in 1161) [ClassicSimilarity], result of:
      0.089426495 = score(doc=1161,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.39294976 = fieldWeight in 1161, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1161)
  0.25 = coord(1/4)
```
Abstract

With the development of the World Wide Web, the "information search" has grown to be a significant business sector of a global, competitive and commercial market. Powerful players have entered this market, such as commercial internet search engines, information portals, multinational publishers and online content integrators. Will Google, Yahoo or Microsoft be the only portals to global knowledge in 2010? If libraries do not want to become marginalized in a key area of their traditional services, they need to acknowledge the challenges that come with the globalisation of scholarly information, the existence and further growth of the academic internet.
Chen, H.: Semantic research for digital libraries (1999) 0.02
```
0.01916282 = product of:
  0.07665128 = sum of:
    0.07665128 = weight(_text_:engines in 1247) [ClassicSimilarity], result of:
      0.07665128 = score(doc=1247,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.33681408 = fieldWeight in 1247, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.046875 = fieldNorm(doc=1247)
  0.25 = coord(1/4)
```
Abstract

In this era of the Internet and distributed, multimedia computing, new and emerging classes of information systems applications have swept into the lives of office workers and people in general. From digital libraries, multimedia systems, geographic information systems, and collaborative computing to electronic commerce, virtual reality, and electronic video arts and games, these applications have created tremendous opportunities for information and computer science researchers and practitioners. As applications become more pervasive, pressing, and diverse, several well-known information retrieval (IR) problems have become even more urgent. Information overload, a result of the ease of information creation and transmission via the Internet and WWW, has become more troublesome (e.g., even stockbrokers and elementary school students, heavily exposed to various WWW search engines, are versed in such IR terminology as recall and precision). Significant variations in database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content also have created severe information interoperability problems -- structural interoperability, media interoperability, and multilingual interoperability.
Hjoerland, B.: Information retrieval and knowledge organization : a perspective from the philosophy of science 0.02
```
0.01916282 = product of:
  0.07665128 = sum of:
    0.07665128 = weight(_text_:engines in 206) [ClassicSimilarity], result of:
      0.07665128 = score(doc=206,freq=2.0), product of:
        0.22757743 = queryWeight, product of:
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.04479146 = queryNorm
        0.33681408 = fieldWeight in 206, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.080822 = idf(docFreq=746, maxDocs=44218)
          0.046875 = fieldNorm(doc=206)
  0.25 = coord(1/4)
```
Abstract

Information retrieval (IR) is about making systems for finding documents or information. Knowledge organization (KO) is the field concerned with indexing, classification, and representing documents for IR, browsing, and related processes, whether performed by humans or computers. The field of IR is today dominated by search engines like Google. An important difference between KO and IR as research fields is that KO attempts to reflect knowledge as depicted by contemporary scholarship, in contrast to IR, which is based on, for example, "match" techniques, popularity measures or personalization principles. The classification of documents in KO mostly aims at reflecting the classification of knowledge in the sciences. Books about birds, for example, mostly reflect (or aim at reflecting) how birds are classified in ornithology. KO therefore requires access to the adequate subject knowledge; however, this is often characterized by disagreements. At the deepest layer, such disagreements are based on philosophical issues best characterized as "paradigms". No IR technology and no system of knowledge organization can ever be neutral in relation to paradigmatic conflicts, and therefore such philosophical problems represent the basis for the study of IR and KO.
