Search (457 results, page 2 of 23)

Klic, L.; Miller, M.; Nelson, J.K.; Germann, J.E.: Approaching the largest 'API' : extracting information from the Internet with Python (2018) 0.04

0.04017412 = product of:
  0.12052235 = sum of:
    0.057803504 = weight(_text_:wide in 4239) [ClassicSimilarity], result of:
      0.057803504 = score(doc=4239,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.29372054 = fieldWeight in 4239, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=4239)
    0.062718846 = weight(_text_:web in 4239) [ClassicSimilarity], result of:
      0.062718846 = score(doc=4239,freq=8.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.43268442 = fieldWeight in 4239, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=4239)
  0.33333334 = coord(2/6)
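
The score explanation above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative only, and queryNorm and fieldNorm are copied from the listing rather than recomputed.

```python
import math

QUERY_NORM = 0.044416238  # queryNorm, copied from the explanation above
MAX_DOCS = 44218          # maxDocs, copied from the explanation above


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values for doc 4239: two of the six query clauses matched.
wide = term_score(freq=2.0, doc_freq=1430, field_norm=0.046875)
web = term_score(freq=8.0, doc_freq=4597, field_norm=0.046875)
total = (wide + web) * (2 / 6)  # coord(2/6)

print(f"wide={wide:.8f} web={web:.8f} total={total:.8f}")
```

Lucene computes these values in single precision, so the last digits may differ slightly from the listing.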

Abstract: This article explores the need for libraries to algorithmically access and manipulate the world's largest API: the Internet. The billions of pages on the 'Internet API' (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of any size. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.

Zhang, L.; Liu, Q.L.; Zhang, J.; Wang, H.F.; Pan, Y.; Yu, Y.: Semplore: an IR approach to scalable hybrid query of Semantic Web data (2007) 0.04
```
0.039814256 = product of:
  0.11944277 = sum of:
    0.08667288 = weight(_text_:web in 231) [ClassicSimilarity], result of:
      0.08667288 = score(doc=231,freq=22.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.59793836 = fieldWeight in 231, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=231)
    0.03276989 = weight(_text_:computer in 231) [ClassicSimilarity], result of:
      0.03276989 = score(doc=231,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.20188503 = fieldWeight in 231, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=231)
  0.33333334 = coord(2/6)
```
Abstract

As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.

Series

Lecture notes in computer science; 4825

Source

ISWC'07/ASWC'07: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. Ed.: K. Aberer et al.

Theme

Semantic Web

Powell, J.; Fox, E.A.: Multilingual federated searching across heterogeneous collections (1998) 0.04

0.03962797 = product of:
  0.11888391 = sum of:
    0.07707134 = weight(_text_:wide in 1250) [ClassicSimilarity], result of:
      0.07707134 = score(doc=1250,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.3916274 = fieldWeight in 1250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0625 = fieldNorm(doc=1250)
    0.041812565 = weight(_text_:web in 1250) [ClassicSimilarity], result of:
      0.041812565 = score(doc=1250,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.2884563 = fieldWeight in 1250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=1250)
  0.33333334 = coord(2/6)

Abstract: This article describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. It details a markup language for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages.

Thesaurus software (2001) 0.04

0.039404403 = product of:
  0.11821321 = sum of:
    0.036585998 = weight(_text_:web in 6773) [ClassicSimilarity], result of:
      0.036585998 = score(doc=6773,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.25239927 = fieldWeight in 6773, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6773)
    0.081627205 = product of:
      0.16325441 = sum of:
        0.16325441 = weight(_text_:programs in 6773) [ClassicSimilarity], result of:
          0.16325441 = score(doc=6773,freq=4.0), product of:
            0.25748047 = queryWeight, product of:
              5.79699 = idf(docFreq=364, maxDocs=44218)
              0.044416238 = queryNorm
            0.6340458 = fieldWeight in 6773, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.79699 = idf(docFreq=364, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6773)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
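
In this record the `programs` term sits inside a nested clause that carries its own coord(1/2) factor. The sketch below shows how that inner factor combines with the outer coord(2/6), again assuming the standard ClassicSimilarity formulas. The helper name is hypothetical, and queryNorm and fieldNorm are copied from the explanation above.

```python
import math

QUERY_NORM = 0.044416238  # queryNorm, copied from the explanation above
MAX_DOCS = 44218          # maxDocs
FIELD_NORM = 0.0546875    # fieldNorm(doc=6773)


def term_score(freq, doc_freq):
    """queryWeight * fieldWeight for a single term in doc 6773."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * FIELD_NORM)


web = term_score(freq=2.0, doc_freq=4597)

# The nested clause matched one of its two sub-clauses, so its sum
# is scaled by the inner coord(1/2) before joining the outer sum.
programs_clause = term_score(freq=4.0, doc_freq=364) * (1 / 2)

# Outer level: two of six top-level clauses matched, hence coord(2/6).
total = (web + programs_clause) * (2 / 6)

print(f"web={web:.8f} programs_clause={programs_clause:.8f} total={total:.8f}")
```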

Abstract: Members offer comments and suggest resources on programs for creating, maintaining, and publishing thesauri. Formerly a tool for writers and indexers, the thesaurus has taken on a new role as an essential component of the corporate information infrastructure. Many people are using word processor or database programs to create and maintain thesauri, while others are using specialized tools that perform consistency checks and offer special reporting capabilities. Some also use thesaurus modules integrated into another application, such as web publishing, content management, or e-commerce. This article includes material that comes from our own experience, email responses from members, and comments from participants in our seminars and roundtables. There's also an introduction to thesauri in a corporate information management system.

Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.04

0.039268494 = product of:
  0.11780548 = sum of:
    0.052265707 = weight(_text_:web in 1103) [ClassicSimilarity], result of:
      0.052265707 = score(doc=1103,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.36057037 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=1103)
    0.06553978 = weight(_text_:computer in 1103) [ClassicSimilarity], result of:
      0.06553978 = score(doc=1103,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.40377006 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.078125 = fieldNorm(doc=1103)
  0.33333334 = coord(2/6)

Abstract: This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>.

OWL Web Ontology Language Test Cases (2004) 0.04

0.039188966 = product of:
  0.11756689 = sum of:
    0.09349574 = weight(_text_:web in 4685) [ClassicSimilarity], result of:
      0.09349574 = score(doc=4685,freq=10.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.6450079 = fieldWeight in 4685, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=4685)
    0.024071148 = product of:
      0.048142295 = sum of:
        0.048142295 = weight(_text_:22 in 4685) [ClassicSimilarity], result of:
          0.048142295 = score(doc=4685,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.30952093 = fieldWeight in 4685, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4685)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: This document contains and presents test cases for the Web Ontology Language (OWL) approved by the Web Ontology Working Group. Many of the test cases illustrate the correct usage of the Web Ontology Language (OWL), and the formal meaning of its constructs. Other test cases illustrate the resolution of issues considered by the Working Group. Conformance for OWL documents and OWL document checkers is specified.
Date: 14. 8.2011 13:33:22
Theme: Semantic Web

Dextre Clarke, S.G.: Challenges and opportunities for KOS standards (2007) 0.04

0.03843217 = product of:
  0.115296505 = sum of:
    0.073171996 = weight(_text_:web in 4643) [ClassicSimilarity], result of:
      0.073171996 = score(doc=4643,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.50479853 = fieldWeight in 4643, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.109375 = fieldNorm(doc=4643)
    0.04212451 = product of:
      0.08424902 = sum of:
        0.08424902 = weight(_text_:22 in 4643) [ClassicSimilarity], result of:
          0.08424902 = score(doc=4643,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.5416616 = fieldWeight in 4643, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4643)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Date: 22. 9.2007 15:41:14
Theme: Semantic Web

Smith, A.G.: Search features of digital libraries (2000) 0.04
```
0.037701976 = product of:
  0.11310592 = sum of:
    0.0817465 = weight(_text_:wide in 940) [ClassicSimilarity], result of:
      0.0817465 = score(doc=940,freq=4.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.4153836 = fieldWeight in 940, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=940)
    0.031359423 = weight(_text_:web in 940) [ClassicSimilarity], result of:
      0.031359423 = score(doc=940,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.21634221 = fieldWeight in 940, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=940)
  0.33333334 = coord(2/6)
```
Abstract

Traditional on-line search services such as Dialog, DataStar and Lexis provide a wide range of search features (boolean and proximity operators, truncation, etc). This paper discusses the use of these features for effective searching, and argues that these features are required, regardless of advances in search engine technology. The literature on on-line searching is reviewed, identifying features that searchers find desirable for effective searching. A selective survey of current digital libraries available on the Web was undertaken, identifying which search features are present. The survey indicates that current digital libraries do not implement a wide range of search features. For instance: under half of the examples included controlled vocabulary, under half had proximity searching, only one enabled browsing of term indexes, and none of the digital libraries enabled searchers to refine an initial search. Suggestions are made for enhancing the search effectiveness of digital libraries; for instance, by providing a full range of search operators, enabling browsing of search terms, enhancement of records with controlled vocabulary, enabling the refining of initial searches, etc.

Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.04
```
0.037471574 = product of:
  0.11241472 = sum of:
    0.08619881 = weight(_text_:web in 4709) [ClassicSimilarity], result of:
      0.08619881 = score(doc=4709,freq=34.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.59466785 = fieldWeight in 4709, product of:
          5.8309517 = tf(freq=34.0), with freq of:
            34.0 = termFreq=34.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=4709)
    0.02621591 = weight(_text_:computer in 4709) [ClassicSimilarity], result of:
      0.02621591 = score(doc=4709,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.16150802 = fieldWeight in 4709, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.03125 = fieldNorm(doc=4709)
  0.33333334 = coord(2/6)
```
Content

"Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as a web service too. The engine is primarily written in Java, with PHP used for the front end and MySQL for the database. Swoogle is capable of searching over 10,000 ontologies and indexes more than 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more Google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point in time.
When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in the indexing of documents. It's here that the new semantic document search engines can close the delay. Experimenting with the concept of Search in the semantic web can only bode well for the future of search technology."

Source

http://www.searchenginejournal.com/swoogle-an-engine-for-the-semantic-web/5469/

Theme

Semantic Web

Miller, D.R.: XML: Libraries' strategic opportunity (2001) 0.04
```
0.037393916 = product of:
  0.112181745 = sum of:
    0.04816959 = weight(_text_:wide in 1467) [ClassicSimilarity], result of:
      0.04816959 = score(doc=1467,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.24476713 = fieldWeight in 1467, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1467)
    0.064012155 = weight(_text_:web in 1467) [ClassicSimilarity], result of:
      0.064012155 = score(doc=1467,freq=12.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.4416067 = fieldWeight in 1467, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1467)
  0.33333334 = coord(2/6)
```
Abstract

XML (eXtensible Markup Language) is fast gaining favor as the universal format for data and document exchange -- in effect becoming the lingua franca of the Information Age. Currently, "library information" is at a particular disadvantage on the rapidly evolving World Wide Web. Why? Despite libraries' explorations of web catalogs, scanning projects, digital data repositories, and creation of web pages galore, there remains a digital divide. The core of libraries' data troves are stored in proprietary formats of integrated library systems (ILS) and in the complex and arcane MARC formats -- both restricted chiefly to the province of technical services and systems librarians. Even they are hard-pressed to extract and integrate this wealth of data with resources from outside this rarefied environment. Segregation of library information underlies many difficulties: producing standard bibliographic citations from MARC data, automatically creating new materials lists (including new web resources) on a particular topic, exchanging data with our vendors, and even migrating from one ILS to another. Why do we continue to hobble our potential by embracing these self-imposed limitations? Most ILSs began in libraries, which soon recognized the pitfalls of do-it-yourself solutions. Thus, we wisely anticipated the necessity for standards. However, with the advent of the web, we soon found "our" collections and a flood of new resources appearing in digital format on opposite sides of the divide. If we do not act quickly to integrate library resources with mainstream web resources, we are in grave danger of becoming marginalized.

OWL Web Ontology Language Guide (2004) 0.04
```
0.037393916 = product of:
  0.112181745 = sum of:
    0.04816959 = weight(_text_:wide in 4687) [ClassicSimilarity], result of:
      0.04816959 = score(doc=4687,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.24476713 = fieldWeight in 4687, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4687)
    0.064012155 = weight(_text_:web in 4687) [ClassicSimilarity], result of:
      0.064012155 = score(doc=4687,freq=12.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.4416067 = fieldWeight in 4687, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4687)
  0.33333334 = coord(2/6)
```
Abstract

The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of Web accessible resources. These descriptions must be in addition to the human-readable versions of that information. The OWL Web Ontology Language is intended to provide a language that can be used to describe the classes and relations between them that are inherent in Web documents and applications. This document demonstrates the use of the OWL language to formalize a domain by defining classes and properties of those classes, to define individuals and assert properties about them, and to reason about these classes and individuals to the degree permitted by the formal semantics of the OWL language. The sections are organized to present an incremental definition of a set of classes, properties and individuals, beginning with the fundamentals and proceeding to more complex language components.

Theme

Semantic Web

Hollink, L.; Assem, M. van: Estimating the relevance of search results in the Culture-Web : a study of semantic distance measures (2010) 0.04
```
0.03737721 = product of:
  0.112131625 = sum of:
    0.094078265 = weight(_text_:web in 4649) [ClassicSimilarity], result of:
      0.094078265 = score(doc=4649,freq=18.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.64902663 = fieldWeight in 4649, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=4649)
    0.01805336 = product of:
      0.03610672 = sum of:
        0.03610672 = weight(_text_:22 in 4649) [ClassicSimilarity], result of:
          0.03610672 = score(doc=4649,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.23214069 = fieldWeight in 4649, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4649)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

More and more cultural heritage institutions publish their collections, vocabularies and metadata on the Web. The resulting Web of linked cultural data opens up exciting new possibilities for searching and browsing through these cultural heritage collections. We report on ongoing work in which we investigate the estimation of relevance in this Web of Culture. We study existing measures of semantic distance and how they apply to two use cases. The use cases relate to the structured, multilingual and multimodal nature of the Culture Web. We distinguish between measures using the Web, such as Google distance and PMI, and measures using the Linked Data Web, i.e. the semantic structure of metadata vocabularies. We perform a small study in which we compare these semantic distance measures to human judgements of relevance. Although it is too early to draw any definitive conclusions, the study provides new insights into the applicability of semantic distance measures to the Web of Culture, and clear starting points for further research.

Date

26.12.2011 13:40:22

Theme

Semantic Web

Louie, A.J.; Maddox, E.L.; Washington, W.: Using faceted classification to provide structure for information architecture (2003) 0.04

0.03737321 = product of:
  0.11211963 = sum of:
    0.057803504 = weight(_text_:wide in 2471) [ClassicSimilarity], result of:
      0.057803504 = score(doc=2471,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.29372054 = fieldWeight in 2471, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=2471)
    0.054316122 = weight(_text_:web in 2471) [ClassicSimilarity], result of:
      0.054316122 = score(doc=2471,freq=6.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.37471575 = fieldWeight in 2471, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2471)
  0.33333334 = coord(2/6)

Abstract: This is a short, but very thorough and very interesting, report on how the writers built a faceted classification for some legal information and used it to structure a web site with navigation and searching. There is a good summary of why facets work well and how they fit into bibliographic control in general. The last section is about their implementation of a web site for the Washington State Bar Association's Council for Legal Public Education. Their classification uses three facets: Purpose (the general aim of the document, e.g. Resources for K-12 Teachers), Topic (the subject of the document), and Type (the legal format of the document). See Example Web Sites, below, for a discussion of the site and a problem with its design.
Content: A very large PDF of the six-foot-wide illustrated poster from their poster session is available at http://depts.washington.edu/pettt/presentations/conf_2003/IASummit-Poster-Louie.pdf.

Menzel, C.: Knowledge representation, the World Wide Web, and the evolution of logic (2011) 0.04
```
0.03737321 = product of:
  0.11211963 = sum of:
    0.057803504 = weight(_text_:wide in 761) [ClassicSimilarity], result of:
      0.057803504 = score(doc=761,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.29372054 = fieldWeight in 761, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=761)
    0.054316122 = weight(_text_:web in 761) [ClassicSimilarity], result of:
      0.054316122 = score(doc=761,freq=6.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.37471575 = fieldWeight in 761, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=761)
  0.33333334 = coord(2/6)
```
Abstract

In this paper, I have traced a series of evolutionary adaptations of FOL motivated entirely by its use by knowledge engineers to represent and share information on the Web, culminating in the development of Common Logic. While the primary goal in this paper has been to document this evolution, it is arguable, I think, that CL's syntactic and semantic egalitarianism better realizes the goal of "topic neutrality" that a logic should ideally exemplify - understood, at least in part, as the idea that logic should as far as possible not itself embody any metaphysical presuppositions. Instead of retaining the traditional metaphysical divisions of FOL that reflect its Fregean origins, CL begins as it were with a single, metaphysically homogeneous domain in which, potentially, anything can play the traditional roles of object, property, relation, and function. Note that the effect of this is not to destroy traditional metaphysical divisions. Rather, it is simply to refrain from building those divisions explicitly into one's logic; instead, such divisions are left to the user to introduce and enforce axiomatically in an explicit metaphysical theory.

Theme

Semantic Web

Si, L.E.; O'Brien, A.; Probets, S.: Integration of distributed terminology resources to facilitate subject cross-browsing for library portal systems (2009) 0.04

0.036973603 = product of:
  0.073947206 = sum of:
    0.026132854 = weight(_text_:web in 3628) [ClassicSimilarity], result of:
      0.026132854 = score(doc=3628,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.18028519 = fieldWeight in 3628, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3628)
    0.03276989 = weight(_text_:computer in 3628) [ClassicSimilarity], result of:
      0.03276989 = score(doc=3628,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.20188503 = fieldWeight in 3628, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3628)
    0.0150444675 = product of:
      0.030088935 = sum of:
        0.030088935 = weight(_text_:22 in 3628) [ClassicSimilarity], result of:
          0.030088935 = score(doc=3628,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.19345059 = fieldWeight in 3628, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3628)
      0.5 = coord(1/2)
  0.5 = coord(3/6)

Abstract: Purpose: To develop a prototype middleware framework between different terminology resources in order to provide a subject cross-browsing service for library portal systems. Design/methodology/approach: Nine terminology experts were interviewed to collect appropriate knowledge to support the development of a theoretical framework for the research. Based on this, a simplified software-based prototype system was constructed incorporating the knowledge acquired. The prototype involved mappings between the computer science schedule of the Dewey Decimal Classification (which acted as a spine) and two controlled vocabularies, UKAT and the ACM Computing Classification. Subsequently, six further experts in the field were invited to evaluate the prototype system and provide feedback to improve the framework. Findings: The major findings showed that, given the large variety of terminology resources distributed on the web, the proposed middleware service is essential to integrate the different terminology resources technically and semantically in order to facilitate subject cross-browsing. A set of recommendations is also made, outlining the important approaches and features that support such a cross-browsing middleware service.
Content: This paper is a pre-print version presented at the ISKO UK 2009 conference, 22-23 June, prior to peer review and editing. For published proceedings see special issue of Aslib Proceedings journal.

Mayfield, J.; Finin, T.: Information retrieval on the Semantic Web : integrating inference and retrieval 0.04

0.036893092 = product of:
  0.11067928 = sum of:
    0.08961702 = weight(_text_:web in 4330) [ClassicSimilarity], result of:
      0.08961702 = score(doc=4330,freq=12.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.6182494 = fieldWeight in 4330, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4330)
    0.021062255 = product of:
      0.04212451 = sum of:
        0.04212451 = weight(_text_:22 in 4330) [ClassicSimilarity], result of:
          0.04212451 = score(doc=4330,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.2708308 = fieldWeight in 4330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4330)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: One vision of the Semantic Web is that it will be much like the Web we know today, except that documents will be enriched by annotations in machine understandable markup. These annotations will provide metadata about the documents as well as machine interpretable statements capturing some of the meaning of document content. We discuss how the information retrieval paradigm might be recast in such an environment. We suggest that retrieval can be tightly bound to inference. Doing so makes today's Web search engines useful to Semantic Web inference engines, and causes improvements in either retrieval or inference to lead directly to improvements in the other.
Date: 12.2.2011 17:35:22
Theme: Semantic Web

Mäkelä, E.; Hyvönen, E.; Ruotsalo, T.: How to deal with massively heterogeneous cultural heritage data : lessons learned in CultureSampo (2012) 0.04
```
0.036481895 = product of:
  0.109445676 = sum of:
    0.07012181 = weight(_text_:web in 3263) [ClassicSimilarity], result of:
      0.07012181 = score(doc=3263,freq=10.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.48375595 = fieldWeight in 3263, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3263)
    0.039323866 = weight(_text_:computer in 3263) [ClassicSimilarity], result of:
      0.039323866 = score(doc=3263,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.24226204 = fieldWeight in 3263, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.046875 = fieldNorm(doc=3263)
  0.33333334 = coord(2/6)
```
Abstract

This paper presents the CultureSampo system for publishing heterogeneous linked data as a service. Discussed are the problems of converting legacy data into linked data, as well as the challenge of making the massively heterogeneous yet interlinked cultural heritage content interoperable on a semantic level. Novel user interface concepts for then utilizing the content are also presented. In the approach described, the data is published not only for human use, but also as intelligent services for other computer systems that can then provide interfaces of their own for the linked data. As a concrete use case of using CultureSampo as a service, the BookSampo system for publishing Finnish fiction literature on the semantic web is presented.

Content

Contribution to a special issue: Semantic Web and Reasoning for Cultural Heritage and Digital Libraries: http://www.semantic-web-journal.net/content/how-deal-massively-heterogeneous-cultural-heritage-data-%E2%80%93-lessons-learned-culturesampo http://www.semantic-web-journal.net/sites/default/files/swj160_0.pdf.

Source

Semantic Web journal. 3(2012) no.1, pp. 85-109

Leskinen, P.; Hyvönen, E.: Extracting genealogical networks of linked data from biographical texts (2019) 0.04

0.036415547 = product of:
  0.10924664 = sum of:
    0.063368805 = weight(_text_:web in 5798) [ClassicSimilarity], result of:
      0.063368805 = score(doc=5798,freq=6.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.43716836 = fieldWeight in 5798, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5798)
    0.04587784 = weight(_text_:computer in 5798) [ClassicSimilarity], result of:
      0.04587784 = score(doc=5798,freq=2.0), product of:
        0.16231956 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.044416238 = queryNorm
        0.28263903 = fieldWeight in 5798, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5798)
  0.33333334 = coord(2/6)
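
The arithmetic in the explain tree above can be retraced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas that the listed values are consistent with (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical and chosen only for illustration; queryNorm and fieldNorm are copied from the listing rather than recomputed, since they depend on the full query and on index-time field lengths.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explain tree for doc 5798 (assumed to be as listed).
MAX_DOCS = 44218
QUERY_NORM = 0.044416238
FIELD_NORM = 0.0546875

web = term_score(6.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
computer = term_score(2.0, 3109, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/6): two of the six query clauses matched this document.
doc_score = (web + computer) * (2 / 6)

print(f"web={web:.4f} computer={computer:.4f} total={doc_score:.4f}")
```

Run as written, the three values should land close to the listed 0.0634, 0.0459 and 0.0364; small differences in the last digits are expected because the listing was produced with single-precision floats.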

Abstract: This paper presents the idea and our work of extracting and reassembling a genealogical network automatically from a collection of biographies. The network can be used as a tool for network analysis of historical persons. The data has been published as Linked Data and as an interactive online service as part of the in-use data service and semantic portal BiographySampo - Finnish Biographies on the Semantic Web.
Series: Lecture notes in computer science; vol.11762
Source: The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Ed.: P. Hitzler et al.
Theme: Semantic Web

Lavoie, B.; Connaway, L.S.; Dempsey, L.: Anatomy of aggregate collections : the example of Google print for libraries (2005) 0.04
```
0.03613339 = product of:
  0.108400166 = sum of:
    0.04087325 = weight(_text_:wide in 1184) [ClassicSimilarity], result of:
      0.04087325 = score(doc=1184,freq=4.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.2076918 = fieldWeight in 1184, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1184)
    0.067526914 = sum of:
      0.049473554 = weight(_text_:programs in 1184) [ClassicSimilarity], result of:
        0.049473554 = score(doc=1184,freq=2.0), product of:
          0.25748047 = queryWeight, product of:
            5.79699 = idf(docFreq=364, maxDocs=44218)
            0.044416238 = queryNorm
          0.19214487 = fieldWeight in 1184, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.79699 = idf(docFreq=364, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1184)
      0.01805336 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
        0.01805336 = score(doc=1184,freq=2.0), product of:
          0.1555381 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.044416238 = queryNorm
          0.116070345 = fieldWeight in 1184, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1184)
  0.33333334 = coord(2/6)
```
Abstract

Google's December 2004 announcement of its intention to collaborate with five major research libraries - Harvard University, the University of Michigan, Stanford University, the University of Oxford, and the New York Public Library - to digitize and surface their print book collections in the Google searching universe has, predictably, stirred conflicting opinion, with some viewing the project as a welcome opportunity to enhance the visibility of library collections in new environments, and others wary of Google's prospective role as gateway to these collections. The project has been vigorously debated on discussion lists and blogs, with the participating libraries commonly referred to as "the Google 5". One point most observers seem to concede is that the questions raised by this initiative are both timely and significant. The Google Print Library Project (GPLP) has galvanized a long overdue, multi-faceted discussion about library print book collections. The print book is core to library identity and practice, but in an era of zero-sum budgeting, it is almost inevitable that print book budgets will decline as budgets for serials, digital resources, and other materials expand. As libraries re-allocate resources to accommodate changing patterns of user needs, print book budgets may be adversely impacted. Of course, the degree of impact will depend on a library's perceived mission. A public library may expect books to justify their shelf-space, with de-accession the consequence of minimal use. A national library, on the other hand, has a responsibility to the scholarly and cultural record and may seek to collect comprehensively within particular areas, with the attendant obligation to secure the long-term retention of its print book collections. 
The combination of limited budgets, changing user needs, and differences in library collection strategies underscores the need to think about a collective, or system-wide, print book collection - in particular, how can an inter-institutional system be organized to achieve goals that would be difficult, and/or prohibitively expensive, for any one library to undertake individually [4]? Mass digitization programs like GPLP cast new light on these and other issues surrounding the future of library print book collections, but at this early stage, it is light that illuminates only dimly. It will be some time before GPLP's implications for libraries and library print book collections can be fully appreciated and evaluated. But the strong interest and lively debate generated by this initiative suggest that some preliminary analysis - premature though it may be - would be useful, if only to undertake a rough mapping of the terrain over which GPLP potentially will extend. At the least, some early perspective helps shape interesting questions for the future, when the boundaries of GPLP become settled, workflows for producing and managing the digitized materials become systematized, and usage patterns within the GPLP framework begin to emerge.
This article offers some perspectives on GPLP in light of what is known about library print book collections in general, and those of the Google 5 in particular, from information in OCLC's WorldCat bibliographic database and holdings file. Questions addressed include:
* Coverage: What proportion of the system-wide print book collection will GPLP potentially cover? What is the degree of holdings overlap across the print book collections of the five participating libraries?
* Language: What is the distribution of languages associated with the print books held by the GPLP libraries? Which languages are predominant?
* Copyright: What proportion of the GPLP libraries' print book holdings are out of copyright?
* Works: How many distinct works are represented in the holdings of the GPLP libraries? How does a focus on works impact coverage and holdings overlap?
* Convergence: What are the effects on coverage of using a different set of five libraries? What are the effects of adding the holdings of additional libraries to those of the GPLP libraries, and how do these effects vary by library type?
These questions certainly do not exhaust the analytical possibilities presented by GPLP. More in-depth analysis might look at Google 5 coverage in particular subject areas; it also would be interesting to see how many books covered by the GPLP have already been digitized in other contexts. However, these questions are left to future studies. The purpose here is to explore a few basic questions raised by GPLP, and in doing so, provide an empirical context for the debate that is sure to continue for some time to come. A secondary objective is to lay some groundwork for a general set of questions that could be used to explore the implications of any mass digitization initiative. A suggested list of questions is provided in the conclusion of the article.

Date

26.12.2011 14:08:22

Baker, T.: A grammar of Dublin Core (2000) 0.04
```
0.035738762 = product of:
  0.071477525 = sum of:
    0.03853567 = weight(_text_:wide in 1236) [ClassicSimilarity], result of:
      0.03853567 = score(doc=1236,freq=2.0), product of:
        0.19679762 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.044416238 = queryNorm
        0.1958137 = fieldWeight in 1236, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=1236)
    0.020906283 = weight(_text_:web in 1236) [ClassicSimilarity], result of:
      0.020906283 = score(doc=1236,freq=2.0), product of:
        0.14495286 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.044416238 = queryNorm
        0.14422815 = fieldWeight in 1236, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=1236)
    0.012035574 = product of:
      0.024071148 = sum of:
        0.024071148 = weight(_text_:22 in 1236) [ClassicSimilarity], result of:
          0.024071148 = score(doc=1236,freq=2.0), product of:
            0.1555381 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044416238 = queryNorm
            0.15476047 = fieldWeight in 1236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1236)
      0.5 = coord(1/2)
  0.5 = coord(3/6)
```
Abstract

Dublin Core is often presented as a modern form of catalog card -- a set of elements (and now qualifiers) that describe resources in a complete package. Sometimes it is proposed as an exchange format for sharing records among multiple collections. The founding principle that "every element is optional and repeatable" reinforces the notion that a Dublin Core description is to be taken as a whole. This paper, in contrast, is based on a much different premise: Dublin Core is a language. More precisely, it is a small language for making a particular class of statements about resources. Like natural languages, it has a vocabulary of word-like terms, the two classes of which -- elements and qualifiers -- function within statements like nouns and adjectives; and it has a syntax for arranging elements and qualifiers into statements according to a simple pattern. Whenever tourists order a meal or ask directions in an unfamiliar language, considerate native speakers will spontaneously limit themselves to basic words and simple sentence patterns along the lines of "I am so-and-so" or "This is such-and-such". Linguists call this pidginization. In such situations, a small phrase book or translated menu can be most helpful. By analogy, today's Web has been called an Internet Commons where users and information providers from a wide range of scientific, commercial, and social domains present their information in a variety of incompatible data models and description languages. In this context, Dublin Core presents itself as a metadata pidgin for digital tourists who must find their way in this linguistically diverse landscape. Its vocabulary is small enough to learn quickly, and its basic pattern is easily grasped. It is well-suited to serve as an auxiliary language for digital libraries. This grammar starts by defining terms. It then follows a 200-year-old tradition of English grammar teaching by focusing on the structure of single statements. It concludes by looking at the growing dictionary of Dublin Core vocabulary terms -- its registry, and at how statements can be used to build the metadata equivalent of paragraphs and compositions -- the application profile.

Date

26.12.2011 14:01:22
