Search (427 results, page 1 of 22)

  • Filter: theme_ss:"Metadaten"
  1. Heidorn, P.B.; Wei, Q.: Automatic metadata extraction from museum specimen labels (2008) 0.08
    0.08250257 = product of:
      0.19800617 = sum of:
        0.02018312 = weight(_text_:web in 2624) [ClassicSimilarity], result of:
          0.02018312 = score(doc=2624,freq=2.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.18028519 = fieldWeight in 2624, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
        0.0058399485 = weight(_text_:information in 2624) [ClassicSimilarity], result of:
          0.0058399485 = score(doc=2624,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.09697737 = fieldWeight in 2624, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
        0.13377918 = weight(_text_:extraction in 2624) [ClassicSimilarity], result of:
          0.13377918 = score(doc=2624,freq=8.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.6564099 = fieldWeight in 2624, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
        0.026584659 = weight(_text_:system in 2624) [ClassicSimilarity], result of:
          0.026584659 = score(doc=2624,freq=4.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.24605882 = fieldWeight in 2624, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
        0.011619256 = product of:
          0.023238512 = sum of:
            0.023238512 = weight(_text_:22 in 2624) [ClassicSimilarity], result of:
              0.023238512 = score(doc=2624,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.19345059 = fieldWeight in 2624, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2624)
          0.5 = coord(1/2)
      0.41666666 = coord(5/12)
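    The explanation above (and those for the following results) is Lucene's ClassicSimilarity breakdown: each matching term contributes queryWeight × fieldWeight, and the per-document sum is scaled by a coordination factor (matching clauses / total clauses). As a minimal sketch, the arithmetic for the "extraction" term of this result can be reproduced from the reported figures alone (nothing beyond the numbers shown is assumed):

      import math

      # Lucene ClassicSimilarity, reconstructed from the explanation above.
      def idf(doc_freq, max_docs):
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      query_norm = 0.03430388                    # queryNorm reported above
      field_norm = 0.0390625                     # fieldNorm(doc=2624)
      tf = math.sqrt(8.0)                        # tf(freq=8.0) for _text_:extraction

      term_idf = idf(315, 44218)                 # ~5.941145
      query_weight = term_idf * query_norm       # ~0.20380433
      field_weight = tf * term_idf * field_norm  # ~0.6564099
      print(query_weight * field_weight)         # ~0.13377918, as reported

      # Top-level score: coord(5/12) scales the sum of the five clause weights.
      print(0.19800617 * 5 / 12)                 # ~0.08250257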
    
    Abstract
    This paper describes the information properties of museum specimen labels and machine learning tools to automatically extract Darwin Core (DwC) and other metadata from these labels processed through Optical Character Recognition (OCR). The DwC is a metadata profile describing the core set of access points for search and retrieval of natural history collections and observation databases. Using the HERBIS Learning System (HLS) we extract 74 independent elements from these labels. The automated text extraction tools are provided as a web service so that users can reference digital images of specimens and receive back an extended Darwin Core XML representation of the content of the label. This automated extraction task is made more difficult by the high variability of museum label formats, OCR errors and the open-class nature of some elements. In this paper we introduce our overall system architecture and variability-robust solutions, including the application of Hidden Markov and Naïve Bayes machine learning models, data cleaning, the use of field element identifiers, and specialist learning models. The techniques developed here could be adapted to any metadata extraction situation with noisy text and weakly ordered elements.
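    The HERBIS pipeline itself is not reproduced here; as a minimal sketch of the Naive Bayes idea named in the abstract (the training lines, Darwin Core labels, and feature settings below are invented for illustration), OCR'd label lines can be classified into DwC elements:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB

      # Hypothetical transcribed specimen-label lines with DwC elements.
      lines = [
          "Quercus alba L.",
          "Collected 12 May 1932",
          "Hamilton County, Ohio",
          "J. R. Smith, coll. no. 481",
      ]
      labels = ["scientificName", "eventDate", "locality", "recordedBy"]

      # Character n-grams tolerate OCR noise better than whole words.
      vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
      model = MultinomialNB().fit(vectorizer.fit_transform(lines), labels)

      # Classify a new, noisy OCR line.
      print(model.predict(vectorizer.transform(["Coll. 3 June 1928"])))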
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  2. Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany (2008) 0.05
    0.051642425 = product of:
      0.123941824 = sum of:
        0.02825637 = weight(_text_:web in 2668) [ClassicSimilarity], result of:
          0.02825637 = score(doc=2668,freq=8.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.25239927 = fieldWeight in 2668, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2668)
        0.008175928 = weight(_text_:information in 2668) [ClassicSimilarity], result of:
          0.008175928 = score(doc=2668,freq=8.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.13576832 = fieldWeight in 2668, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2668)
        0.06621732 = weight(_text_:extraction in 2668) [ClassicSimilarity], result of:
          0.06621732 = score(doc=2668,freq=4.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.32490635 = fieldWeight in 2668, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2668)
        0.013158734 = weight(_text_:system in 2668) [ClassicSimilarity], result of:
          0.013158734 = score(doc=2668,freq=2.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.1217929 = fieldWeight in 2668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2668)
        0.008133478 = product of:
          0.016266957 = sum of:
            0.016266957 = weight(_text_:22 in 2668) [ClassicSimilarity], result of:
              0.016266957 = score(doc=2668,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.1354154 = fieldWeight in 2668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2668)
          0.5 = coord(1/2)
      0.41666666 = coord(5/12)
    
    Abstract
    Metadata is a key aspect of our evolving infrastructure for information management, social computing, and scientific collaboration. DC-2008 will focus on metadata challenges, solutions, and innovation in initiatives and activities underlying semantic and social applications. Metadata is part of the fabric of social computing, which includes the use of wikis, blogs, and tagging for collaboration and participation. Metadata also underlies the development of semantic applications, and the Semantic Web - the representation and integration of multimedia knowledge structures on the basis of semantic models. These two trends flow together in applications such as Wikipedia, where authors collectively create structured information that can be extracted and used to enhance access to and use of information sources. Recent discussion has focused on how existing bibliographic standards can be expressed as Semantic Web vocabularies to facilitate the integration of library and cultural heritage data with other types of data. Harnessing the efforts of content providers and end-users to link, tag, edit, and describe their information in interoperable ways ("participatory metadata") is a key step towards providing knowledge environments that are scalable, self-correcting, and evolvable. DC-2008 will explore conceptual and practical issues in the development and deployment of semantic and social applications to meet the needs of specific communities of practice.
    Content
    Carol Jean Godby, Devon Smith, Eric Childress: Encoding Application Profiles in a Computational Model of the Crosswalk. - Maria Elisabete Catarino, Ana Alice Baptista: Relating Folksonomies with Dublin Core. - Ed Summers, Antoine Isaac, Clay Redding, Dan Krech: LCSH, SKOS and Linked Data. - Xia Lin, Jiexun Li, Xiaohua Zhou: Theme Creation for Digital Collections. - Boris Lauser, Gudrun Johannsen, Caterina Caracciolo, Willem Robert van Hage, Johannes Keizer, Philipp Mayr: Comparing Human and Automatic Thesaurus Mapping Approaches in the Agricultural Domain. - P. Bryan Heidorn, Qin Wei: Automatic Metadata Extraction From Museum Specimen Labels. - Stuart Allen Sutton, Diny Golder: Achievement Standards Network (ASN): An Application Profile for Mapping K-12 Educational Resources to Achievement Standards. - Allen H. Renear, Karen M. Wickett, Richard J. Urban, David Dubin, Sarah L. Shreeves: Collection/Item Metadata Relationships. - Seth van Hooland, Yves Bontemps, Seth Kaufman: Answering the Call for more Accountability: Applying Data Profiling to Museum Metadata. - Thomas Margaritopoulos, Merkourios Margaritopoulos, Ioannis Mavridis, Athanasios Manitsaris: A Conceptual Framework for Metadata Quality Assessment. - Miao Chen, Xiaozhong Liu, Jian Qin: Semantic Relation Extraction from Socially-Generated Tags: A Methodology for Metadata Generation. - Hak Lae Kim, Simon Scerri, John G. Breslin, Stefan Decker, Hong Gee Kim: The State of the Art in Tag Ontologies: A Semantic Model for Tagging and Folksonomies. - Martin Malmsten: Making a Library Catalogue Part of the Semantic Web. - Philipp Mayr, Vivien Petras: Building a Terminology Network for Search: The KoMoHe Project. - Michael Panzer: Cool URIs for the DDC: Towards Web-scale Accessibility of a Large Classification System. - Barbara Levergood, Stefan Farrenkopf, Elisabeth Frasnelli: The Specification of the Language of the Field and Interoperability: Cross-language Access to Catalogues and Online Libraries (CACAO)
  3. Méndez, E.; López, L.M.; Siches, A.; Bravo, A.G.: DCMF: DC & Microformats, a good marriage (2008) 0.05
    0.045156818 = product of:
      0.13547045 = sum of:
        0.03425189 = weight(_text_:web in 2634) [ClassicSimilarity], result of:
          0.03425189 = score(doc=2634,freq=4.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.3059541 = fieldWeight in 2634, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2634)
        0.0070079383 = weight(_text_:information in 2634) [ClassicSimilarity], result of:
          0.0070079383 = score(doc=2634,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.116372846 = fieldWeight in 2634, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2634)
        0.08026751 = weight(_text_:extraction in 2634) [ClassicSimilarity], result of:
          0.08026751 = score(doc=2634,freq=2.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.39384598 = fieldWeight in 2634, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.046875 = fieldNorm(doc=2634)
        0.013943106 = product of:
          0.027886212 = sum of:
            0.027886212 = weight(_text_:22 in 2634) [ClassicSimilarity], result of:
              0.027886212 = score(doc=2634,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.23214069 = fieldWeight in 2634, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2634)
          0.5 = coord(1/2)
      0.33333334 = coord(4/12)
    
    Abstract
    This report introduces the Dublin Core Microformats (DCMF) project, a new way to use the DC element set within X/HTML. The DC microformats encode explicit semantic expressions in an X/HTML webpage, by using a specific list of terms for values of the attributes "rev" and "rel" for <a> and <link> elements, and "class" and "id" of other elements. Microformats can be easily processed by user agents and software, enabling a high level of interoperability. These characteristics are crucial for the growing number of social applications allowing users to participate in the Web 2.0 environment as information creators and consumers. This report reviews the origins of microformats; illustrates the coding of DC microformats using the Dublin Core Metadata Gen tool, and a Firefox extension for extraction and visualization; and discusses the benefits of creating Web services utilizing DC microformats.
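    As an illustration of the encoding the report describes (the published DCMF term list is not reproduced in this abstract, so the attribute values below are hypothetical), Dublin Core statements can be generated as <link> elements:

      from html import escape

      # Hypothetical DC-microformat encoding: DC properties carried in the
      # "rel" attribute of <link> elements; the exact vocabulary is an
      # assumption for illustration only.
      record = {
          "title": "DCMF: DC & Microformats, a good marriage",
          "creator": "Mendez, E.",
          "date": "2008",
      }

      def dc_links(rec):
          return "\n".join(
              f'<link rel="dc.{prop}" title="{escape(val)}">'
              for prop, val in rec.items()
          )

      print(dc_links(record))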
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  4. Franklin, R.A.: Re-inventing subject access for the semantic web (2003) 0.04
    0.043382045 = product of:
      0.13014613 = sum of:
        0.064079426 = weight(_text_:web in 2556) [ClassicSimilarity], result of:
          0.064079426 = score(doc=2556,freq=14.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.57238775 = fieldWeight in 2556, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2556)
        0.0070079383 = weight(_text_:information in 2556) [ClassicSimilarity], result of:
          0.0070079383 = score(doc=2556,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.116372846 = fieldWeight in 2556, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2556)
        0.04511566 = weight(_text_:system in 2556) [ClassicSimilarity], result of:
          0.04511566 = score(doc=2556,freq=8.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.41757566 = fieldWeight in 2556, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=2556)
        0.013943106 = product of:
          0.027886212 = sum of:
            0.027886212 = weight(_text_:22 in 2556) [ClassicSimilarity], result of:
              0.027886212 = score(doc=2556,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.23214069 = fieldWeight in 2556, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2556)
          0.5 = coord(1/2)
      0.33333334 = coord(4/12)
    
    Abstract
    First generation scholarly research on the Web lacked a firm system of authority control. Second generation Web research is beginning to model subject access with library science principles of bibliographic control and cataloguing. Harnessing the Web and organising the intellectual content with standards and controlled vocabulary provides precise search and retrieval capability, increasing relevance and efficient use of technology. Dublin Core metadata standards permit a full evaluation and cataloguing of Web resources appropriate to highly specific research needs and discovery. Current research points to a type of structure based on a system of faceted classification. This system allows the semantic and syntactic relationships to be defined. Controlled vocabulary, such as the Library of Congress Subject Headings, can be assigned, not in a hierarchical structure, but rather as descriptive facets of relating concepts. Web design features such as this are adding value to discovery and filtering out data that lack authority. The system design allows for scalability and extensibility, two technical features that are integral to future development of the digital library and resource discovery.
    Date
    30.12.2008 18:22:46
    Source
    Online information review. 27(2003) no.2, S.94-101
    Theme
    Semantic Web
  5. Metadata and semantics research : 9th Research Conference, MTSR 2015, Manchester, UK, September 9-11, 2015, Proceedings (2015) 0.03
    0.032547407 = product of:
      0.13018963 = sum of:
        0.03425189 = weight(_text_:web in 3274) [ClassicSimilarity], result of:
          0.03425189 = score(doc=3274,freq=4.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.3059541 = fieldWeight in 3274, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3274)
        0.015670227 = weight(_text_:information in 3274) [ClassicSimilarity], result of:
          0.015670227 = score(doc=3274,freq=10.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.2602176 = fieldWeight in 3274, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3274)
        0.08026751 = weight(_text_:extraction in 3274) [ClassicSimilarity], result of:
          0.08026751 = score(doc=3274,freq=2.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.39384598 = fieldWeight in 3274, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.046875 = fieldNorm(doc=3274)
      0.25 = coord(3/12)
    
    Content
    The papers are organized in several sessions and tracks: general track on ontology evolution, engineering, and frameworks, semantic Web and metadata extraction, modelling, interoperability and exploratory search, data analysis, reuse and visualization; track on digital libraries, information retrieval, linked and social data; track on metadata and semantics for open repositories, research information systems and data infrastructure; track on metadata and semantics for agriculture, food and environment; track on metadata and semantics for cultural collections and applications; track on European and national projects.
    LCSH
    Information storage and retrieval systems
    Series
    Communications in computer and information science; 544
    Subject
    Information storage and retrieval systems
    Theme
    Semantic Web
  6. Eichmann, D.; McGregor, T.; Danley, D.: Integrating structured databases into the Web : the MORE system (1994) 0.03
    0.029343026 = product of:
      0.1173721 = sum of:
        0.055933107 = weight(_text_:web in 1501) [ClassicSimilarity], result of:
          0.055933107 = score(doc=1501,freq=6.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.49962097 = fieldWeight in 1501, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0625 = fieldNorm(doc=1501)
        0.009343918 = weight(_text_:information in 1501) [ClassicSimilarity], result of:
          0.009343918 = score(doc=1501,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1551638 = fieldWeight in 1501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1501)
        0.052095078 = weight(_text_:system in 1501) [ClassicSimilarity], result of:
          0.052095078 = score(doc=1501,freq=6.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.48217484 = fieldWeight in 1501, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0625 = fieldNorm(doc=1501)
      0.25 = coord(3/12)
    
    Abstract
    Administering large quantities of information will be an increasing problem as the WWW grows in size and popularity. The MORE system is a metadatabase repository employing Mosaic and the Web as its sole user interface. Describes the design and implementation experience in migrating a repository system onto the Web
  7. Özel, S.A.; Altingövde, I.S.; Ulusoy, Ö.; Özsoyoglu, G.; Özsoyoglu, Z.M.: Metadata-Based Modeling of Information Resources on the Web (2004) 0.02
    0.024453774 = product of:
      0.0978151 = sum of:
        0.06054936 = weight(_text_:web in 2093) [ClassicSimilarity], result of:
          0.06054936 = score(doc=2093,freq=18.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.5408555 = fieldWeight in 2093, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2093)
        0.01846754 = weight(_text_:information in 2093) [ClassicSimilarity], result of:
          0.01846754 = score(doc=2093,freq=20.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.30666938 = fieldWeight in 2093, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2093)
        0.018798191 = weight(_text_:system in 2093) [ClassicSimilarity], result of:
          0.018798191 = score(doc=2093,freq=2.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.17398985 = fieldWeight in 2093, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2093)
      0.25 = coord(3/12)
    
    Abstract
    This paper deals with the problem of modeling Web information resources using expert knowledge and personalized user information for improved Web searching capabilities. We propose a "Web information space" model, which is composed of Web-based information resources (HTML/XML [Hypertext Markup Language/Extensible Markup Language] documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences about experts as well as users' knowledge about topics). Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. The metadata specification process is semiautomated, and it exploits XML DTDs (Document Type Definition) to allow domain-expert guided mapping of DTD elements to topics and metalinks. The expert advice is stored in an object-relational database management system (DBMS). To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for the DBLP (Database and Logic Programming) Bibliography data set. We also present a query interface that provides sophisticated querying facilities for DBLP Bibliography resources using the expert advice repository.
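    A minimal data-structure sketch of the topic/metalink model described above (class and field names are invented; the paper stores the expert advice in an object-relational DBMS):

      from dataclasses import dataclass, field

      @dataclass
      class Topic:
          name: str
          sources: list = field(default_factory=list)  # URLs of HTML/XML resources

      @dataclass
      class Metalink:
          kind: str            # e.g. "prerequisite", "relatedTo" (invented)
          source: Topic
          target: Topic

      t1 = Topic("relational algebra", ["http://example.org/dblp/ra.xml"])
      t2 = Topic("query optimization", ["http://example.org/dblp/qo.xml"])
      advice = [Metalink("prerequisite", t1, t2)]

      # Simplified user profile: known topics steer what gets suggested.
      profile = {"knows": {"relational algebra"}}
      for link in advice:
          if link.target.name not in profile["knows"]:
              print(link.target.name, "->", link.target.sources)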
    Source
    Journal of the American Society for Information Science and technology. 55(2004) no.2, S.97-110
  8. Liechti, O.; Sifer, M.J.; Ichikawa, T.: Structured graph format : XML metadata for describing Web site structure (1998) 0.02
    0.02426079 = product of:
      0.09704316 = sum of:
        0.06921369 = weight(_text_:web in 3597) [ClassicSimilarity], result of:
          0.06921369 = score(doc=3597,freq=12.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.6182494 = fieldWeight in 3597, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3597)
        0.0115625085 = weight(_text_:information in 3597) [ClassicSimilarity], result of:
          0.0115625085 = score(doc=3597,freq=4.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1920054 = fieldWeight in 3597, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3597)
        0.016266957 = product of:
          0.032533914 = sum of:
            0.032533914 = weight(_text_:22 in 3597) [ClassicSimilarity], result of:
              0.032533914 = score(doc=3597,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.2708308 = fieldWeight in 3597, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3597)
          0.5 = coord(1/2)
      0.25 = coord(3/12)
    
    Abstract
    To improve searching, filtering and processing of information on the Web, a common effort is made in the direction of metadata, defined as machine-understandable information about Web resources or other things. In particular, the eXtensible Markup Language (XML) aims at providing a common syntax to emerging metadata formats. Proposes the Structured Graph Format (SGF), an XML-compliant markup language based on structured graphs, for capturing Web sites' structure. Presents SGMapper, a client-side tool, which aims to facilitate navigation in large Web sites by generating highly interactive site maps using SGF metadata.
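    The element names in the sketch below are hypothetical (the actual SGF DTD is defined in the paper); it only illustrates how such a site description parses into an adjacency list with the standard library:

      import xml.etree.ElementTree as ET
      from collections import defaultdict

      # Hypothetical SGF-like markup; node/edge element names are assumptions.
      sgf = """
      <sgf>
        <node id="home"/><node id="docs"/><node id="api"/>
        <edge from="home" to="docs"/><edge from="docs" to="api"/>
      </sgf>
      """

      graph = defaultdict(list)
      for edge in ET.fromstring(sgf).iter("edge"):
          graph[edge.get("from")].append(edge.get("to"))

      print(dict(graph))  # {'home': ['docs'], 'docs': ['api']}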
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
  9. Strötgen, R.: Treatment of semantic heterogeneity using meta-data extraction and query translation (2002) 0.02
    0.023999525 = product of:
      0.14399715 = sum of:
        0.0115625085 = weight(_text_:information in 3595) [ClassicSimilarity], result of:
          0.0115625085 = score(doc=3595,freq=4.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1920054 = fieldWeight in 3595, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3595)
        0.13243464 = weight(_text_:extraction in 3595) [ClassicSimilarity], result of:
          0.13243464 = score(doc=3595,freq=4.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.6498127 = fieldWeight in 3595, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3595)
      0.16666667 = coord(2/12)
    
    Abstract
    The project CARMEN ("Content Analysis, Retrieval and Metadata: Effective Networking") aimed - among other goals - at improving the expansion of searches in bibliographic databases into Internet searches. We pursued a set of different approaches to the treatment of semantic heterogeneity (meta-data extraction, query translation using statistical relations, and cross-concordances). This paper describes the concepts and implementation of these approaches and evaluates their impact on the retrieval results.
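    A minimal sketch of query translation over a cross-concordance of the kind described (the terms and weights are invented; CARMEN derived such relations statistically):

      # Hypothetical cross-concordance: source-vocabulary term ->
      # weighted equivalents in the target vocabulary.
      concordance = {
          "Metadaten": [("metadata", 0.95), ("cataloguing data", 0.40)],
          "Inhaltserschliessung": [("subject indexing", 0.85)],
      }

      def translate(query_terms, threshold=0.5):
          """Expand source-vocabulary terms into target-vocabulary terms."""
          out = []
          for term in query_terms:
              out += [t for t, w in concordance.get(term, []) if w >= threshold]
          return out

      print(translate(["Metadaten", "Inhaltserschliessung"]))
      # -> ['metadata', 'subject indexing']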
    Source
    Gaining insight from research information (CRIS2002): Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002. Eds.: W. Adamczak and A. Nase
  10. Hickey, T.R.: CORC : a system for gateway creation (2000) 0.02
    0.021350518 = product of:
      0.08540207 = sum of:
        0.02825637 = weight(_text_:web in 4870) [ClassicSimilarity], result of:
          0.02825637 = score(doc=4870,freq=2.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.25239927 = fieldWeight in 4870, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4870)
        0.0115625085 = weight(_text_:information in 4870) [ClassicSimilarity], result of:
          0.0115625085 = score(doc=4870,freq=4.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1920054 = fieldWeight in 4870, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4870)
        0.045583192 = weight(_text_:system in 4870) [ClassicSimilarity], result of:
          0.045583192 = score(doc=4870,freq=6.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.42190298 = fieldWeight in 4870, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4870)
      0.25 = coord(3/12)
    
    Abstract
    CORC is an OCLC project that is developing tools and systems to enable libraries to provide enhanced access to Internet resources. By adapting and extending library techniques and procedures, we are developing a self-supporting system capable of describing a large and useful subset of the Web. CORC is more a system for hosting and supporting subject gateways than a gateway itself and relies on large-scale cooperation among libraries to maintain a centralized database. By supporting emerging metadata standards such as Dublin Core and other standards such as Unicode and RDF, CORC broadens the range of libraries and librarians able to participate. Current plans are for CORC to become a full OCLC service in July 2000.
    Source
    Online information review. 24(2000) no.1, S.49-53
    Theme
    Information Gateway
  11. Handbook of metadata, semantics and ontologies (2014) 0.02
    0.020864412 = product of:
      0.08345765 = sum of:
        0.027966553 = weight(_text_:web in 5134) [ClassicSimilarity], result of:
          0.027966553 = score(doc=5134,freq=6.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.24981049 = fieldWeight in 5134, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=5134)
        0.009343918 = weight(_text_:information in 5134) [ClassicSimilarity], result of:
          0.009343918 = score(doc=5134,freq=8.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1551638 = fieldWeight in 5134, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5134)
        0.04614717 = product of:
          0.09229434 = sum of:
            0.09229434 = weight(_text_:aufsatzsammlung in 5134) [ClassicSimilarity], result of:
              0.09229434 = score(doc=5134,freq=4.0), product of:
                0.2250708 = queryWeight, product of:
                  6.5610886 = idf(docFreq=169, maxDocs=44218)
                  0.03430388 = queryNorm
                0.41006804 = fieldWeight in 5134, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5610886 = idf(docFreq=169, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5134)
          0.5 = coord(1/2)
      0.25 = coord(3/12)
    
    Abstract
    Metadata research has emerged as a discipline cross-cutting many domains, focused on the provision of distributed descriptions (often called annotations) to Web resources or applications. Such associated descriptions are supposed to serve as a foundation for advanced services in many application areas, including search and location, personalization, federation of repositories and automated delivery of information. Indeed, the Semantic Web is in itself a concrete technological framework for ontology-based metadata. For example, Web-based social networking requires metadata describing people and their interrelations, and large databases with biological information use complex and detailed metadata schemas for more precise and informed search strategies. There is a wide diversity in the languages and idioms used for providing meta-descriptions, from simple structured text in metadata schemas to formal annotations using ontologies, and the technologies for storing, sharing and exploiting meta-descriptions are also diverse and evolve rapidly. In addition, there is a proliferation of schemas and standards related to metadata, resulting in a complex and moving technological landscape - hence, the need for specialized knowledge and skills in this area. The Handbook of Metadata, Semantics and Ontologies is intended as an authoritative reference for students, practitioners and researchers, serving as a roadmap for the variety of metadata schemas and ontologies available in a number of key domain areas, including culture, biology, education, healthcare, engineering and library science.
    LCSH
    Semantic networks (Information theory)
    RSWK
    Metadaten / Ontologie <Wissensverarbeitung> / Aufsatzsammlung
    Subject
    Metadaten / Ontologie <Wissensverarbeitung> / Aufsatzsammlung
    Semantic networks (Information theory)
  12. Laparra, E.; Binford-Walsh, A.; Emerson, K.; Miller, M.L.; López-Hoffman, L.; Currim, F.; Bethard, S.: Addressing structural hurdles for metadata extraction from environmental impact statements (2023) 0.02
    0.02028269 = product of:
      0.12169614 = sum of:
        0.0058399485 = weight(_text_:information in 1042) [ClassicSimilarity], result of:
          0.0058399485 = score(doc=1042,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.09697737 = fieldWeight in 1042, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1042)
        0.115856186 = weight(_text_:extraction in 1042) [ClassicSimilarity], result of:
          0.115856186 = score(doc=1042,freq=6.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.56846774 = fieldWeight in 1042, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1042)
      0.16666667 = coord(2/12)
    
    Abstract
    Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi-file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real-world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.
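    One common adaptation to the document-length problem described above is to classify fixed-size windows of a long document and aggregate the per-window predictions; a sketch under that assumption (the window sizes and voting rule are illustrative, not the paper's):

      from collections import Counter

      def windows(tokens, size=512, stride=256):
          """Split a long document into overlapping token windows."""
          for start in range(0, max(len(tokens) - size, 0) + 1, stride):
              yield tokens[start:start + size]

      def predict_document(tokens, predict_window):
          """Majority vote over the per-window predictions."""
          votes = Counter(predict_window(w) for w in windows(tokens))
          return votes.most_common(1)[0][0]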
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.9, S.1124-1139
  13. Söhler, M.: "Dumm wie Google" war gestern : semantische Suche im Netz (2011) 0.02
    0.019310515 = product of:
      0.115863085 = sum of:
        0.02825637 = weight(_text_:web in 4440) [ClassicSimilarity], result of:
          0.02825637 = score(doc=4440,freq=8.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.25239927 = fieldWeight in 4440, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4440)
        0.08760671 = weight(_text_:suche in 4440) [ClassicSimilarity], result of:
          0.08760671 = score(doc=4440,freq=14.0), product of:
            0.17138755 = queryWeight, product of:
              4.996156 = idf(docFreq=812, maxDocs=44218)
              0.03430388 = queryNorm
            0.51116145 = fieldWeight in 4440, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.996156 = idf(docFreq=812, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4440)
      0.16666667 = coord(2/12)
    
    Abstract
    "Casablanca" bringt bei der Google-Suche Millionen Ergebnisse. Ist die Stadt gemeint oder der Film? Suchmaschinen sind dumm und schnell. Schema.org will das ändern.
    Content
    "6.500 Einzelsprachen so zu verstehen, dass noch die dümmsten Maschinen sie in all ihren Sätzen, Wörtern, Bedeutungen nicht nur erfassen, sondern auch verarbeiten können - das ist ein komplexer Vorgang, an dem große Teile des Internets inklusive fast aller Suchmaschinen bisher gescheitert sind. Wem schon der gerade gelesene Satz zu komplex erscheint, dem sei es einfacher ausgedrückt: Erstmal geht es um "Teekesselchen". Wörter haben oft mehrere Bedeutungen. Einige kennen den "Kanal" als künstliche Wasserstraße, andere kennen ihn vom Zappen am Fernsehgerät. Die Waage kann zum Erfassen des Gewichts nützlich sein oder zur Orientierung auf der Horoskopseite einer Zeitung. Casablanca ist eine Stadt und ein Film zugleich. Wo Menschen mit der Zeit zu unterscheiden lernen, lernen dies Suchmaschinen von selbst nicht. Nach einer entsprechenden Eingabe listen sie dumpf hintereinander weg alles auf, was sie zum Thema finden können. "Dumm wie Google", könnte man sagen, "doof wie Yahoo" oder "blöd wie Bing". Damit das nicht so bleibt, haben sich nun Google, Yahoo und die zu Microsoft gehörende Suchmaschine Bing zusammengetan, um der Suche im Netz mehr Verständnis zu verpassen. Man spricht dabei auch von einer "semantischen Suche". Das Ergebnis heißt Schema.org. Wer die Webseite einmal besucht, sich ein wenig in die Unterstrukturen hereinklickt und weder Vorkenntnisse im Programmieren noch im Bereich des semantischen Webs hat, wird sich überfordert und gelangweilt wieder abwenden.
    - New standards: Yet what could emerge here has the potential to change parts of the net, and especially the way search engines work, in the medium or long term. "Big players are in the process of agreeing on standards," says Daniel Bahls, specialist for semantic technologies at the ZBW Leibniz-Informationszentrum Wirtschaft in Hamburg. "Semantic technologies have been around for years and have so far been used only in smaller contexts." Schema.org invites developers, researchers, the Semantic Web community, and ultimately all website operators to take part in reshaping search on the net. "With this, Google, Bing, and Yahoo! want to put an end to the information chaos of the WWW," writes André Vatter in the blog ZBW Mediatalk. Website content is to be marked up and prepared for the search engines' crawlers with a specific but uniform vocabulary. By embedding keywords - so-called tags - in the code of websites, search engines are no longer as dependent on analyzing natural language to grasp the content of texts. The blog calls this "Semantic Web light" - a semantic web at its lowest level. But even that "will already achieve a lot," Bahls believes. "The semantic web will keep evolving over the coming decades." There will never be a point of "completion," "since a uniform formalization of concepts at a fine-grained level is hardly possible."
    - "Gemeinsames Format für strukturierte Daten" Aber warum sollten Google, Yahoo und Bing plötzlich zusammenarbeiten, wo doch bisher die Konkurrenz das Verhältnis prägte? Stefan Keuchel, Pressesprecher von Google Deutschland, betont, alle beteiligten Unternehmen wollten "ein deutliches Zeichen setzen, um die Qualität der Suche zu verbessern". Man entwickele "ein gemeinsames Format für strukturierte Daten, mit dem Dinge ermöglicht werden, die heute noch nicht möglich sind - Stichwort: semantische Suche". Die Ergebnisse aus Schema.org würden "zeitnah" in die Suchmaschine integriert, "denn einen Zeitplan" gebe es nicht. "Erst mit der Einigung auf eine gemeinsame Sprache können Suchmaschinen einen Mehrwert durch semantische Technologien generieren", antwortet Daniel Bahls auf die Frage nach Gemeinsamkeit und Konkurrenz der Suchmaschinen. Er weist außerdem darauf hin, dass es bereits die semantische Suchmaschine Sig.ma gibt. Geschwindigkeit und Menge der Ergebnisse nach einer Suchanfrage spielen hier keine Rolle. Sig.ma sammelt seine Informationen allein im Bereich des semantischen Webs und listet nach einer Anfrage alles Bekannte strukturiert auf."
  14. Schaefer, M.T.: Demystifying metadata : initiatives for web document description (1998) 0.02
    0.019259349 = product of:
      0.077037394 = sum of:
        0.02825637 = weight(_text_:web in 4635) [ClassicSimilarity], result of:
          0.02825637 = score(doc=4635,freq=2.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.25239927 = fieldWeight in 4635, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4635)
        0.0115625085 = weight(_text_:information in 4635) [ClassicSimilarity], result of:
          0.0115625085 = score(doc=4635,freq=4.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.1920054 = fieldWeight in 4635, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4635)
        0.03721852 = weight(_text_:system in 4635) [ClassicSimilarity], result of:
          0.03721852 = score(doc=4635,freq=4.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.34448233 = fieldWeight in 4635, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4635)
      0.25 = coord(3/12)
    
    Abstract
    Examines international efforts to promote metadata as a common, interactive resource description tool for the Internet. These efforts centre on the Dublin Core Element Set, but include qualifiers such as those promoted by the Canberra Qualifiers. The LoC Network Development and MARC Standards Office maintains the Dublin Core / MARC / GILS (Government Information Locator Service) crosswalk, which maps the common and correlative elements of each system. Describes current international initiatives and issues, including the Nordic metadata project, which aims to create the basic elements of a metadata production and utilization system based on the Dublin Core Metadata Element Set, and the WWW consortium's efforts in this area.
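    At its core such a crosswalk is a field mapping; the sketch below is a simplified illustration (the element pairs are a reduced, partial reading of the published crosswalk, not a faithful copy):

      # Illustrative subset of a Dublin Core -> MARC 21 mapping; the
      # official LoC crosswalk is considerably more detailed.
      DC_TO_MARC = {
          "title": "245",        # Title Statement
          "subject": "653",      # Index Term - Uncontrolled
          "description": "520",  # Summary, etc.
          "language": "546",     # Language Note
      }

      def crosswalk(dc_record):
          """Map a flat DC record onto MARC tags (illustrative only)."""
          return {DC_TO_MARC[k]: v for k, v in dc_record.items() if k in DC_TO_MARC}

      print(crosswalk({"title": "Demystifying metadata", "language": "eng"}))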
    Source
    Information retrieval and library automation. 33(1998) no.11, S.1-5
  15. Brasethvik, T.: ¬A semantic modeling approach to metadata (1998) 0.02
    0.018627357 = product of:
      0.07450943 = sum of:
        0.03996054 = weight(_text_:web in 5165) [ClassicSimilarity], result of:
          0.03996054 = score(doc=5165,freq=4.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.35694647 = fieldWeight in 5165, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5165)
        0.018281931 = weight(_text_:information in 5165) [ClassicSimilarity], result of:
          0.018281931 = score(doc=5165,freq=10.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.3035872 = fieldWeight in 5165, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5165)
        0.016266957 = product of:
          0.032533914 = sum of:
            0.032533914 = weight(_text_:22 in 5165) [ClassicSimilarity], result of:
              0.032533914 = score(doc=5165,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.2708308 = fieldWeight in 5165, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5165)
          0.5 = coord(1/2)
      0.25 = coord(3/12)
    
    Abstract
    States that heterogeneous project groups today may be expected to use the mechanisms of the Web for sharing information. Metadata has been proposed as a mechanism for expressing the semantics of information and, hence, facilitate information retrieval, understanding and use. Presents an approach to sharing information which aims to use a semantic modeling language as the basis for expressing the semantics of information and designing metadata schemes. Functioning on the borderline between human and computer understandability, the modeling language would be able to express the semantics of published Web documents. Reporting on work in progress, presents the overall framework and ideas
    Date
    9. 9.2000 17:22:23
  16. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.02
    0.018346088 = product of:
      0.07338435 = sum of:
        0.048941467 = weight(_text_:web in 2673) [ClassicSimilarity], result of:
          0.048941467 = score(doc=2673,freq=6.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.43716836 = fieldWeight in 2673, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.008175928 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
          0.008175928 = score(doc=2673,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.13576832 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.016266957 = product of:
          0.032533914 = sum of:
            0.032533914 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
              0.032533914 = score(doc=2673,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.2708308 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.5 = coord(1/2)
      0.25 = coord(3/12)
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Shows that techniques which classify training examples with high accuracy are not necessarily useful for this task, and explains why. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
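    A minimal sketch of the clustering step (toy documents; k-means over TF-IDF vectors stands in for the feature-discovery method the paper actually uses):

      from sklearn.cluster import KMeans
      from sklearn.feature_extraction.text import TfidfVectorizer

      # Toy pre-categorized documents (invented).
      docs = [
          "metadata dublin core cataloguing records",
          "cataloguing records marc metadata library",
          "soccer match goal league season",
          "league season team goal players",
      ]

      vec = TfidfVectorizer()
      X = vec.fit_transform(docs)
      km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

      # Report the highest-weighted words per cluster centre.
      terms = vec.get_feature_names_out()
      for c, centre in enumerate(km.cluster_centers_):
          print(c, [terms[i] for i in centre.argsort()[::-1][:3]])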
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
  17. Kopácsi, S. et al.: Development of a classification server to support metadata harmonization in a long term preservation system (2016) 0.02
    0.018128697 = product of:
      0.07251479 = sum of:
        0.011679897 = weight(_text_:information in 3280) [ClassicSimilarity], result of:
          0.011679897 = score(doc=3280,freq=2.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.19395474 = fieldWeight in 3280, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=3280)
        0.037596382 = weight(_text_:system in 3280) [ClassicSimilarity], result of:
          0.037596382 = score(doc=3280,freq=2.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.3479797 = fieldWeight in 3280, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.078125 = fieldNorm(doc=3280)
        0.023238512 = product of:
          0.046477024 = sum of:
            0.046477024 = weight(_text_:22 in 3280) [ClassicSimilarity], result of:
              0.046477024 = score(doc=3280,freq=2.0), product of:
                0.120126344 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03430388 = queryNorm
                0.38690117 = fieldWeight in 3280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3280)
          0.5 = coord(1/2)
      0.25 = coord(3/12)
    
    Series
    Communications in computer and information science; 672
    Source
    Metadata and semantics research: 10th International Conference, MTSR 2016, Göttingen, Germany, November 22-25, 2016, Proceedings. Eds.: E. Garoufallou
  18. Wu, C.-J.: Experiments on using the Dublin Core to reduce the retrieval error ratio (1998) 0.02
    0.01718374 = product of:
      0.06873496 = sum of:
        0.02825637 = weight(_text_:web in 5201) [ClassicSimilarity], result of:
          0.02825637 = score(doc=5201,freq=2.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.25239927 = fieldWeight in 5201, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5201)
        0.014161124 = weight(_text_:information in 5201) [ClassicSimilarity], result of:
          0.014161124 = score(doc=5201,freq=6.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.23515764 = fieldWeight in 5201, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5201)
        0.026317468 = weight(_text_:system in 5201) [ClassicSimilarity], result of:
          0.026317468 = score(doc=5201,freq=2.0), product of:
            0.10804188 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03430388 = queryNorm
            0.2435858 = fieldWeight in 5201, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5201)
      0.25 = coord(3/12)
    
    Abstract
    To test the power of metadata in information retrieval, an experiment was designed and conducted with a group of 7 graduate students, using the Dublin Core as the cataloguing metadata. Results show that, on average, the retrieval error rate is only 2.9 per cent for the MES system (http://140.136.85.194), which uses the Dublin Core to describe documents on the World Wide Web, in contrast to 20.7 per cent for 7 well-known search engines: HOTBOT, GAIS, LYCOS, EXCITE, INFOSEEK, YAHOO, and OCTOPUS. The very low error rate indicates that users can draw on the Dublin Core information to decide whether or not to retrieve a document.
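    Dublin Core elements are conventionally embedded in a page's head as meta tags, which is presumably how a system like MES reads them; a minimal generator for such tags (the record values below are invented, not taken from the experiment):

```python
# Emits Dublin Core elements as HTML meta tags, the conventional way to
# attach DC metadata to a web page. Record values are invented examples;
# the element names follow the Dublin Core standard.
record = {
    "DC.title": "Experiments on using the Dublin Core",
    "DC.creator": "Wu, C.-J.",
    "DC.subject": "metadata; information retrieval",
    "DC.date": "1998",
}

for name, content in record.items():
    print(f'<meta name="{name}" content="{content}">')
```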
    Source
    Journal of library and information science. 24(1998) no.1, S.50-64
  19. Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: ¬A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.02
    0.01714252 = product of:
      0.10285511 = sum of:
        0.008258934 = weight(_text_:information in 63) [ClassicSimilarity], result of:
          0.008258934 = score(doc=63,freq=4.0), product of:
            0.060219705 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03430388 = queryNorm
            0.13714671 = fieldWeight in 63, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
        0.09459618 = weight(_text_:extraction in 63) [ClassicSimilarity], result of:
          0.09459618 = score(doc=63,freq=4.0), product of:
            0.20380433 = queryWeight, product of:
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.03430388 = queryNorm
            0.46415195 = fieldWeight in 63, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.941145 = idf(docFreq=315, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
      0.16666667 = coord(2/12)
    
    Abstract
    Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation method that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal-style test sets and one conference-style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
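    As a loose illustration of the general idea of alignment-based template matching (not the authors' PBA implementation), the sketch below abstracts tokens to coarse classes and scores a reference string against a stored template by sequence alignment; the token classes and the template are assumptions:

```python
# A minimal sketch of alignment-based template matching: tokens are
# abstracted to coarse classes (YEAR, CAP, WORD, literal punctuation),
# and a new reference is scored against a stored template by sequence
# alignment. Token classes and the template are illustrative assumptions.
import re
from difflib import SequenceMatcher

def token_classes(text):
    classes = []
    for tok in re.findall(r"\w+|[^\w\s]", text):
        if re.fullmatch(r"(19|20)\d{2}", tok):
            classes.append("YEAR")
        elif tok[0].isupper():
            classes.append("CAP")
        elif tok.isalpha():
            classes.append("WORD")
        else:
            classes.append(tok)  # punctuation kept literally
    return classes

# Hypothetical template induced from training data: Author (Year) Title.
template = ["CAP", ",", "CAP", "(", "YEAR", ")", "CAP", "WORD", "."]

reference = "Smith, J. (1998) Metadata experiments."
ratio = SequenceMatcher(None, template, token_classes(reference)).ratio()
print(f"alignment similarity: {ratio:.2f}")  # high ratio -> template fits
```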
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.1, S.32-45
  20. Söhler, M.: Schluss mit Schema F (2011) 0.02
    0.016941646 = product of:
      0.10164987 = sum of:
        0.036104664 = weight(_text_:web in 4439) [ClassicSimilarity], result of:
          0.036104664 = score(doc=4439,freq=10.0), product of:
            0.111951075 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03430388 = queryNorm
            0.32250395 = fieldWeight in 4439, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=4439)
        0.06554521 = weight(_text_:suche in 4439) [ClassicSimilarity], result of:
          0.06554521 = score(doc=4439,freq=6.0), product of:
            0.17138755 = queryWeight, product of:
              4.996156 = idf(docFreq=812, maxDocs=44218)
              0.03430388 = queryNorm
            0.38243857 = fieldWeight in 4439, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.996156 = idf(docFreq=812, maxDocs=44218)
              0.03125 = fieldNorm(doc=4439)
      0.16666667 = coord(2/12)
    
    Abstract
    With Schema.org and the semantic web, search engines are supposed to learn to understand.
    Content
    "Wörter haben oft mehrere Bedeutungen. Einige kennen den "Kanal" als künstliche Wasserstraße, andere vom Fernsehen. Die Waage kann zum Erfassen des Gewichts nützlich sein oder zur Orientierung auf der Horoskopseite. Casablanca ist eine Stadt und ein Film zugleich. Wo Menschen mit der Zeit Bedeutungen unterscheiden und verarbeiten lernen, können dies Suchmaschinen von selbst nicht. Stets listen sie dumpf hintereinander weg alles auf, was sie zu einem Thema finden. Damit das nicht so bleibt, haben sich nun Google, Yahoo und die zu Microsoft gehörende Suchmaschine Bing zusammengetan, um der Suche im Netz mehr Verständnis zu verpassen. Man spricht dabei auch von einer "semantischen Suche". Das Ergebnis heißt Schema.org. Wer die Webseite einmal besucht, sich ein wenig in die Unterstrukturen hereinklickt und weder Vorkenntnisse im Programmieren noch im Bereich des semantischen Webs hat, wird sich überfordert und gelangweilt wieder abwenden. Doch was hier entstehen könnte, hat das Zeug dazu, Teile des Netzes und speziell die Funktionen von Suchmaschinen mittel- oder langfristig zu verändern. "Große Player sind dabei, sich auf Standards zu einigen", sagt Daniel Bahls, Spezialist für Semantische Technologien beim ZBW Leibniz-Informationszentrum Wirtschaft in Hamburg. "Die semantischen Technologien stehen schon seit Jahren im Raum und wurden bisher nur im kleineren Kontext verwendet." Denn Schema.org lädt Entwickler, Forscher, die Semantic-Web-Community und am Ende auch alle Betreiber von Websites dazu ein, an der Umgestaltung der Suche im Netz mitzuwirken. Inhalte von Websites sollen mit einem speziellen, aber einheitlichen Vokabular für die Crawler - die Analyseprogramme der Suchmaschinen - gekennzeichnet und aufbereitet werden.
    Indem Schlagworte, sogenannte Tags, in den für Normal-User nicht sichtbaren Teil des Codes von Websites eingebettet werden, sind Suchmachinen nicht mehr so sehr auf die Analyse der natürlichen Sprache angewiesen, um Texte inhaltlich zu erfassen. Im Blog ZBW Mediatalk wird dies als "Semantic Web light" bezeichnet - ein semantisches Web auf niedrigster Ebene. Aber selbst das werde "schon viel bewirken", meint Bahls. "Das semantische Web wird sich über die nächsten Jahrzehnte evolutionär weiterentwickeln." Einen "Abschluss" werde es nie geben, "da eine einheitliche Formalisierung von Begrifflichkeiten auf feiner Stufe kaum möglich ist". Die Ergebnisse aus Schema.org würden "zeitnah" in die Suchmaschine integriert, "denn einen Zeitplan" gebe es nicht, so Stefan Keuchel, Pressesprecher von Google Deutschland. Bis das so weit ist, hilft der Verweis von Daniel Bahns auf die bereits existierende semantische Suchmaschine Sig.ma. Geschwindigkeit und Menge der Ergebnisse nach einer Suchanfrage spielen hier keine Rolle. Sig.ma sammelt seine Informationen allein im Bereich des semantischen Webs und listet nach einer Anfrage alles Bekannte strukturiert auf.
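    The article describes marking up page content with a uniform vocabulary that crawlers can read directly. As a loose illustration of what such markup looks like, here is a minimal Schema.org description serialized as JSON-LD (one of the serializations Schema.org supports today; the example record is invented and is not from the article):

```python
# A minimal sketch of Schema.org markup for the "Casablanca" ambiguity
# mentioned above: the structured record states explicitly that the page
# is about the film, not the city. Example values are invented.
import json

movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Casablanca",
    "datePublished": "1942",
    "director": {"@type": "Person", "name": "Michael Curtiz"},
}

# A site would embed this in its HTML as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(movie, indent=2))
```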

Types

  • a 373
  • el 54
  • m 24
  • s 16
  • n 4
  • x 4
  • b 2
  • r 1
