Search (43 results, page 1 of 3)

Geisriegler, E.: Enriching electronic texts with semantic metadata : a use case for the historical Newspaper Collection ANNO (Austrian Newspapers Online) of the Austrian National Library (2012) 0.03
```
0.028608521 = product of:
  0.057217043 = sum of:
    0.008924231 = weight(_text_:in in 595) [ClassicSimilarity], result of:
      0.008924231 = score(doc=595,freq=8.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.15028831 = fieldWeight in 595, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=595)
    0.03350648 = weight(_text_:und in 595) [ClassicSimilarity], result of:
      0.03350648 = score(doc=595,freq=16.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.34630734 = fieldWeight in 595, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=595)
    0.014786332 = product of:
      0.029572664 = sum of:
        0.029572664 = weight(_text_:22 in 595) [ClassicSimilarity], result of:
          0.029572664 = score(doc=595,freq=2.0), product of:
            0.15286934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043654136 = queryNorm
            0.19345059 = fieldWeight in 595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=595)
      0.5 = coord(1/2)
  0.5 = coord(3/6)
```
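Each leaf in this explanation is a Lucene `ClassicSimilarity` (TF-IDF) term weight: queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the term's score is their product, with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The sketch below reproduces the leaf values under two assumptions: a query boost of 1, and `queryNorm` and `fieldNorm` taken as constants copied from the output rather than recomputed.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight for a single query term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm  # query boost assumed to be 1
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for the term "in" in doc 595
w = term_weight(freq=8.0, doc_freq=30841, max_docs=44218,
                query_norm=0.043654136, field_norm=0.0390625)
print(f"{w:.9f}")
```

Small differences in the last digits are expected because Lucene computes in single-precision floats.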
Abstract

This Master's thesis examines ways of enriching historical newspapers with semantic metadata. It also aims to analyze the benefit that enrichment with additional information sources brings, above all for researchers in the humanities. After an account of the development of the interdisciplinary field of 'Digital Humanities', a use case was developed for the digital collection of historical newspapers of the Austrian National Library (ANNO Austrian Newspapers Online), in which 'named entities' (persons, places, organizations and dates) were manually annotated in selected newspaper issues. Methodologically, the encoding was done with 'TEI', a document format for encoding and exchanging texts. In addition, for every annotated named entity, entries were searched for in external databases such as Wikipedia, the Wikipedia Personensuche (person search), the former name and subject authority files (Personennamendatei and Schlagwortnormdatei, now the Gemeinsame Normdatei, GND), VIAF and the Bildarchiv Austria, and linked where found. A description of the results of manually annotating the newspaper pages concludes this part of the thesis. A further section compares the results of the manual annotation with those generated automatically by the German NER (Named Entity Recognition) tool and analyzes their accuracy. Finally, the thesis presents several best-practice examples of encoded and enriched newspaper pages to show users the added value of tagging named entities and linking them to external information sources.

Date

3. 2.2013 18:00:22
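The abstract above does not say how the accuracy of the automatic German NER output was measured against the manual annotation. A common approach, assumed here purely for illustration, is exact-match precision, recall and F1 over (start, end, type) entity spans. The spans below are hypothetical.

```python
def prf1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans.

    A predicted span counts as correct only if start, end and type
    all match a manually annotated span.
    """
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical spans (start_char, end_char, type) using the abstract's
# categories: persons, places, organizations and dates.
manual = {(0, 13, "PER"), (27, 31, "LOC"), (40, 52, "DATE"), (60, 75, "ORG")}
automatic = {(0, 13, "PER"), (27, 31, "LOC"), (60, 75, "LOC")}

p, r, f = prf1(manual, automatic)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Here the tagger finds the organization's boundaries but assigns the wrong type, which exact matching counts as both a false positive and a false negative; a lenient variant could score boundary and type agreement separately.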
Farazi, M.: Faceted lightweight ontologies : a formalization and some experiments (2010) 0.02
```
0.023466464 = product of:
  0.07039939 = sum of:
    0.057778623 = product of:
      0.17333587 = sum of:
        0.17333587 = weight(_text_:3a in 4997) [ClassicSimilarity], result of:
          0.17333587 = score(doc=4997,freq=2.0), product of:
            0.37010026 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.043654136 = queryNorm
            0.46834838 = fieldWeight in 4997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4997)
      0.33333334 = coord(1/3)
    0.012620768 = weight(_text_:in in 4997) [ClassicSimilarity], result of:
      0.012620768 = score(doc=4997,freq=16.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.21253976 = fieldWeight in 4997, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4997)
  0.33333334 = coord(2/6)
```
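The inner nodes of this explanation combine clause scores in two ways: a `sum of` the matching clauses, then a `coord(matched/total)` factor that scales the sum by the fraction of optional clauses that matched. A minimal sketch, with the leaf weights copied from the explanation for doc 4997 rather than recomputed:

```python
def coord_sum(clause_scores, total_clauses):
    """Sum the matching clause scores and scale by coord = matched / total.

    clause_scores holds only the clauses that matched, mirroring how
    the explanation lists them.
    """
    matched = len(clause_scores)
    return sum(clause_scores) * (matched / total_clauses)


# Nested clause: "3a" matched 1 of its 3 sub-clauses -> coord(1/3)
inner = coord_sum([0.17333587], total_clauses=3)
# Top level: the nested clause and "in" matched 2 of 6 clauses -> coord(2/6)
score = coord_sum([inner, 0.012620768], total_clauses=6)
print(f"{inner:.9f} {score:.9f}")
```

The same two steps reproduce the other explanations, for example doc 595 with a nested coord(1/2) and a top-level coord(3/6).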
Abstract

While classifications are heavily used to categorize web content, the evolution of the web foresees a more formal structure (ontology) that can serve this purpose. Ontologies are core artifacts of the Semantic Web that enable machines to use inference rules to conduct automated reasoning on data. Lightweight ontologies bridge the gap between classifications and ontologies. A lightweight ontology (LO) is an ontology representing a backbone taxonomy in which the concept of each child node is more specific than the concept of its parent node. Formal lightweight ontologies can be generated from their informal counterparts. The key applications of formal lightweight ontologies are document classification, semantic search and data integration. However, these applications suffer from the following problems: the limited disambiguation accuracy of the state-of-the-art NLP tools used to generate formal lightweight ontologies from informal ones; the lack of the background knowledge that formal lightweight ontologies need; and limited ontology reuse. In this dissertation, we propose a novel solution to these problems: the faceted lightweight ontology (FLO). An FLO is a lightweight ontology in which the terms present in each node label, and their concepts, are available in the background knowledge (BK), which is organized as a set of facets. A facet can be defined as a distinctive property of a group of concepts that helps differentiate it from other groups. Background knowledge can be defined as a subset of a knowledge base, such as WordNet, and often represents a specific domain.
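The abstract defines an FLO through two conditions: the concepts of node labels must be available in background knowledge organized into facets, and each child concept must be more specific than its parent's. The sketch below checks both conditions in a deliberately simplified form and is not Farazi's formalization. It assumes single-concept node labels, a hypothetical toy BK with one is-a hierarchy per facet, and (more strictly than a real FLO) that parent and child come from the same facet.

```python
# Hypothetical background knowledge (BK): facet -> {concept: parent concept}.
# Node labels are simplified to single concepts.
BACKGROUND = {
    "place": {"place": None, "country": "place", "italy": "country"},
    "food": {"food": None, "dish": "food", "pizza": "dish"},
}


def facet_of(concept):
    """Return the facet whose hierarchy contains the concept, else None."""
    for facet, hierarchy in BACKGROUND.items():
        if concept in hierarchy:
            return facet
    return None


def is_more_specific(child, parent):
    """True if both concepts are in the same facet and parent is a
    proper ancestor of child in that facet's is-a hierarchy."""
    facet = facet_of(child)
    if facet is None or facet != facet_of(parent):
        return False
    hierarchy = BACKGROUND[facet]
    node = hierarchy[child]
    while node is not None:
        if node == parent:
            return True
        node = hierarchy[node]
    return False


def flo_violations(edges):
    """Return the (parent, child) edges whose concepts are missing from
    the BK or whose child is not more specific than its parent."""
    return [(p, c) for p, c in edges if not is_more_specific(c, p)]


edges = [("place", "country"), ("country", "italy"),
         ("dish", "pizza"), ("food", "italy")]
print(flo_violations(edges))  # [('food', 'italy')]
```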

Content

PhD dissertation, International Doctorate School in Information and Communication Technology. See: https://core.ac.uk/download/pdf/150083013.pdf.
Stojanovic, N.: Ontology-based Information Retrieval : methods and tools for cooperative query answering (2005) 0.02
```
0.01985982 = product of:
  0.05957946 = sum of:
    0.046222895 = product of:
      0.13866869 = sum of:
        0.13866869 = weight(_text_:3a in 701) [ClassicSimilarity], result of:
          0.13866869 = score(doc=701,freq=2.0), product of:
            0.37010026 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.043654136 = queryNorm
            0.3746787 = fieldWeight in 701, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03125 = fieldNorm(doc=701)
      0.33333334 = coord(1/3)
    0.013356565 = weight(_text_:in in 701) [ClassicSimilarity], result of:
      0.013356565 = score(doc=701,freq=28.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.22493094 = fieldWeight in 701, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=701)
  0.33333334 = coord(2/6)
```
Abstract

With the explosion of possibilities for ubiquitous content production, the information overload problem has reached a level of complexity that traditional modelling approaches can no longer manage. Due to their purely syntactic nature, traditional information retrieval approaches have not succeeded in treating content itself (i.e. its meaning rather than its representation). As a result, retrieval results are of little use for the user's task at hand. Over the last ten years, ontologies have evolved from an interesting conceptualisation paradigm into a very promising (semantic) modelling technology, especially in the context of the Semantic Web. From the information retrieval point of view, ontologies enable a machine-understandable form of content description, so that the retrieval process can be driven by the meaning of the content. However, the retrieval process is inherently ambiguous: a user who is unfamiliar with the underlying repository and/or query syntax can only approximate his information need in a query. The user must therefore be included more actively in the retrieval process in order to close the gap between the meaning of the content and the meaning of the user's query (i.e. his information need). This thesis lays the foundation for such an ontology-based interactive retrieval process, in which the retrieval system interacts with the user to conceptually interpret the meaning of his query, while the underlying domain ontology drives the conceptualisation process. In this way the retrieval process evolves from query evaluation into a highly interactive cooperation between the user and the retrieval system, in which the system tries to anticipate the user's information need and to deliver relevant content proactively.
Moreover, the notion of content relevance for a user's query evolves from a content-dependent artefact into a multidimensional, context-dependent structure that is strongly influenced by the user's preferences. This cooperation process is realized as the so-called Librarian Agent Query Refinement Process. To clarify the impact of an ontology on the retrieval process (regarding its complexity and quality), a set of methods and tools for different levels of content and query formalisation is developed, ranging from pure ontology-based inferencing to keyword-based querying in which semantics emerges automatically from the results. Our evaluation studies have shown that the ability to conceptualize a user's information need correctly and to interpret the retrieval results accordingly is key to realizing much more meaningful information retrieval systems.
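The abstract does not specify how the Librarian Agent Query Refinement Process works internally. As a rough sketch of the general idea of ontology-driven refinement, the code below assumes a hypothetical toy domain ontology and a hypothetical concept index. It proposes narrower concepts of the user's query concept and keeps only those that would still return results.

```python
# Hypothetical domain ontology: concept -> narrower concepts
NARROWER = {
    "publication": ["book", "article"],
    "article": ["journal article", "conference paper"],
}

# Hypothetical index: concept -> ids of documents annotated with it
INDEX = {
    "book": {1, 2},
    "journal article": {3},
    "conference paper": set(),
}


def documents_for(concept):
    """Documents annotated with the concept or any narrower concept."""
    docs = set(INDEX.get(concept, set()))
    for sub in NARROWER.get(concept, []):
        docs |= documents_for(sub)
    return docs


def suggest_refinements(concept):
    """Narrower concepts of the query concept with their result counts,
    omitting refinements that would return nothing."""
    suggestions = []
    for sub in NARROWER.get(concept, []):
        hits = documents_for(sub)
        if hits:
            suggestions.append((sub, len(hits)))
    return suggestions


print(suggest_refinements("publication"))  # [('book', 2), ('article', 1)]
print(suggest_refinements("article"))      # [('journal article', 1)]
```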

Content

See: http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/1627.
Xiong, C.: Knowledge based text representations for information retrieval (2016) 0.02
```
0.018977325 = product of:
  0.056931973 = sum of:
    0.046222895 = product of:
      0.13866869 = sum of:
        0.13866869 = weight(_text_:3a in 5820) [ClassicSimilarity], result of:
          0.13866869 = score(doc=5820,freq=2.0), product of:
            0.37010026 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.043654136 = queryNorm
            0.3746787 = fieldWeight in 5820, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03125 = fieldNorm(doc=5820)
      0.33333334 = coord(1/3)
    0.010709076 = weight(_text_:in in 5820) [ClassicSimilarity], result of:
      0.010709076 = score(doc=5820,freq=18.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.18034597 = fieldWeight in 5820, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=5820)
  0.33333334 = coord(2/6)
```
Abstract

The successes of information retrieval (IR) in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words is only a shallow text understanding; there is a limited amount of information for document ranking in the word space. This dissertation goes beyond words and builds knowledge based text representations, which embed the external and carefully curated information from knowledge bases, and provide richer and structured evidence for more advanced information retrieval systems. This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with explanation terms. Then we present a general framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor of our query representations, we introduce learning to rank to entity search and retrieve better entities from knowledge bases. In the document representation part, this thesis research also moves one step forward with a bag-of-entities model, in which documents are represented by their automatic entity annotations, and the ranking is performed in the entity space.
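In the bag-of-entities model, documents are represented by their entity annotations and ranking takes place in entity space. The minimal sketch below is not the dissertation's full ranking model. It assumes the annotations are already available as lists of hypothetical entity ids and scores each document with a simple entity-frequency sum over the query's entities.

```python
from collections import Counter


def bag_of_entities(annotations):
    """Represent a document by the counts of its annotated entity ids."""
    return Counter(annotations)


def entity_frequency_score(query_entities, doc_bag):
    """Sum, over the query's entities, of their frequency in the document."""
    return sum(doc_bag[e] for e in query_entities)


# Hypothetical entity ids produced by an automatic entity linker
docs = {
    "d1": bag_of_entities(["E:Carnegie_Mellon", "E:Pittsburgh", "E:Pittsburgh"]),
    "d2": bag_of_entities(["E:Pittsburgh"]),
    "d3": bag_of_entities(["E:Stanford"]),
}
query = ["E:Pittsburgh", "E:Carnegie_Mellon"]

ranking = sorted(docs, key=lambda d: entity_frequency_score(query, docs[d]),
                 reverse=True)
print(ranking)
```

Because `Counter` returns 0 for missing keys, documents without a query entity simply score 0 instead of raising an error.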

Content

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. See: https://www.cs.cmu.edu/~cx/papers/knowledge_based_text_representation.pdf.
Baier Benninger, P.: Model requirements for the management of electronic records (MoReq2) : Anleitung zur Umsetzung (2011) 0.02
```
0.016106669 = product of:
  0.048320007 = sum of:
    0.010709076 = weight(_text_:in in 4343) [ClassicSimilarity], result of:
      0.010709076 = score(doc=4343,freq=8.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.18034597 = fieldWeight in 4343, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=4343)
    0.03761093 = weight(_text_:und in 4343) [ClassicSimilarity], result of:
      0.03761093 = score(doc=4343,freq=14.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.38872904 = fieldWeight in 4343, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=4343)
  0.33333334 = coord(2/6)
```
Abstract

Many companies, administrations and organizations, including smaller ones, are busy ordering and structuring their filing systems in the face of a growing mountain of digital information. Most organizations have some concept of document control. Records management goes further in two respects above all. First, beyond day-to-day business, it puts the context and circumstances in which a document was created at the center. Second, it sets rules for how unused or inactive documents are to be handled. With the «Model Requirements for the Management of Electronic Records» (MoReq), the European Commission created a standard that covers all core areas of records management and thus the entire creation, use, archiving and disposal of documents. The «Anleitung zur Umsetzung» (implementation guide) summarizes the extensive list of requirements of MoReq2 (August 2008) and supplements it with explanatory sections, with the aim of serving as a handy tool when introducing a records management system.
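In records management, rules for inactive documents are commonly expressed as retention schedules linked to the classes of a classification scheme. The sketch below uses hypothetical classification codes, retention periods and disposal actions; MoReq2 does not prescribe this code or these values. It shows how such a schedule decides whether a closed record is due for disposal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetentionRule:
    years: int   # retention period counted from the date the record was closed
    action: str  # disposal action applied once the period has expired


# Hypothetical retention schedule keyed by classification code
SCHEDULE = {
    "FIN-01": RetentionRule(years=10, action="destroy"),
    "HR-02": RetentionRule(years=5, action="review"),
    "ORG-00": RetentionRule(years=30, action="transfer to archive"),
}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def disposal_due(class_code: str, closed_on: date, today: date):
    """Return the disposal action if the retention period has expired,
    else None (the record must still be retained)."""
    rule = SCHEDULE[class_code]
    if today >= add_years(closed_on, rule.years):
        return rule.action
    return None


print(disposal_due("FIN-01", date(2010, 3, 31), date(2021, 1, 1)))  # destroy
print(disposal_due("HR-02", date(2019, 6, 30), date(2021, 1, 1)))   # None
```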

Content

This publication originated as a bachelor's thesis for the degree of Bachelor of Science (BSc) FHO in Information Science. See: http://www.fh-htwchur.ch/uploads/media/CSI_44_Baier.pdf.

Imprint

Chur : Hochschule für Technik und Wirtschaft / Arbeitsbereich Informationswissenschaft

Gordon, T.J.; Helmer-Hirschberg, O.: Report on a long-range forecasting study (1964) 0.01

0.014518088 = product of:
  0.04355426 = sum of:
    0.010096614 = weight(_text_:in in 4204) [ClassicSimilarity], result of:
      0.010096614 = score(doc=4204,freq=4.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.17003182 = fieldWeight in 4204, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0625 = fieldNorm(doc=4204)
    0.033457648 = product of:
      0.066915296 = sum of:
        0.066915296 = weight(_text_:22 in 4204) [ClassicSimilarity], result of:
          0.066915296 = score(doc=4204,freq=4.0), product of:
            0.15286934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043654136 = queryNorm
            0.4377287 = fieldWeight in 4204, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4204)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Description of an experimental trend-predicting exercise covering a time period as far as 50 years into the future. The Delphi technique is used in soliciting the opinions of experts in six areas: scientific breakthroughs, population growth, automation, space progress, probability and prevention of war, and future weapon systems. Possible objections to the approach are also discussed.
Date: 22. 6.2018 13:24:08
22. 6.2018 13:54:52
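In a Delphi exercise such as Gordon and Helmer-Hirschberg's, experts answer the same questions over several rounds and receive statistical feedback on the group's answers between rounds. The minimal sketch below illustrates that feedback step. It summarizes each round by the median and interquartile range, a common convention in Delphi studies that is assumed here, and uses made-up estimates of the year an event will occur.

```python
import statistics


def delphi_feedback(estimates):
    """Summarize one round: median and interquartile range of the estimates."""
    q1, median, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    return median, q1, q3


# Made-up estimates (year an event will occur) from seven experts
round1 = [1975, 1980, 1984, 1990, 2000, 2010, 2050]
round2 = [1980, 1982, 1985, 1988, 1990, 1995, 2000]

for label, answers in (("round 1", round1), ("round 2", round2)):
    median, q1, q3 = delphi_feedback(answers)
    print(f"{label}: median={median}, IQR={q1}-{q3}")
```

In this made-up data the interquartile range narrows between rounds. That narrowing illustrates the convergence of opinion the feedback is meant to encourage.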

Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.01
```
0.013137875 = product of:
  0.039413624 = sum of:
    0.009444519 = weight(_text_:in in 3081) [ClassicSimilarity], result of:
      0.009444519 = score(doc=3081,freq=14.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.15905021 = fieldWeight in 3081, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=3081)
    0.029969105 = weight(_text_:und in 3081) [ClassicSimilarity], result of:
      0.029969105 = score(doc=3081,freq=20.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.3097467 = fieldWeight in 3081, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.03125 = fieldNorm(doc=3081)
  0.33333334 = coord(2/6)
```
Abstract

The thesis analyzes the dynamic development and use of thesaurus terms. In addition, it focuses on the factors that influence the number of index terms per document or journal. MeSH and the corresponding database MEDLINE served as the objects of study. The main findings are: 1. The MeSH thesaurus has developed logarithmically through three distinct phases. Such a thesaurus should follow the equation "T = 3,076.6 ln(d) - 22,695 + 0.0039d" (T = terms, ln = natural logarithm, d = documents). Accordingly, constructing such a thesaurus requires about 1,600 documents covering the different subjects of the thesaurus's domain. The dynamic development of thesauri such as MeSH requires the introduction of one new term for every 256 newly indexed documents. 2. The distribution of thesaurus terms yielded three categories: heavily used, normally used and rarely used headings. The last group is in a test phase, whereas in the first and second categories newly added descriptors lead to thesaurus growth. 3. For articles between one and twenty-one pages long, there is a logarithmic relationship between the number of index terms per article and its page count. 4. Journal articles that appear in MEDLINE with abstracts receive almost two more descriptors. 5. The findability of non-English documents in MEDLINE is lower than that of English documents. 6. Articles in journals with an impact factor of 0 to fifteen do not receive more index terms than those in the other journals covered by MEDLINE. 7. In an indexing system, different journals carry more or less weight in terms of their findability. The distribution of index terms per page showed that MEDLINE has three categories of publications. In addition, there are a few strongly favored journals.
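The growth equation in finding 1 can be evaluated directly. The short sketch below checks two consequences of it; its function names are ours, not the author's. First, T becomes positive only at roughly 1,600 documents. Second, the slope dT/dd = 3076.6/d + 0.0039 tends toward 0.0039 for large d, i.e. about one new term per 1/0.0039 ≈ 256 documents, which is consistent with the figures in the abstract.

```python
import math


def thesaurus_terms(d: float) -> float:
    """Fitted growth curve: T = 3076.6 ln(d) - 22695 + 0.0039 d."""
    return 3076.6 * math.log(d) - 22695 + 0.0039 * d


def new_terms_per_document(d: float) -> float:
    """Slope of the curve: dT/dd = 3076.6 / d + 0.0039."""
    return 3076.6 / d + 0.0039


def first_positive(lo: float = 1.0, hi: float = 1e5) -> float:
    """Bisect for the document count at which T crosses zero.

    T is increasing for d > 0, so the crossing in [lo, hi] is unique.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if thesaurus_terms(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


print(f"T crosses zero at d = {first_positive():.0f}")
for d in (1e4, 1e6, 1e8):
    print(f"d = {d:.0e}: one new term per {1 / new_terms_per_document(d):.0f} documents")
print(f"limit: one new term per {1 / 0.0039:.0f} documents")
```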

Footnote

Dissertation, Humboldt-Universität zu Berlin - Institut für Bibliotheks- und Informationswissenschaft.

Imprint

Berlin : Humboldt-Universität zu Berlin / Institut für Bibliotheks- und Informationswissenschaft

Theme

Design and application of the thesaurus principle

Seidlmayer, E.: An ontology of digital objects in philosophy : an approach for practical use in research (2018) 0.01

0.012918802 = product of:
  0.038756404 = sum of:
    0.015301868 = weight(_text_:in in 5496) [ClassicSimilarity], result of:
      0.015301868 = score(doc=5496,freq=12.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.2576908 = fieldWeight in 5496, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5496)
    0.023454536 = weight(_text_:und in 5496) [ClassicSimilarity], result of:
      0.023454536 = score(doc=5496,freq=4.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.24241515 = fieldWeight in 5496, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5496)
  0.33333334 = coord(2/6)

Abstract: The digitalization of research enables new scientific insights and methods, especially in the humanities. Nonetheless, electronic book editions, encyclopedias, mobile applications and websites presenting research projects are not widely used in academic philosophy. This contrasts with the large number of helpful tools that facilitate research and also open up new scientific subjects and approaches. A possible solution to this dilemma is to systematize and promote these tools in order to improve their accessibility and fully exploit the potential of digitalization for philosophy.
Footnote: Master's thesis, Library and Information Science, Fakultät für Informations- und Kommunikationswissenschaften, Technische Hochschule Köln. Also of note: listed in Google Scholar under 'Eva, S.'.
Imprint: Köln : Technische Hochschule / Fakultät für Informations- und Kommunikationswissenschaften

Mair, M.: Increasing the value of meta data by using associative semantic networks (2002) 0.01
```
0.012568507 = product of:
  0.037705522 = sum of:
    0.009274333 = weight(_text_:in in 4972) [ClassicSimilarity], result of:
      0.009274333 = score(doc=4972,freq=6.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.1561842 = fieldWeight in 4972, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=4972)
    0.02843119 = weight(_text_:und in 4972) [ClassicSimilarity], result of:
      0.02843119 = score(doc=4972,freq=8.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.29385152 = fieldWeight in 4972, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=4972)
  0.33333334 = coord(2/6)
```
Abstract

Currently widespread methods for structuring information are increasingly failing to perform their task satisfactorily. The reason is the explosive growth of human knowledge. As one possible way out, this diploma thesis proposes the use of associative semantic networks. Machine-based knowledge management can become considerably more intuitive and easier to use if it exploits the way the human brain processes information (in particular, associative connections). The theoretical part of the thesis discusses various aspects of a possible design for a semantic network with associative connections. Besides the basic elements and the problems of visualization, it mainly works out improvements that allow efficient work with such a network. In the practical part, a network prototype with the most important features identified is implemented. The application is based on the Hyperwave Information Server. This more detailed design part gives deeper insight into software requirements, use cases and, in part, class details. Finally, a short introduction to using the implemented prototype is given.
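The abstract does not describe how the prototype retrieves information over its associative links. Spreading activation is one classic technique for associative networks, assumed here for illustration and not taken from the thesis. Activation starts at one node and flows along weighted links, decaying at each step, so strongly associated nodes end up most active. The network below is hypothetical.

```python
# Hypothetical associative network: node -> {neighbour: link weight in [0, 1]}
NETWORK = {
    "jazz": {"saxophone": 0.9, "new orleans": 0.7},
    "saxophone": {"brass": 0.4, "jazz": 0.9},
    "new orleans": {"mardi gras": 0.8, "jazz": 0.7},
    "brass": {},
    "mardi gras": {},
}


def spread_activation(start, decay=0.5, threshold=0.1):
    """Propagate activation from start along weighted links.

    A node passes activation * weight * decay to each neighbour. Values
    below threshold are dropped, and each node keeps the highest
    activation it has received (re-propagating only when it increases).
    """
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for neighbour, weight in NETWORK[node].items():
            value = activation[node] * weight * decay
            if value >= threshold and value > activation.get(neighbour, 0.0):
                activation[neighbour] = value
                frontier.append(neighbour)
    return activation


result = spread_activation("jazz")
for node, value in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

Because every hop multiplies by at most 0.5 and a node is only revisited when its activation strictly increases, the loop always terminates.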
Chen, X.: Indexing consistency between online catalogues (2008) 0.01
```
0.011775948 = product of:
  0.035327844 = sum of:
    0.006310384 = weight(_text_:in in 2209) [ClassicSimilarity], result of:
      0.006310384 = score(doc=2209,freq=4.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.10626988 = fieldWeight in 2209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2209)
    0.029017461 = weight(_text_:und in 2209) [ClassicSimilarity], result of:
      0.029017461 = score(doc=2209,freq=12.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.29991096 = fieldWeight in 2209, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2209)
  0.33333334 = coord(2/6)
```
Abstract

In the global online environment, many bibliographic services provide integrated access to different Internet-based OPACs. In such an environment, users expect to see more consistency within and between systems. The purpose of this study is to examine indexing consistency between systems. Along the way, several factors that may influence indexing consistency are investigated. The main goal of the study is to identify the reasons for inconsistencies so that meaningful suggestions can be made for improving indexing consistency. A sample of 3,307 monographs was drawn from two Chinese bibliographic catalogues. According to Hooper's formula, the average indexing consistency was 64.2% for index terms and 61.6% for class numbers. According to Rolling's formula, it was 70.7% for index terms and 63.4% for class numbers. Several factors influencing indexing consistency were examined: (1) indexing exhaustivity; (2) indexing specificity; (3) length of the monographs; (4) category of the indexing language; (5) subject area of the monographs; (6) development of disciplines; (7) structure of the thesaurus or classification; (8) year of publication. The reasons for the inconsistencies were also analyzed. The analysis found that: (1) indexers lack subject knowledge and familiarity with the indexing languages and indexing rules, which caused many inconsistencies; (2) the lack of unified or precise rules also produced inconsistencies; (3) delayed revisions of the indexing languages, a lack of terminology control, too few explanatory notes and "see also" references, and the high semantic freedom in choosing descriptors or classes caused inconsistencies.
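Hooper's and Rolling's measures compare the sets of terms that two indexers (here, two catalogues) assign to the same item. Let C be the number of shared terms and M and N the numbers of terms unique to each side. Hooper's measure is then C / (C + M + N) and Rolling's is 2C / (2C + M + N). Because Rolling's measure counts shared terms twice, it is never lower than Hooper's for the same item, which fits the higher Rolling figures reported above. A sketch with hypothetical index terms:

```python
def hooper(terms_a: set, terms_b: set) -> float:
    """Hooper: shared / (shared + unique to A + unique to B)."""
    union = terms_a | terms_b
    # Two empty term sets are treated as full agreement (an assumption)
    return len(terms_a & terms_b) / len(union) if union else 1.0


def rolling(terms_a: set, terms_b: set) -> float:
    """Rolling: 2 * shared / (terms assigned by A + terms assigned by B)."""
    total = len(terms_a) + len(terms_b)
    # Two empty term sets are treated as full agreement (an assumption)
    return 2 * len(terms_a & terms_b) / total if total else 1.0


# Hypothetical index terms assigned to the same monograph by two catalogues
catalogue_1 = {"indexing", "cataloguing", "consistency", "china"}
catalogue_2 = {"indexing", "consistency", "china", "opac", "libraries"}

print(f"Hooper:  {hooper(catalogue_1, catalogue_2):.3f}")
print(f"Rolling: {rolling(catalogue_1, catalogue_2):.3f}")
```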

Imprint

Berlin : Humboldt-Universität / Institut für Bibliotheks- und Informationswissenschaft
Oberhauser, O.: Card-Image Public Access Catalogues (CIPACs) : a critical consideration of a cost-effective alternative to full retrospective catalogue conversion (2002) 0.01
```
0.011769604 = product of:
  0.03530881 = sum of:
    0.0054100277 = weight(_text_:in in 1703) [ClassicSimilarity], result of:
      0.0054100277 = score(doc=1703,freq=6.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.09110745 = fieldWeight in 1703, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1703)
    0.029898783 = weight(_text_:und in 1703) [ClassicSimilarity], result of:
      0.029898783 = score(doc=1703,freq=26.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.3090199 = fieldWeight in 1703, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1703)
  0.33333334 = coord(2/6)
```
Footnote

Review in: ABI-Technik 21(2002) no. 3, p. 292 (E. Pietzsch): "With his diploma thesis, Otto C. Oberhauser has presented an impressive analysis of digitized card catalogs (CIPACs). The thesis offers a wealth of data and statistics that were not previously available. Librarians who are considering digitizing their catalogs will find in it a unique basis for decision-making. After an introductory chapter, Oberhauser first gives an overview of a selection of CIPACs available worldwide and their indexing methods (binary search, partial indexing, search in OCR data), and compares their geographical distribution, size, software, navigation, and other properties. He then describes and analyzes implementation issues, beginning with the reasons that can lead to digitization: cost, time to implement, improved access, and savings in shelf space. He continues with technical aspects such as scanning and quality control, image standards, OCR, manual post-processing, and server technology. He also addresses the rather obstructive characteristics of older catalogs, as well as presentation on the Web and integration with existing OPACs. Oberhauser devoted a field study of his own to an important aspect, namely the assessment by the most important target group, library users, and analyzes its results in detail in the final chapter. Appendices on the data collection method and individual descriptions of many catalogs round off the work. Overall, I can only describe the thesis as the most impressive collection of data, statistics, and analyses on CIPACs that I have come across so far.
I would like to dwell in particular on one nicely worked-out aspect, namely the considerable fragmentation of the software systems in use. At present we can roughly distinguish between turnkey solutions (a contracted company acts as general contractor and carries out all tasks, from digitization to delivery of the finished application) and split solutions (digitization is contracted out separately from indexing and software development, or carried out in-house). The latter require in-house project management. Yet in-house software development in particular can produce solutions that are in no way inferior to commercial offerings. It is a pity, though, that the many in-house developments have not yet led to initiatives which, as with public domain software, aim at an "optimal", low-cost, and widely accepted software solution. Nevertheless, a few critical remarks should not go unmentioned. For example, the thesis does not distinguish between "guide card" systems, i.e., those that index every 20th or 50th card, and systems that fully index all card headings, even though this far-reaching design decision shifts considerable costs between catalog creation and later use. In the statistical evaluation of the field study, too, I would have liked a finer breakdown by type of CIPAC or by library. For example, more than half of the users surveyed stated that operating the CIPAC was initially difficult to understand or that using it was time-consuming. It remains open, however, whether there are differences between the various types of implementation.
Diploma theses, of course, have a character of their own; their aim is not necessarily to provide practical guidelines. However, some colleagues who have already been involved in digitizing catalogs ask themselves whether the solution they arrived at was really the "best" achievable one, whether it is worth thinking about improvements, how their solution compares with others, whether the retrieval software they use delivers good results, and whether initially low production costs might nevertheless result in relatively long search times, i.e., hidden costs, during retrieval. Oberhauser offers only a few marginal remarks on these questions. It would be desirable for such detailed investigations to be carried out in further work."
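The review contrasts guide-card systems, which index only every 20th or 50th card, with fully indexed catalogs. A rough sketch of how such a partial index could be searched by binary search, assuming the card headings are already in filing order and that plain Python string comparison stands in for real filing rules (both simplifying assumptions); the function names and sample headings are hypothetical:

```python
from bisect import bisect_right

def build_guide_index(card_headings, step):
    """Keep (heading, card_number) for every `step`-th card (0-based numbering).

    `card_headings` must already be in filing order; step=1 means full indexing.
    """
    return [(card_headings[i], i) for i in range(0, len(card_headings), step)]

def locate(guide_index, query, total_cards):
    """Return the half-open range (start, end) of card numbers to browse for `query`."""
    keys = [heading for heading, _ in guide_index]
    pos = bisect_right(keys, query) - 1  # last guide heading filing at or before the query
    pos = max(pos, 0)                    # queries filing before the first card start at card 0
    start = guide_index[pos][1]
    end = guide_index[pos + 1][1] if pos + 1 < len(guide_index) else total_cards
    return start, end

# Hypothetical sample: ten card headings in filing order, indexed every 4th card.
cards = ["Adams", "Baker", "Clark", "Davis", "Evans",
         "Fischer", "Garcia", "Huber", "Ivanov", "Jones"]
guide = build_guide_index(cards, step=4)
start, end = locate(guide, "Garcia", len(cards))
print(guide)             # [('Adams', 0), ('Evans', 4), ('Ivanov', 8)]
print(cards[start:end])  # ['Evans', 'Fischer', 'Garcia', 'Huber']
```

With `step=1` every card heading is indexed; a larger step makes the index cheaper to build but leaves more cards to browse per lookup, which is the cost shift between catalog creation and later use that the reviewer points to.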

Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2013) 0.01
```
0.010872297 = product of:
  0.03261689 = sum of:
    0.008924231 = weight(_text_:in in 1810) [ClassicSimilarity], result of:
      0.008924231 = score(doc=1810,freq=2.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.15028831 = fieldWeight in 1810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.078125 = fieldNorm(doc=1810)
    0.02369266 = weight(_text_:und in 1810) [ClassicSimilarity], result of:
      0.02369266 = score(doc=1810,freq=2.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.24487628 = fieldWeight in 1810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.078125 = fieldNorm(doc=1810)
  0.33333334 = coord(2/6)
```
Content

Winner of the VFI Dissertation Prize 2014: "A convincing, thorough linguistic and quantitative analysis of a text element that has so far received little attention in information retrieval, based on a large hypertext corpus compiled specifically for this purpose and including the evaluation of self-developed resolution rules for use in future IR systems."
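The relevance scores in these explain trees follow Lucene's ClassicSimilarity (TF-IDF): per term, score = queryWeight × fieldWeight, with idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the document score is coord × the sum of the term scores. The sketch below recomputes the Schmolz entry (doc 1810) from the values printed in its explain tree. The queryNorm value is copied rather than derived, because it depends on query clauses that are not all visible here, and a query boost of 1 is assumed.

```python
import math

def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight for one matching term (query boost assumed to be 1)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(termFreq)
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.043654136  # copied from the explain output, not recomputed

# Doc 1810: "in" and "und" each occur twice, fieldNorm = 0.078125.
score_in = term_score(2.0, 30841, MAX_DOCS, QUERY_NORM, 0.078125)
score_und = term_score(2.0, 13101, MAX_DOCS, QUERY_NORM, 0.078125)
coord = 2 / 6  # 2 of the 6 query clauses matched
total = coord * (score_in + score_und)
print(f"in={score_in:.9f} und={score_und:.9f} total={total:.9f}")
```

The printed values should agree with the explain output (0.008924231, 0.02369266, 0.010872297) up to the rounding differences between Lucene's 32-bit floats and Python's 64-bit floats.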

Eckert, K.: Thesaurus analysis and visualization in semantic search applications (2007) 0.01
```
0.010046529 = product of:
  0.030139584 = sum of:
    0.0133863455 = weight(_text_:in in 3222) [ClassicSimilarity], result of:
      0.0133863455 = score(doc=3222,freq=18.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.22543246 = fieldWeight in 3222, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3222)
    0.01675324 = weight(_text_:und in 3222) [ClassicSimilarity], result of:
      0.01675324 = score(doc=3222,freq=4.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.17315367 = fieldWeight in 3222, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3222)
  0.33333334 = coord(2/6)
```
Abstract

The use of thesaurus-based indexing is a common approach for increasing the performance of information retrieval. In this thesis, we examine the suitability of a thesaurus for a given set of information and evaluate ways of improving existing thesauri to get better search results. In this area, we focus on two aspects: 1. We analyze the indexing results achieved by an automatic document indexer and the thesaurus it uses. 2. We propose a method for thesaurus evaluation which is based on a combination of statistical measures and appropriate visualization techniques that support the detection of potential problems in a thesaurus. The introductory chapter gives an overview of the context of our work. Next, we briefly outline the basics of thesaurus-based information retrieval and describe the Collexis Engine that was used for our experiments. In Chapter 3, we describe two experiments in automatically indexing documents in the areas of medicine and economics with corresponding thesauri and compare the results to available manual annotations. Chapter 4 describes methods for assessing thesauri and visualizing the result as a treemap. We depict examples of interesting observations supported by the method and show that it actually finds critical problems. We conclude with a discussion of open questions and future research in Chapter 5.
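A minimal sketch of thesaurus-based indexing and its comparison with manual annotation, under loudly simplifying assumptions: a hypothetical toy thesaurus that maps entry terms (including non-preferred synonyms) to preferred descriptors, naive whole-phrase matching instead of the Collexis Engine, and precision/recall as the comparison measure. The document and the manual annotation are invented for illustration.

```python
import re

# Hypothetical toy thesaurus: entry term -> preferred descriptor.
THESAURUS = {
    "myocardial infarction": "Myocardial Infarction",
    "heart attack": "Myocardial Infarction",
    "hypertension": "Hypertension",
    "high blood pressure": "Hypertension",
    "aspirin": "Aspirin",
}

def index_document(text, thesaurus):
    """Assign every descriptor whose entry term occurs as a whole phrase in the text."""
    padded = " " + " ".join(re.findall(r"[a-z]+", text.lower())) + " "
    return {descriptor for term, descriptor in thesaurus.items()
            if f" {term} " in padded}

def precision_recall(automatic, manual):
    """Compare automatically assigned descriptors with a manual annotation."""
    hits = len(automatic & manual)
    precision = hits / len(automatic) if automatic else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall

doc = "Patients with high blood pressure who suffered a heart attack received aspirin."
auto = index_document(doc, THESAURUS)
manual = {"Myocardial Infarction", "Hypertension", "Secondary Prevention"}
print(sorted(auto), precision_recall(auto, manual))  # precision = recall = 2/3 here
```

Synonym entries let "heart attack" and "high blood pressure" map to the same descriptors a cataloger would assign; the missed manual descriptor ("Secondary Prevention") is the kind of gap a thesaurus evaluation would try to expose.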

Imprint

Mannheim : Fakultät für Mathematik und Informatik

Theme

Conception and application of the thesaurus principle
Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.01
```
0.00990557 = product of:
  0.02971671 = sum of:
    0.011973113 = weight(_text_:in in 563) [ClassicSimilarity], result of:
      0.011973113 = score(doc=563,freq=10.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.20163295 = fieldWeight in 563, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.017743597 = product of:
      0.035487194 = sum of:
        0.035487194 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
          0.035487194 = score(doc=563,freq=2.0), product of:
            0.15286934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043654136 = queryNorm
            0.23214069 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
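The general shape of LocalMaxs-based extraction can be sketched as follows. The three association measures proposed in the thesis are not reproduced here; as a stand-in, the sketch uses the symmetric conditional probability (SCP) as the "glue" of an n-gram. An n-gram is kept if its glue is at least as high as that of the (n-1)-grams it contains and strictly higher than that of every (n+1)-gram containing it. Whitespace tokenization, the minimum-frequency filter, and the toy corpus are assumptions of this sketch.

```python
from collections import Counter

def count_ngrams(sentences, max_n):
    """Count all contiguous n-grams (1 <= n <= max_n) within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def scp(ngram, counts):
    """Symmetric conditional probability from raw frequencies (the shared
    normalizing constant of the probabilities cancels out)."""
    n = len(ngram)
    avg = sum(counts[ngram[:i]] * counts[ngram[i:]] for i in range(1, n)) / (n - 1)
    return counts[ngram] ** 2 / avg

def local_maxs(sentences, max_n=3, min_freq=2):
    counts = count_ngrams(sentences, max_n + 1)  # one extra length for supersets
    glue = {g: scp(g, counts) for g in counts if len(g) >= 2}
    supersets = {}  # n-gram -> (n+1)-grams that extend it on the left or right
    for g in glue:
        supersets.setdefault(g[1:], []).append(g)
        supersets.setdefault(g[:-1], []).append(g)
    result = []
    for g, value in glue.items():
        if len(g) > max_n or counts[g] < min_freq:
            continue
        if len(g) > 2 and value < max(glue[g[:-1]], glue[g[1:]]):
            continue  # weaker than one of the (n-1)-grams it contains
        if any(value <= glue[s] for s in supersets.get(g, [])):
            continue  # not stronger than every (n+1)-gram containing it
        result.append(g)
    return sorted(result)

corpus = ["new york is big", "we love new york", "new york is old"]
print(local_maxs(corpus))  # [('new', 'york')]
```

In this toy corpus "new york is" is rejected because its glue falls below that of "new york", and "we love" is dropped by the frequency filter despite its high SCP.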

Content

A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. See: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.

Date

10. 1.2013 19:22:47
Haslhofer, B.: ¬A Web-based mapping technique for establishing metadata interoperability (2008) 0.01
```
0.009227715 = product of:
  0.027683146 = sum of:
    0.010929906 = weight(_text_:in in 3173) [ClassicSimilarity], result of:
      0.010929906 = score(doc=3173,freq=48.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.18406484 = fieldWeight in 3173, product of:
          6.928203 = tf(freq=48.0), with freq of:
            48.0 = termFreq=48.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.01953125 = fieldNorm(doc=3173)
    0.01675324 = weight(_text_:und in 3173) [ClassicSimilarity], result of:
      0.01675324 = score(doc=3173,freq=16.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.17315367 = fieldWeight in 3173, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.01953125 = fieldNorm(doc=3173)
  0.33333334 = coord(2/6)
```
Abstract

The integration of metadata from distinct, heterogeneous data sources requires metadata interoperability, which is a qualitative property of metadata information objects that is not given by default. The technique of metadata mapping allows domain experts to establish metadata interoperability in a certain integration scenario. Mapping solutions, as a technical manifestation of this technique, are already available for the intensively studied domain of database system interoperability, but they rarely exist for the Web. If we consider the steadily increasing amount of structured metadata and corresponding metadata schemes on the Web, we can observe a clear need for a mapping solution that can operate in a Web-based environment. To achieve that, we first need to build its technical core, which is a mapping model that provides the language primitives to define mapping relationships. Existing Semantic Web languages such as RDFS and OWL define some basic mapping elements (e.g., owl:equivalentProperty, owl:sameAs), but do not address the full spectrum of semantic and structural heterogeneities that can occur among distinct, incompatible metadata information objects. Furthermore, it is still unclear how to process defined mapping relationships at run-time in order to deliver metadata to the client in a uniform way. As the main contribution of this thesis, we present an abstract mapping model, which reflects the mapping problem on a generic level and provides the means for reconciling incompatible metadata. Instance transformation functions and URIs take a central role in that model. The former cover a broad spectrum of possible structural and semantic heterogeneities, while the latter bind the complete mapping model to the architecture of the World Wide Web.
On the concrete, language-specific level we present a binding of the abstract mapping model for the RDF Vocabulary Description Language (RDFS), which allows us to create mapping specifications among incompatible metadata schemes expressed in RDFS. The mapping model is embedded in a cyclic process that categorises the requirements a mapping solution should fulfil into four consecutive phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. In this thesis, we mainly focus on mapping representation and on the transformation of mapping specifications into executable SPARQL queries. For mapping discovery support, the model provides an interface for plugging in schema and ontology matching algorithms. For mapping maintenance we introduce the concept of a simple but effective mapping registry. Based on the mapping model, we propose a Web-based mediator-wrapper architecture that allows domain experts to set up mediation endpoints that provide a uniform SPARQL query interface to a set of distributed metadata sources. The involved data sources are encapsulated by wrapper components that expose the contained metadata and the schema definitions on the Web and provide a SPARQL query interface to these metadata. In this thesis, we present the OAI2LOD Server, a wrapper component for integrating metadata that are accessible via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In a case study, we demonstrate how mappings can be created in a Web environment and how our mediator-wrapper architecture can easily be configured in order to integrate metadata from various heterogeneous data sources without the need to install any mapping solution or metadata integration solution in a local system environment.
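To make "transforming mapping specifications into executable SPARQL queries" concrete, here is a minimal sketch. It is not the thesis's mapping model or its RDFS binding: the `PropertyMapping` class and the `to_construct` function are hypothetical, and the instance transformation (joining a given name and a family name into one creator string) is expressed as an ordinary SPARQL 1.1 `BIND`/`CONCAT` expression.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PropertyMapping:
    """Hypothetical mapping specification: a target property computed from
    one or more source properties. `expression` is a SPARQL expression over
    ?v0, ?v1, ... (an instance transformation); if empty, ?v0 is copied."""
    target: str
    sources: List[str]
    expression: str = ""

def to_construct(mapping, prefixes):
    """Translate one mapping into a SPARQL 1.1 CONSTRUCT query string."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    where = [f"  ?s {prop} ?v{i} ." for i, prop in enumerate(mapping.sources)]
    if mapping.expression:
        where.append(f"  BIND({mapping.expression} AS ?out)")
        value = "?out"
    else:
        value = "?v0"
    lines += ["CONSTRUCT {", f"  ?s {mapping.target} {value} .", "}", "WHERE {"]
    lines += where + ["}"]
    return "\n".join(lines)

prefixes = {"foaf": "http://xmlns.com/foaf/0.1/", "dc": "http://purl.org/dc/elements/1.1/"}
creator = PropertyMapping("dc:creator", ["foaf:givenName", "foaf:familyName"],
                          'CONCAT(?v0, " ", ?v1)')
print(to_construct(creator, prefixes))
```

A mediator built along these lines could send such a query to each wrapped source and return the constructed triples in the target schema; how the thesis actually executes mappings at run-time may differ.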

Content

The integration of metadata from different, heterogeneous data sources requires metadata interoperability, a property that is not given by default. Metadata mapping techniques enable domain experts to establish metadata interoperability in a given integration context. Mapping solutions are meant to provide the necessary support for this. While such solutions already exist for the established field of interoperable databases, this is not the case for Web environments. Considering the steadily growing amount of structured metadata and metadata schemas on the Web, a need for Web-based mapping solutions emerges. The core of such a solution is a mapping model that defines the language constructs needed to specify mappings. Existing Semantic Web languages such as RDFS or OWL do provide basic mapping elements (e.g., owl:equivalentProperty, owl:sameAs), but they do not address the full spectrum of possible semantic and structural heterogeneities that can occur between different, incompatible metadata objects. Moreover, technical approaches for converting previously defined mappings into executable queries are lacking. As the central scientific contribution of this dissertation, an abstract mapping model is presented which reflects the mapping problem on a generic level and offers approaches for reconciling incompatible schemas. Instance transformation functions and URIs play a central role in this model. The former bridge a broad spectrum of possible semantic and structural heterogeneities, while the latter embed the mapping model in the architecture of the World Wide Web.
On a concrete, language-specific level, the binding of the abstract model to the RDF Vocabulary Description Language (RDFS) is presented, which enables mappings between different metadata schemas expressed in RDFS. The mapping model is embedded in a cyclic mapping process that categorizes the requirements for mapping solutions into four consecutive phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. In this dissertation we are mainly concerned with the representation phase and with the transformation of mapping specifications into executable SPARQL queries. To support the discovery phase, the mapping model offers an interface for plugging in schema or ontology matching algorithms. For the maintenance phase we present a simple but effective mapping registry concept. Based on the mapping model, we introduce a Web-based mediator-wrapper architecture that allows domain experts to define SPARQL mediation interfaces. For this purpose, the data sources to be integrated must be encapsulated by wrapper components that expose the contained metadata on the Web and provide SPARQL access. As an example wrapper component, we present the OAI2LOD Server, which can be used to integrate data sources that expose their metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In a case study, we show how mappings can be created in Web environments and how our mediator-wrapper architecture, after a few simple configuration steps, can integrate metadata from different, heterogeneous data sources without requiring a mapping solution to be installed in a local system environment.
Knitel, M.: ¬The application of linked data principles to library data : opportunities and challenges (2012) 0.01
```
0.008942943 = product of:
  0.02682883 = sum of:
    0.006310384 = weight(_text_:in in 599) [ClassicSimilarity], result of:
      0.006310384 = score(doc=599,freq=4.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.10626988 = fieldWeight in 599, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=599)
    0.020518444 = weight(_text_:und in 599) [ClassicSimilarity], result of:
      0.020518444 = score(doc=599,freq=6.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.21206908 = fieldWeight in 599, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=599)
  0.33333334 = coord(2/6)
```
Abstract

Over the last few years, Linked Data has developed into a dominant topic in library science. As a standard for capturing and exchanging data, it has numerous points of contact with traditional library techniques. The first part of this thesis introduces the fundamental technologies of this new paradigm and then examines their application to library data. Following the central principles of the Linked Data initiative, it looks more closely at the addressing of entities by URIs, the application of the RDF data model, and the linking of heterogeneous datasets. Particular attention is paid to the challenges that arise in the process: ensuring high-quality information, persistently addressing content on the World Wide Web, and the interoperability of metadata standards. The last part of the thesis outlines a program that represents a possible extension of the search engine of the Austrian library network. Its prototype implementation permits a realistic assessment of the current possibilities of Linked Data and underlines many of the topics worked out theoretically earlier. It becomes apparent that many hurdles still have to be overcome before Linked Data can be used in full production. In particular, many projects are still at an early stage of maturity. On the other hand, the possibilities that would result from a consistent use of RDF are promising. RDF thus qualifies as a candidate for replacing bibliographic data formats that are being phased out, such as MAB or MARC.
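The three Linked Data principles named above (URIs for entities, RDF statements, links to other datasets) can be illustrated with a small conversion of a flat catalog record into N-Triples. This is a sketch only: the base URI, the record fields, and the authority URI are hypothetical placeholders, not the data model of the Austrian library network; only the Dublin Core Terms namespace is real.

```python
# Hypothetical namespaces and record; not the data model of the Austrian library network.
BASE = "http://example.org/resource/"
DCTERMS = "http://purl.org/dc/terms/"

def literal(value):
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def record_to_ntriples(record):
    """Turn a flat catalog record (a dict) into N-Triples lines."""
    subject = f"<{BASE}{record['id']}>"  # 1. mint a URI for the entity
    lines = [                            # 2. describe it with RDF statements
        f"{subject} <{DCTERMS}title> {literal(record['title'])} .",
        f"{subject} <{DCTERMS}issued> {literal(record['year'])} .",
    ]
    for uri in record.get("creator_uris", []):  # 3. link to an external dataset
        lines.append(f"{subject} <{DCTERMS}creator> <{uri}> .")
    return "\n".join(lines)

record = {
    "id": "rec001",
    "title": "Linked Data in libraries",
    "year": "2012",
    "creator_uris": ["http://example.org/authority/person/123"],  # hypothetical authority URI
}
print(record_to_ntriples(record))
```

In practice the creator link would point to an authority file such as a national name authority rather than to example.org, which is where the quality and persistence issues discussed in the abstract come in.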
Makewita, S.M.: Investigating the generic information-seeking function of organisational decision-makers : perspectives on improving organisational information systems (2002) 0.01
```
0.008863994 = product of:
  0.02659198 = sum of:
    0.011805649 = weight(_text_:in in 642) [ClassicSimilarity], result of:
      0.011805649 = score(doc=642,freq=14.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.19881277 = fieldWeight in 642, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=642)
    0.014786332 = product of:
      0.029572664 = sum of:
        0.029572664 = weight(_text_:22 in 642) [ClassicSimilarity], result of:
          0.029572664 = score(doc=642,freq=2.0), product of:
            0.15286934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043654136 = queryNorm
            0.19345059 = fieldWeight in 642, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=642)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

The past decade has seen the emergence of a new paradigm in the corporate world in which organisations emphasised connectivity as a means of exposing decision-makers to wider resources of information within and outside the organisation. Many organisations followed initiatives to enhance infrastructures, manipulate cultural shifts and emphasise managerial commitment in order to create pools and networks of knowledge. However, connectivity is not merely about presenting people with data but, more importantly, about creating environments in which people can seek information efficiently. This paradigm has therefore caused a shift in the function of information systems in organisations: they now have to be assessed in relation to how they underpin people's information-seeking activities within the context of their organisational environment. This research project used interpretative research methods to investigate the nature of people's information-seeking activities at two culturally contrasting organisations. The outcomes provide insights into phenomena associated with people's information-seeking function and show how these depend on the organisational context, which is defined partly by information systems. They suggest that information-seeking is not just searching for data. The inefficiencies inherent in both people and their environments can bring opaqueness into people's data, which they need to avoid or eliminate as part of seeking information. This seems to have made information-seeking a two-tier process consisting of a primary process of searching and interpreting data and an auxiliary process of avoiding and eliminating opaqueness in data. Based on this view, this research suggests that organisational information systems operate naturally as implicit dual mechanisms that underpin this two-tier process, and that improvements to information systems should concern maintaining the balance between these dual mechanisms.

Date

22. 7.2022 12:16:58
Kiren, T.: ¬A clustering based indexing technique of modularized ontologies for information retrieval (2017) 0.01
```
0.007512714 = product of:
  0.02253814 = sum of:
    0.010709076 = weight(_text_:in in 4399) [ClassicSimilarity], result of:
      0.010709076 = score(doc=4399,freq=18.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.18034597 = fieldWeight in 4399, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=4399)
    0.011829065 = product of:
      0.02365813 = sum of:
        0.02365813 = weight(_text_:22 in 4399) [ClassicSimilarity], result of:
          0.02365813 = score(doc=4399,freq=2.0), product of:
            0.15286934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043654136 = queryNorm
            0.15476047 = fieldWeight in 4399, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=4399)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

Indexing plays a vital role in information retrieval. With the availability of a huge volume of information, it has become necessary to index information in a way that lets end users find what they want efficiently and accurately. Keyword-based indexing uses words as indexing terms; it cannot capture the implicit relations among terms or the semantics of the words in a document. To overcome this limitation, ontology-based indexing was introduced, which enables semantic indexing and retrieval and can answer complex and indirect user queries. At present, either existing ontologies or ontologies constructed from scratch are used for indexing. Constructing an ontology from scratch is labor-intensive and requires extensive domain knowledge, whereas using an existing ontology may leave some important concepts in documents unannotated. Using multiple ontologies can largely overcome the problem of missing concepts, but multiple ontologies are difficult to manage because their developers change them over time, and ontology heterogeneity arises because they are constructed by different developers. One possible way to avoid both managing multiple ontologies and building an ontology from scratch is to use modular ontologies for indexing.
Modular ontologies are built in a modular manner by combining modules from multiple relevant ontologies. Ontology heterogeneity also arises during modular ontology construction, because multiple ontologies are dealt with in this process; the ontologies therefore need to be aligned before they are used to construct a modular ontology. Existing approaches to ontology alignment compare all the concepts of each ontology to be aligned and are therefore not optimized in terms of time and search space. A new indexing technique based on modular ontologies is proposed, together with an efficient ontology alignment technique that solves the heterogeneity problem during modular ontology construction. The results are satisfactory: precision and recall improve by 8% and 10%, respectively. The values of Pearson's correlation coefficient for degree of similarity, time, search space requirement, precision, and recall are close to 1, which shows that the results are significant. Further research could apply the modular-ontology-based indexing technique to multimedia information retrieval and biomedical information retrieval.
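The efficiency argument above (existing approaches compare every concept pair) can be illustrated with a blocking sketch: concepts are first grouped, and labels are compared only within a group. This is not the thesis's alignment algorithm. Grouping by the first three characters of the normalized label is a crude, hypothetical stand-in for concept clustering, and `difflib.SequenceMatcher.ratio()` with a 0.8 threshold stands in for a real concept similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def normalize(label):
    return label.lower().replace("_", " ").strip()

def block_key(label):
    # Crude stand-in for concept clustering: the first three characters.
    return normalize(label)[:3]

def align(concepts_a, concepts_b, threshold=0.8):
    """Match labels only within the same block; return (matches, comparisons)."""
    blocks = defaultdict(list)
    for b in concepts_b:
        blocks[block_key(b)].append(b)
    matches, comparisons = [], 0
    for a in concepts_a:
        for b in blocks.get(block_key(a), []):
            comparisons += 1
            if SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold:
                matches.append((a, b))
    return matches, comparisons

onto_a = ["Heart_Disease", "Hypertension", "Diabetes", "Aspirin"]
onto_b = ["heart disease", "hypertensive disorder", "diabetes mellitus", "aspirin", "ibuprofen"]
matches, comparisons = align(onto_a, onto_b)
print(matches, comparisons, len(onto_a) * len(onto_b))
```

Here only 4 of the 20 possible pairs are compared. The price of blocking is that true correspondences whose labels land in different blocks are never examined, and plain string similarity misses pairs such as "Hypertension" and "hypertensive disorder" at this threshold, which is why a real system needs better clustering and similarity measures.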

Content

Submitted to the Faculty of the Computer Science and Engineering Department of the University of Engineering and Technology Lahore in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Computer Science (2009 - 009-PhD-CS-04). See: http://prr.hec.gov.pk/jspui/bitstream/123456789/8375/1/Taybah_Kiren_Computer_Science_HSR_2017_UET_Lahore_14.12.2017.pdf.

Date

20. 1.2015 18:30:22
Markó, K.G.: Foundation, implementation and evaluation of the MorphoSaurus system (2008) 0.00
```
0.0024966146 = product of:
  0.0149796875 = sum of:
    0.0149796875 = weight(_text_:in in 4415) [ClassicSimilarity], result of:
      0.0149796875 = score(doc=4415,freq=46.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.2522651 = fieldWeight in 4415, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4415)
  0.16666667 = coord(1/6)
```
Abstract

This work proposes an approach intended to meet the particular challenges of medical language processing, in particular medical information retrieval. At its core lies a new type of dictionary in which the entries are equivalence classes of subwords, i.e., semantically minimal units. These equivalence classes capture intralingual as well as interlingual synonymy. Because equivalence classes abstract away from subtle particularities within and between languages, and are referenced via a language-independent conceptual system, they form an interlingua. This work elaborates the theoretical foundations of this approach. Furthermore, it draws up design considerations for applications based on the subword methodology and evaluates showcase implementations in detail. Chapter two introduces medical linguistics as a field of active research and motivates treating it as a domain separate from general linguistics. In particular, morphological phenomena inherent to medical language are examined in more detail, which leads to an alternative view of medical terms and to the introduction of the notion of subwords. Chapter three describes the formal foundation of subwords and the underlying declarative as well as procedural linguistic knowledge. An implementation of the subword model for the medical domain, the MorphoSaurus system, is presented in Chapter four. Emphasis is placed on the multilingual aspect of the proposed approach, covering English, German, and Portuguese. Chapter five describes the automatic acquisition of (medical) subwords for other languages (Spanish, French, and Swedish) and their integration into already available resources.
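The idea of subwords grouped into interlingual equivalence classes can be illustrated with a toy segmenter. Everything here is an assumption for illustration: the lexicon entries, the class identifiers such as `#stomach`, and the greedy longest-match strategy are hypothetical, and MorphoSaurus's actual lexicon and segmentation rules are not reproduced.

```python
# Hypothetical toy lexicon: subword -> equivalence class identifier.
LEXICON = {
    "gastr": "#stomach", "stomach": "#stomach", "magen": "#stomach",
    "itis": "#inflammation", "inflammation": "#inflammation", "entzünd": "#inflammation",
    "hepat": "#liver", "liver": "#liver", "leber": "#liver",
}

def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation into equivalence classes.

    Characters at which no known subword starts are skipped.
    """
    text = text.lower()
    max_len = max(len(s) for s in lexicon)
    classes, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                classes.append(lexicon[piece])
                i += length
                break
        else:
            i += 1  # no subword starts here; skip one character
    return classes

for term in ["Gastritis", "Magenentzündung", "stomach inflammation", "Hepatitis"]:
    print(term, segment(term))
```

English, German, and lay terms for the same condition reduce to the same class sequence, which is what lets the classes act as an interlingua for matching queries and documents across languages.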
The proper handling of acronyms plays a crucial role in medical texts, e.g., in patient records, as well as in scientific literature. Chapter six presents an approach in which acronyms are automatically acquired from (bio)medical literature. Furthermore, acronyms and their definitions in different languages are linked to each other using the MorphoSaurus text processing system. Automatic word sense disambiguation is still one of the most challenging tasks in natural language processing. In Chapter seven, cross-lingual considerations lead to a new methodology for automatic disambiguation applied to subwords. Beginning with Chapter eight, a series of applications based on MorphoSaurus is introduced. First, the implementation of the subword approach within a cross-language information retrieval setting for the medical domain is described and evaluated on standard test document collections. In Chapter nine, this methodology is extended to multilingual information retrieval on the Web, where user queries are translated into target languages based on their segmentation into subwords and the subwords' interlingual mappings. The cross-lingual, automatic assignment of document descriptors to documents is the topic of Chapter ten. A large-scale evaluation of a heuristic and a statistical algorithm is carried out using a prominent medical thesaurus as a controlled vocabulary. Chapter eleven shows how MorphoSaurus can be used to map monolingual lexical resources across different languages. As a result, a large multilingual medical lexicon with high coverage and complete lexical information is built and evaluated against a comparable, already available and commonly used lexical repository for the medical domain. Chapter twelve sketches a few applications based on MorphoSaurus. The generality and applicability of the subword approach to other domains is outlined, and proofs of concept in real-world scenarios are presented.
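Acronym acquisition from text usually starts from the pattern "long form (SF)". As an illustration only, here is a simplified version of the right-to-left character-matching heuristic described by Schwartz and Hearst for biomedical abbreviations; it is not the acquisition method used in the thesis. The candidate window of min(|SF| + 5, 2·|SF|) words follows that heuristic, while the parenthesis regex and the function names are assumptions of this sketch.

```python
import re

def best_long_form(short, long):
    """Match the short form's characters right-to-left inside the candidate;
    the first short-form character must start a word. Returns None if no match."""
    s_idx, l_idx = len(short) - 1, len(long) - 1
    while s_idx >= 0:
        c = short[s_idx].lower()
        if not c.isalnum():
            s_idx -= 1
            continue
        while l_idx >= 0 and (
            long[l_idx].lower() != c
            or (s_idx == 0 and l_idx > 0 and long[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    start = long.rfind(" ", 0, l_idx + 1) + 1  # extend back to the start of that word
    return long[start:]

def find_acronyms(text):
    """Collect {short form: long form} pairs from 'long form (SF)' patterns."""
    pairs = {}
    for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", text):
        short = match.group(1)
        words = text[:match.start()].split()
        window = min(len(short) + 5, len(short) * 2)
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form:
            pairs[short] = long_form
    return pairs

text = ("Images were acquired with magnetic resonance imaging (MRI). "
        "We measured C-reactive protein (CRP) levels.")
print(find_acronyms(text))
```

Linking the acquired pairs across languages, as the thesis does, would be a further step on top of such extraction, for example by segmenting the long forms into subwords.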
Finally, Chapter thirteen recapitulates the most important aspects of MorphoSaurus and carefully assesses the potential benefit of employing it in medical information systems, both for medical experts in their everyday work and for health care consumers with their existential information needs.
Styltsvig, H.B.: Ontology-based information retrieval (2006) 0.00
```
0.0024530364 = product of:
  0.014718218 = sum of:
    0.014718218 = weight(_text_:in in 1154) [ClassicSimilarity], result of:
      0.014718218 = score(doc=1154,freq=34.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.24786183 = fieldWeight in 1154, product of:
          5.8309517 = tf(freq=34.0), with freq of:
            34.0 = termFreq=34.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=1154)
  0.16666667 = coord(1/6)
```
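The explain block above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The sketch below recomputes it from the listed inputs, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), with a query boost of 1. `queryNorm` and `fieldNorm` are taken as given, because they depend on the whole query and on the index-time norm encoding. The helper names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)
    query_weight = idf * query_norm        # query side (boost assumed to be 1)
    field_weight = tf * idf * field_norm   # document side
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Coordination factor: fraction of query clauses the document matched."""
    return matched / total


# Inputs copied from the explain output for doc 1154 ("in", freq 34).
term = term_score(freq=34.0, doc_freq=30841, max_docs=44218,
                  query_norm=0.043654136, field_norm=0.03125)
total = term * coord(1, 6)   # only 1 of the 6 query clauses matched
print(f"term={term:.7f} total={total:.7f}")
```

The recomputed values should agree with the 0.014718218 and 0.0024530364 in the explain output up to small rounding differences, since Lucene computes these in 32-bit floats.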
Abstract

In this thesis, we present methods for introducing ontologies into information retrieval. The main hypothesis is that including conceptual knowledge such as ontologies in the information retrieval process can contribute to solving major problems currently found in information retrieval. This use of ontologies poses a number of challenges. Our focus is on similarity measures derived from knowledge about relations between concepts in ontologies, on recognizing semantic information in texts and mapping this knowledge into the ontologies in use, and on how to fuse the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To recognize semantic knowledge in a text, shallow natural language processing is used during indexing, revealing knowledge down to the level of noun phrases. Furthermore, we briefly cover the identification of semantic relations inside and between noun phrases, and discuss the problems that increasing compoundness in the structure of concepts causes during query evaluation. Measuring similarity between concepts based on distances in the structure of the ontology is discussed. In addition, a shared nodes measure is introduced and compared with a number of other measures on the basis of a set of intuitive similarity properties. In this comparison, the shared nodes measure appears to be superior, though more computationally complex. Some major problems of shared nodes are discussed; they stem from the fact that relations differ in the degree to which they bring the concepts they connect closer together. A generalized measure called weighted shared nodes is introduced to deal with these problems. Finally, the use of concept similarity in query evaluation is discussed.
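The shared nodes idea can be sketched in a few lines. The sketch assumes one common formulation: α(x) is the set of concepts reachable upwards from x (x included), and sim(x, y) = ρ·|α(x) ∩ α(y)| / |α(x)| + (1 − ρ)·|α(x) ∩ α(y)| / |α(y)|. The toy ontology and ρ = 0.8 are illustrative assumptions. The weighted variant from the thesis is not shown.

```python
from __future__ import annotations

# Hypothetical toy ontology: concept -> list of parent concepts (ISA edges).
PARENTS: dict[str, list[str]] = {
    "anything": [],
    "animal": ["anything"],
    "dog": ["animal"],
    "cat": ["animal"],
    "poodle": ["dog"],
}


def upwards(concept: str) -> set[str]:
    """alpha(x): the concept itself plus every concept reachable via ISA edges."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_nodes_sim(x: str, y: str, rho: float = 0.8) -> float:
    """Asymmetric shared-nodes similarity; rho weights the two normalisations."""
    ax, ay = upwards(x), upwards(y)
    shared = len(ax & ay)
    return rho * shared / len(ax) + (1 - rho) * shared / len(ay)
```

With ρ ≠ 0.5 the measure is asymmetric. Here `shared_nodes_sim("dog", "poodle")` gives 0.95, while `shared_nodes_sim("poodle", "dog")` gives 0.8. Siblings such as "dog" and "cat" share only "animal" and "anything", so they score 2/3.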
A semantic expansion approach that incorporates concept similarity is introduced, and a generalized fuzzy set retrieval model that applies expansion during query evaluation is presented. Although the fuzzy set model is not commonly used in present information retrieval systems, it appears to offer the flexibility needed when generalizing to an ontology-based retrieval model, and, with the introduction of a hierarchical fuzzy aggregation principle, compound concepts can be handled in a straightforward and natural manner.
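The abstract names semantic expansion inside a fuzzy set retrieval model but gives no formulas. The sketch below assumes the simplest reading. Each query concept becomes a fuzzy set whose membership degrees are (thresholded) concept similarities. A document satisfies a query concept to the degree of its best-matching concept, and the per-concept degrees are averaged. The similarity table, the threshold and the plain mean are assumptions. The thesis's hierarchical fuzzy aggregation is not reproduced.

```python
from __future__ import annotations

# Hypothetical directed similarities (ontology-based similarity can be asymmetric).
SIM: dict[tuple[str, str], float] = {
    ("dog", "poodle"): 0.9,
    ("dog", "animal"): 0.7,
    ("dog", "cat"): 0.6,
    ("food", "meat"): 0.8,
}
VOCABULARY = {"dog", "poodle", "cat", "animal", "food", "meat"}


def sim(a: str, b: str) -> float:
    """Similarity of concept b to query concept a (1.0 for identity)."""
    return 1.0 if a == b else SIM.get((a, b), 0.0)


def expand(concept: str, threshold: float = 0.5) -> dict[str, float]:
    """Fuzzy set of vocabulary concepts that the query concept expands to."""
    fuzzy: dict[str, float] = {}
    for c in VOCABULARY:
        degree = sim(concept, c)
        if degree >= threshold:
            fuzzy[c] = degree
    return fuzzy


def score(query: list[str], document: list[str], threshold: float = 0.5) -> float:
    """Mean over query concepts of the best membership any document concept reaches."""
    if not query:
        return 0.0
    degrees = []
    for q in query:
        fuzzy = expand(q, threshold)
        degrees.append(max((fuzzy.get(d, 0.0) for d in document), default=0.0))
    return sum(degrees) / len(degrees)
```

For example, a query `["dog", "food"]` scores about 0.85 against a document indexed with `["poodle", "meat"]`, although no concept matches literally. A document containing only `["food"]` scores 0.0 for the query `["dog"]`. Replacing the mean with another fuzzy aggregation operator is where a model like the thesis's would differ.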

Content

A dissertation presented to the Faculties of Roskilde University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. See: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.987 or http://coitweb.uncc.edu/~ras/RS/Onto-Retrieval.pdf.
