Search (332 results, page 2 of 17)

Jackenkroll, M.: Nutzen von XML für die Herstellung verschiedener medialer Varianten von Informationsmitteln : dargestellt am Beispiel eines geografischen Lexikonartikels (2002) 0.02
```
0.022129418 = product of:
  0.07745296 = sum of:
    0.01928249 = weight(_text_:wide in 4804) [ClassicSimilarity], result of:
      0.01928249 = score(doc=4804,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.14686027 = fieldWeight in 4804, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0234375 = fieldNorm(doc=4804)
    0.018119143 = weight(_text_:web in 4804) [ClassicSimilarity], result of:
      0.018119143 = score(doc=4804,freq=6.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.18735787 = fieldWeight in 4804, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0234375 = fieldNorm(doc=4804)
    0.031063944 = weight(_text_:elektronische in 4804) [ClassicSimilarity], result of:
      0.031063944 = score(doc=4804,freq=4.0), product of:
        0.14013545 = queryWeight, product of:
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.029633347 = queryNorm
        0.22167085 = fieldWeight in 4804, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.0234375 = fieldNorm(doc=4804)
    0.008987385 = weight(_text_:retrieval in 4804) [ClassicSimilarity], result of:
      0.008987385 = score(doc=4804,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.10026272 = fieldWeight in 4804, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0234375 = fieldNorm(doc=4804)
  0.2857143 = coord(4/14)
```
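The explain tree above can be recomputed by hand. The following is a minimal Python sketch, assuming the classic TF-IDF formulas that are consistent with the numbers in the listing: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function names are hypothetical, and `query_norm` and `field_norm` are copied from the listing rather than derived.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score one term the way the explain tree breaks it down."""
    tf = math.sqrt(freq)                             # tf(freq) = sqrt(termFreq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    return query_weight * field_weight


def classic_doc_score(terms, total_clauses, max_docs, query_norm, field_norm):
    """Sum the matching term scores and scale by coord(matched/total)."""
    total = sum(
        classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm)
        for freq, doc_freq in terms
    )
    coord = len(terms) / total_clauses  # e.g. coord(4/14)
    return total * coord


# (termFreq, docFreq) pairs copied from the explain output for doc 4804.
DOC_4804_TERMS = [
    (2.0, 1430),  # wide
    (6.0, 4597),  # web
    (4.0, 1061),  # elektronische
    (2.0, 5836),  # retrieval
]

score = classic_doc_score(
    DOC_4804_TERMS,
    total_clauses=14,
    max_docs=44218,
    query_norm=0.029633347,
    field_norm=0.0234375,
)
print(score)
```

With the values for doc 4804 the result should land at roughly 0.0221, in line with the score shown next to the record. Small deviations in the last digits are expected because the listing was presumably computed in single precision.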
Content

"Extensible Markup Language (XML) is a meta markup language that was established in 1998 as a new recommendation for web applications by the World Wide Web Consortium (W3C), an organization concerned with creating web standards and new technologies for the Internet. Since then, much has been published about XML and the new possibilities it opens up for exchanging data over the Internet. XML documents define the hierarchical structure and the content of a document but contain no layout information whatsoever. Layout is defined in so-called stylesheets. Using several stylesheets that all refer to one XML document, it is possible to generate different output products from a single data set, e.g. an online version and a printable edition of a document. This ability to produce different media variants of one product is also of interest for the production of reference works. In the production of reference works, above all lexicons and encyclopedias, it has been observable in recent years that, alongside the traditional printed edition, electronic variants enriched with multimedia elements are increasingly being offered. These electronic reference works are published both offline, i.e. on CD-ROM or DVD, and online on the Internet. In contrast to the printed versions, the new products are updated almost annually. This new situation required changes in the production process. What was needed was a procedure that makes generating different media variants of a product as simple and trouble-free as possible. XML and its predecessor, the Standard Generalized Markup Language (SGML), seemed to be the perfect solution to this problem.
Expectations of the benefits that SGML and XML could bring were high: "This angle-bracket format alone, fed into a data pool, is supposed to enable the generation of the most varied media products at the push of a button". The aim of this thesis is to show how the new XML standard can be used in publishing reference works in order to generate several output products from a data set captured once, with as little effort as possible. It discusses which output forms lend themselves to XML documents in this area and with which methods and tools the respective output formats can be created. In this context, it also addresses those aspects that may prove problematic when converting XML documents into other formats.
This situation determines the structure of the thesis: By way of introduction, the meta markup language XML is presented, together with a selection of specifications that were developed in connection with XML and that make meaningful use of the language possible in the first place (Chapter 2). This chapter is intended to give a brief theoretical overview of what XML and its associated extensions can do, what goals each pursues, and by what methods they try to achieve them. This first part is thus meant to make the approach taken in developing the later example DTD and the associated stylesheets comprehensible. For this reason, only those specifications are covered that are indispensable for, or useful in, the production of XML-based reference works. Besides the so-called document type definition (DTD), which determines the structure of XML documents, the specifications on linking, transformation and formatting are therefore covered. Techniques for designing retrieval certainly also play a role in electronic editions of reference works. This area is excluded here, however, so as not to exceed the scope of the thesis. The emphasis is instead on transformation and formatting, since these are central to creating stylesheets and thus to generating the later output products.
The following chapter (Chapter 3) deals with reference works as a subject area. It works out what types of reference works exist and to what extent electronic and printed reference works differ. The main focus of this part, however, is to show how XML and the related but more complex meta markup language SGML are used in publishing houses to publish reference works, what advantages this kind of data markup brings, and where problems arise. After the theoretical part, the remainder of the thesis (Chapter 4) demonstrates the previously explained procedure on an example and puts it into practice. Using a geographical encyclopedia article, it shows how a DTD can be developed that reflects the characteristics of this document type, and how different stylesheets can be used to generate different output products from content captured once. In this case, the designed XML document is to be output as an HTML document, as a PDF document, and as a slightly modified XML document."

Kacmaz, E.: Konzeption und Erstellung eines Online-Nachschlagewerks für den Bereich Web Usability/Accessibility (2004) 0.02

```
0.019643849 = product of:
  0.091671295 = sum of:
    0.039451245 = weight(_text_:web in 3699) [ClassicSimilarity], result of:
      0.039451245 = score(doc=3699,freq=4.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.4079388 = fieldWeight in 3699, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=3699)
    0.044148326 = weight(_text_:bibliothek in 3699) [ClassicSimilarity], result of:
      0.044148326 = score(doc=3699,freq=2.0), product of:
        0.121660605 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.029633347 = queryNorm
        0.36288103 = fieldWeight in 3699, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.0625 = fieldNorm(doc=3699)
    0.008071727 = weight(_text_:information in 3699) [ClassicSimilarity], result of:
      0.008071727 = score(doc=3699,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.1551638 = fieldWeight in 3699, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3699)
  0.21428572 = coord(3/14)
```

Abstract: The lexicographic process of creating an online reference work with the help of a web-based content management system is presented step by step; its intended users are the students of the Library and Information Management degree programme at the Hochschule für Angewandte Wissenschaften Hamburg. The articles on accessibility and web usability are written by the author.
Imprint: Hamburg : Hochschule für Angewandte Wissenschaften, FB Bibliothek und Information

Li, Z.: ¬A domain specific search engine with explicit document relations (2013) 0.02
```
0.01962373 = product of:
  0.09157741 = sum of:
    0.032137483 = weight(_text_:wide in 1210) [ClassicSimilarity], result of:
      0.032137483 = score(doc=1210,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.24476713 = fieldWeight in 1210, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1210)
    0.052305456 = weight(_text_:web in 1210) [ClassicSimilarity], result of:
      0.052305456 = score(doc=1210,freq=18.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.5408555 = fieldWeight in 1210, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1210)
    0.0071344664 = weight(_text_:information in 1210) [ClassicSimilarity], result of:
      0.0071344664 = score(doc=1210,freq=4.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.13714671 = fieldWeight in 1210, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1210)
  0.21428572 = coord(3/14)
```
Abstract

The current web consists of documents that are highly heterogeneous and hard for machines to understand. The Semantic Web is a progressive movement of the World Wide Web, aiming at converting the current web of unstructured documents to the web of data. In the Semantic Web, web documents are annotated with metadata using a standardized ontology language. These annotated documents are directly processable by machines, which greatly improves their usability and usefulness. At Ericsson, similar problems occur. Large numbers of documents are being created with well-defined structures. Though these documents are about domain specific knowledge and can have rich relations, they are currently managed by a traditional search engine, which ignores the rich domain specific information and presents little data to users. Motivated by the Semantic Web, we aim to find standard ways to process these documents, extract rich domain specific information and annotate the documents with these data using formal markup languages. We propose this project to develop a domain specific search engine for processing different documents and building explicit relations for them. This research project has three main focuses: examining different domain specific documents and finding ways to extract their metadata; integrating a text search engine with an ontology server; exploring novel ways to build relations for documents. We implement this system and demonstrate its functions. As a prototype, the system provides the required features and will be extended in the future.

Theme

Semantic Web

Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.02
```
0.019562094 = product of:
  0.091289766 = sum of:
    0.045449268 = weight(_text_:wide in 1777) [ClassicSimilarity], result of:
      0.045449268 = score(doc=1777,freq=4.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.34615302 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.02465703 = weight(_text_:web in 1777) [ClassicSimilarity], result of:
      0.02465703 = score(doc=1777,freq=4.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.25496176 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.021183468 = weight(_text_:retrieval in 1777) [ClassicSimilarity], result of:
      0.021183468 = score(doc=1777,freq=4.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.23632148 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
  0.21428572 = coord(3/14)
```
Abstract

This thesis describes and evaluates the WWW search service GERHARD (German Harvest Automated Retrieval and Directory). GERHARD is a search and navigation system for the German World Wide Web which collects only scientifically relevant documents and classifies them automatically, on the basis of computational-linguistic and statistical methods, using a library classification system. The DFG project GERHARD is an attempt to develop, with a World Wide Web service based on an automatic classification procedure, an alternative to conventional methods of indexing the Internet. In the German-speaking world, GERHARD is the only directory of Internet resources that is created and updated fully automatically (i.e. by machine). GERHARD restricts itself to documents on academic WWW servers. The basic idea was to replace cost-intensive intellectual indexing and classification of web pages with computational-linguistic and statistical methods, so that the indexed Internet resources are mapped automatically onto the vocabulary of a library classification system. The WWW address (URL) of GERHARD is: http://www.gerhard.de. This diploma thesis describes the service, with particular emphasis on the underlying indexing and classification system, and then tests the effectiveness of GERHARD by means of a small retrieval test.

Eppendahl, F.: Entwurf eines Konzepts für die elektronische Dokumentenverwaltung von Verträgen (1989) 0.02

```
0.019041847 = product of:
  0.13329293 = sum of:
    0.11714947 = weight(_text_:elektronische in 2745) [ClassicSimilarity], result of:
      0.11714947 = score(doc=2745,freq=2.0), product of:
        0.14013545 = queryWeight, product of:
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.029633347 = queryNorm
        0.83597314 = fieldWeight in 2745, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.125 = fieldNorm(doc=2745)
    0.016143454 = weight(_text_:information in 2745) [ClassicSimilarity], result of:
      0.016143454 = score(doc=2745,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.3103276 = fieldWeight in 2745, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.125 = fieldNorm(doc=2745)
  0.14285715 = coord(2/14)
```

Imprint: Darmstadt : Fachhochschule, Fachbereich Information und Dokumentation

Vocht, L. De: Exploring semantic relationships in the Web of Data : Semantische relaties verkennen in data op het web (2017) 0.02
```
0.018600633 = product of:
  0.06510221 = sum of:
    0.016068742 = weight(_text_:wide in 4232) [ClassicSimilarity], result of:
      0.016068742 = score(doc=4232,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.122383565 = fieldWeight in 4232, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.01953125 = fieldNorm(doc=4232)
    0.034870304 = weight(_text_:web in 4232) [ClassicSimilarity], result of:
      0.034870304 = score(doc=4232,freq=32.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.36057037 = fieldWeight in 4232, product of:
          5.656854 = tf(freq=32.0), with freq of:
            32.0 = termFreq=32.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.01953125 = fieldNorm(doc=4232)
    0.006673682 = weight(_text_:information in 4232) [ClassicSimilarity], result of:
      0.006673682 = score(doc=4232,freq=14.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.128289 = fieldWeight in 4232, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.01953125 = fieldNorm(doc=4232)
    0.007489487 = weight(_text_:retrieval in 4232) [ClassicSimilarity], result of:
      0.007489487 = score(doc=4232,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.08355226 = fieldWeight in 4232, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.01953125 = fieldNorm(doc=4232)
  0.2857143 = coord(4/14)
```
Abstract

After the launch of the World Wide Web, it became clear that searching documents on the Web would not be trivial. Well-known engines to search the web, like Google, focus on search in web documents using keywords. The documents are structured and indexed to ensure keywords match documents as accurately as possible. However, searching by keywords does not always suffice. It is often the case that users do not know exactly how to formulate the search query or which keywords guarantee retrieving the most relevant documents. Besides that, users sometimes would rather browse information than look up something specific. It turned out that there is a need for systems that enable more interactivity and facilitate the gradual refinement of search queries to explore the Web. Users expect more from the Web because the short keyword-based queries they pose during search do not suffice for all cases. On top of that, the Web is changing structurally. The Web comprises, apart from a collection of documents, more and more linked data, pieces of information structured so they can be processed by machines. The consistently applied semantics allow users to indicate their search intentions to machines precisely. This is made possible by describing data following controlled vocabularies, concept lists composed by experts, published uniquely identifiable on the Web. Even so, it is still not trivial to explore data on the Web. There is a large variety of vocabularies and various data sources use different terms to identify the same concepts.
This PhD thesis describes how to effectively explore linked data on the Web. The main focus is on scenarios where users want to discover relationships between resources rather than finding out more about something specific. Searching for a specific document or piece of information fits in the theoretical framework of information retrieval and is associated with exploratory search. Exploratory search goes beyond 'looking up something' when users are seeking more detailed understanding, further investigation or navigation of the initial search results. The ideas behind exploratory search and querying linked data merge when it comes to the way knowledge is represented and indexed by machines, that is, how data is structured and stored for optimal searchability. Queries and information should be aligned so that searches also reveal connections between results. This implies that they take into account the same semantic entities, relevant at that moment. To realize this, we research three techniques that are evaluated one by one in an experimental set-up to assess how well they succeed in their goals. In the end, the techniques are applied to a practical use case that focuses on forming a bridge between the Web and the use of digital libraries in scientific research. Our first technique focuses on the interactive visualization of search results. Linked data resources can be brought in relation with each other at will. This leads to complex and diverse graph structures. Our technique facilitates navigation and supports a workflow that starts from a broad overview of the data, allows narrowing down to the desired level of detail, and then broadens again. To validate the flow, two visualizations were implemented and presented to test users. The users judged the usability of the visualizations, how the visualizations fit in the workflow and to which degree their features seemed useful for the exploration of linked data.
When we speak about finding relationships between resources, it is necessary to dive deeper into the structure. The graph structure of linked data, where the semantics give meaning to the relationships between resources, enables the execution of pathfinding algorithms. The assigned weights and heuristics are base components of such algorithms and ultimately define which resources are included in a path, and in which order. These paths explain indirect connections between resources. Our third technique proposes an algorithm that optimizes the choice of resources in terms of serendipity. Some optimizations guard the consistency of candidate paths, where the coherence of consecutive connections is maximized to avoid trivial and too arbitrary paths. The implementation uses the A* algorithm, the de facto reference when it comes to heuristically optimized minimal cost paths. The effectiveness of paths was measured based on common automatic metrics and surveys where the users could indicate their preference for paths, generated each time in a different way. Finally, all our techniques are applied to a use case about publications in digital libraries where they are aligned with information about scientific conferences and researchers. The application to this use case is a practical example because the different aspects of exploratory search come together. In fact, the techniques also evolved from the experiences when implementing the use case. Practical details about the semantic model are explained and the implementation of the search system is clarified module by module. The evaluation positions the result, a prototype of a tool to explore scientific publications, researchers and conferences, next to some important alternatives.

Theme

Semantic Web

Mao, M.: Ontology mapping : towards semantic interoperability in distributed and heterogeneous environments (2008) 0.02
```
0.018016124 = product of:
  0.06305643 = sum of:
    0.025709987 = weight(_text_:wide in 4659) [ClassicSimilarity], result of:
      0.025709987 = score(doc=4659,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.1958137 = fieldWeight in 4659, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=4659)
    0.013948122 = weight(_text_:web in 4659) [ClassicSimilarity], result of:
      0.013948122 = score(doc=4659,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.14422815 = fieldWeight in 4659, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=4659)
    0.011415146 = weight(_text_:information in 4659) [ClassicSimilarity], result of:
      0.011415146 = score(doc=4659,freq=16.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.21943474 = fieldWeight in 4659, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=4659)
    0.0119831795 = weight(_text_:retrieval in 4659) [ClassicSimilarity], result of:
      0.0119831795 = score(doc=4659,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.13368362 = fieldWeight in 4659, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=4659)
  0.2857143 = coord(4/14)
```
Abstract

This dissertation studies ontology mapping: the problem of finding semantic correspondences between similar elements of different ontologies. In the dissertation, elements denote classes or properties of ontologies. The goal of this research is to use ontology mapping to make heterogeneous information more accessible. The World Wide Web (WWW) is now widely used as a universal medium for information exchange. Semantic interoperability among different information systems in the WWW is limited due to information heterogeneity and the non-semantic nature of HTML and URLs. Ontologies have been suggested as a way to solve the problem of information heterogeneity by providing formal, explicit definitions of data and reasoning ability over related concepts. Given that no universal ontology exists for the WWW, work has focused on finding semantic correspondences between similar elements of different ontologies, i.e., ontology mapping. Ontology mapping can be done either by hand or using automated tools. Manual mapping becomes impractical as the size and complexity of ontologies increase. Full or semi-automated mapping approaches have been examined by several research studies. Previous full or semi-automated mapping approaches include analyzing linguistic information of elements in ontologies, treating ontologies as structural graphs, applying heuristic rules and machine learning techniques, and using probabilistic and reasoning methods. In this paper, two generic ontology mapping approaches are proposed. One is the PRIOR+ approach, which utilizes both information retrieval and artificial intelligence techniques in the context of ontology mapping. The other is the non-instance learning based approach, which experimentally explores machine learning algorithms to solve the ontology mapping problem without requesting any instance. The results of PRIOR+ on different tests in the OAEI 2007 ontology matching campaign are encouraging. The non-instance learning based approach has shown potential for solving the ontology mapping problem on OAEI benchmark tests.

Content

Submitted to the Graduate Faculty of School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Schwab, U.: ¬Der Information-Highway und seine Bedeutung für das elektronische Publizieren in Zeitungs- und Zeitschriftenverlagen (1995) 0.02

```
0.01749747 = product of:
  0.12248229 = sum of:
    0.10250579 = weight(_text_:elektronische in 2772) [ClassicSimilarity], result of:
      0.10250579 = score(doc=2772,freq=2.0), product of:
        0.14013545 = queryWeight, product of:
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.029633347 = queryNorm
        0.7314765 = fieldWeight in 2772, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.109375 = fieldNorm(doc=2772)
    0.019976506 = weight(_text_:information in 2772) [ClassicSimilarity], result of:
      0.019976506 = score(doc=2772,freq=4.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.3840108 = fieldWeight in 2772, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=2772)
  0.14285715 = coord(2/14)
```

Imprint: Darmstadt : Fachhochschule, Fachbereich Information und Dokumentation

Nix, M.: ¬Die praktische Einsetzbarkeit des CIDOC CRM in Informationssystemen im Bereich des Kulturerbes (2004) 0.02
```
0.017291287 = product of:
  0.08069267 = sum of:
    0.045449268 = weight(_text_:wide in 3742) [ClassicSimilarity], result of:
      0.045449268 = score(doc=3742,freq=4.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.34615302 = fieldWeight in 3742, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3742)
    0.03019857 = weight(_text_:web in 3742) [ClassicSimilarity], result of:
      0.03019857 = score(doc=3742,freq=6.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.3122631 = fieldWeight in 3742, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3742)
    0.0050448296 = weight(_text_:information in 3742) [ClassicSimilarity], result of:
      0.0050448296 = score(doc=3742,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.09697737 = fieldWeight in 3742, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3742)
  0.21428572 = coord(3/14)
```
Abstract

A practically unlimited amount of information is available to us via the World Wide Web. The resulting problem is how to cope with this volume and reach the information needed at a given moment. The overwhelming supply forces professional users and laypeople alike to search, regardless of their demands on the desired information. One way to make this searching more efficient is to develop more powerful search engines. Another is to structure data better in order to reach the information it contains. Highly structured data can be processed by machine, so that part of the search work can be automated. The Semantic Web is the vision of a further-developed World Wide Web in which data structured in this way is processed by so-called software agents. The progressive structuring of data by content is called semantization. The first part of the thesis outlines some important methods of content-based data structuring in order to clarify the position of ontologies within semantization. The third chapter presents the structure and purpose of the CIDOC Conceptual Reference Model (CRM), a domain ontology for the cultural heritage field. The practical part that follows discusses and implements several approaches to using the CRM. A proposal for implementing the model in XML is developed; this is one option that serves data transport. In addition, the design of a Java class library is presented on which the processing and use of the model within an information system can build.

Mateika, O.: Feasibility-Studie zur Eignung der Pressedatenbank Archimedes zum Einsatz in der Pressedokumentation des Norddeutschen Rundfunks (2004) 0.02

```
0.017042106 = product of:
  0.07952983 = sum of:
    0.044148326 = weight(_text_:bibliothek in 3712) [ClassicSimilarity], result of:
      0.044148326 = score(doc=3712,freq=2.0), product of:
        0.121660605 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.029633347 = queryNorm
        0.36288103 = fieldWeight in 3712, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.0625 = fieldNorm(doc=3712)
    0.011415146 = weight(_text_:information in 3712) [ClassicSimilarity], result of:
      0.011415146 = score(doc=3712,freq=4.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.21943474 = fieldWeight in 3712, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3712)
    0.023966359 = weight(_text_:retrieval in 3712) [ClassicSimilarity], result of:
      0.023966359 = score(doc=3712,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.26736724 = fieldWeight in 3712, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=3712)
  0.21428572 = coord(3/14)
```

Abstract: The database system Planet, currently used as an information retrieval system in press archives within the ARD's SAD network, is to be replaced by a system that is at least equivalent. Archimedes, currently used in the documentation department of Westdeutscher Rundfunk Köln, is a possible alternative. Whether it meets the specifications and requirements is examined by means of a feasibility study, which assesses the necessary functionality and the strategic and qualitative requirements.
Imprint: Hamburg : Hochschule für Angewandte Wissenschaften, FB Bibliothek und Information

Timm, A.: Fachinformation in den Bereichen Gentechnologie und Molekularbiologie am Beispiel ausgewählter Datenbanken und Dienstleistungen im World Wide Web (1996) 0.02

```
0.016996333 = product of:
  0.11897433 = sum of:
    0.07712996 = weight(_text_:wide in 785) [ClassicSimilarity], result of:
      0.07712996 = score(doc=785,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.5874411 = fieldWeight in 785, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.09375 = fieldNorm(doc=785)
    0.041844364 = weight(_text_:web in 785) [ClassicSimilarity], result of:
      0.041844364 = score(doc=785,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.43268442 = fieldWeight in 785, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.09375 = fieldNorm(doc=785)
  0.14285715 = coord(2/14)
```

Schulze, M.: ¬Das Projekt "nestor" : Aufbau eines Kompetenznetzwerks Langzeitarchivierung und Langzeitverfügbarkeit digitaler Ressourcen für Deutschland (2004) 0.02

```
0.016661618 = product of:
  0.116631314 = sum of:
    0.10250579 = weight(_text_:elektronische in 4534) [ClassicSimilarity], result of:
      0.10250579 = score(doc=4534,freq=2.0), product of:
        0.14013545 = queryWeight, product of:
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.029633347 = queryNorm
        0.7314765 = fieldWeight in 4534, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.109375 = fieldNorm(doc=4534)
    0.014125523 = weight(_text_:information in 4534) [ClassicSimilarity], result of:
      0.014125523 = score(doc=4534,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.27153665 = fieldWeight in 4534, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=4534)
  0.14285715 = coord(2/14)
```

Form: Elektronische Dokumente
Imprint: Potsdam : Fachhochschule, Institut für Information und Dokumentation

Woldt, M.: Informationsmodellierung in XML : Entwicklung einer XML-Applikation für Fotodaten mit Vergleich und Übersicht bestehender Fotoapplikationen (2004) 0.02

```
0.01632566 = product of:
  0.07618641 = sum of:
    0.044148326 = weight(_text_:bibliothek in 3707) [ClassicSimilarity], result of:
      0.044148326 = score(doc=3707,freq=2.0), product of:
        0.121660605 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.029633347 = queryNorm
        0.36288103 = fieldWeight in 3707, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.0625 = fieldNorm(doc=3707)
    0.008071727 = weight(_text_:information in 3707) [ClassicSimilarity], result of:
      0.008071727 = score(doc=3707,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.1551638 = fieldWeight in 3707, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3707)
    0.023966359 = weight(_text_:retrieval in 3707) [ClassicSimilarity], result of:
      0.023966359 = score(doc=3707,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.26736724 = fieldWeight in 3707, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=3707)
  0.21428572 = coord(3/14)
```

Abstract: The aim is to enable the best possible retrieval of photo data. To this end, the workings of XML and its related technologies DTD and XML Schema are explained. Four applications and concepts used in photo cataloguing are presented. The practice-oriented part presents a prototype XML application developed for recording photo data. To illustrate its potential uses and functionality, the application is used to record the data of 100 photos as examples and store them in XML documents (enclosed CD-ROM).
Imprint: Hamburg : Hochschule für Angewandte Wissenschaften, FB Bibliothek und Information

Haveliwala, T.: Context-Sensitive Web search (2005) 0.02
```
0.016187401 = product of:
  0.075541206 = sum of:
    0.044107836 = weight(_text_:web in 2567) [ClassicSimilarity], result of:
      0.044107836 = score(doc=2567,freq=20.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.45608947 = fieldWeight in 2567, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=2567)
    0.010677892 = weight(_text_:information in 2567) [ClassicSimilarity], result of:
      0.010677892 = score(doc=2567,freq=14.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.20526241 = fieldWeight in 2567, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2567)
    0.020755477 = weight(_text_:retrieval in 2567) [ClassicSimilarity], result of:
      0.020755477 = score(doc=2567,freq=6.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.23154683 = fieldWeight in 2567, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=2567)
  0.21428572 = coord(3/14)
```
Abstract

As the Web continues to grow and encompass broader and more diverse sources of information, providing effective search facilities to users becomes an increasingly challenging problem. To help users deal with the deluge of Web-accessible information, we propose a search system which makes use of context to improve search results in a scalable way. By context, we mean any sources of information, in addition to any search query, that provide clues about the user's true information need. For instance, a user's bookmarks and search history can be considered a part of the search context. We consider two types of context-based search. The first type of functionality we consider is "similarity search." In this case, as the user is browsing Web pages, URLs for pages similar to the current page are retrieved and displayed in a side panel. No query is explicitly issued; context alone (i.e., the page currently being viewed) is used to provide the user with useful related information. The second type of functionality involves taking search context into account when ranking results to standard search queries. Web search differs from traditional information retrieval tasks in several major ways, making effective context-sensitive Web search challenging. First, scalability is of critical importance. With billions of publicly accessible documents, the Web is much larger than traditional datasets. Similarly, with millions of search queries issued each day, the query load is much higher than for traditional information retrieval systems. Second, there are no guarantees on the quality of Web pages, with Web-authors taking an adversarial, rather than cooperative, approach in attempts to inflate the rankings of their pages. Third, there is a significant amount of metadata embodied in the link structure corresponding to the hyperlinks between Web pages that can be exploited during the retrieval process.
In this thesis, we design a search system, using the Stanford WebBase platform, that exploits the link structure of the Web to provide scalable, context-sensitive search.
Hannech, A.: Système de recherche d'information étendue basé sur une projection multi-espaces (2018) 0.02
```
0.016166067 = product of:
  0.05658123 = sum of:
    0.018179707 = weight(_text_:wide in 4472) [ClassicSimilarity], result of:
      0.018179707 = score(doc=4472,freq=4.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.1384612 = fieldWeight in 4472, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.015625 = fieldNorm(doc=4472)
    0.019725623 = weight(_text_:web in 4472) [ClassicSimilarity], result of:
      0.019725623 = score(doc=4472,freq=16.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.2039694 = fieldWeight in 4472, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.015625 = fieldNorm(doc=4472)
    0.0066927224 = weight(_text_:information in 4472) [ClassicSimilarity], result of:
      0.0066927224 = score(doc=4472,freq=22.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.12865502 = fieldWeight in 4472, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.015625 = fieldNorm(doc=4472)
    0.0119831795 = weight(_text_:retrieval in 4472) [ClassicSimilarity], result of:
      0.0119831795 = score(doc=4472,freq=8.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.13368362 = fieldWeight in 4472, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.015625 = fieldNorm(doc=4472)
  0.2857143 = coord(4/14)
```
Abstract

Depuis son apparition au début des années 90, le World Wide Web (WWW ou Web) a offert un accès universel aux connaissances et le monde de l'information a été principalement témoin d'une grande révolution (la révolution numérique). Il est devenu rapidement très populaire, ce qui a fait de lui la plus grande et vaste base de données et de connaissances existantes grâce à la quantité et la diversité des données qu'il contient. Cependant, l'augmentation et l'évolution considérables de ces données soulèvent d'importants problèmes pour les utilisateurs notamment pour l'accès aux documents les plus pertinents à leurs requêtes de recherche. Afin de faire face à cette explosion exponentielle du volume de données et faciliter leur accès par les utilisateurs, différents modèles sont proposés par les systèmes de recherche d'information (SRIs) pour la représentation et la recherche des documents web. Les SRIs traditionnels utilisent, pour indexer et récupérer ces documents, des mots-clés simples qui ne sont pas sémantiquement liés. Cela engendre des limites en termes de la pertinence et de la facilité d'exploration des résultats. Pour surmonter ces limites, les techniques existantes enrichissent les documents en intégrant des mots-clés externes provenant de différentes sources. Cependant, ces systèmes souffrent encore de limitations qui sont liées aux techniques d'exploitation de ces sources d'enrichissement. Lorsque les différentes sources sont utilisées de telle sorte qu'elles ne peuvent être distinguées par le système, cela limite la flexibilité des modèles d'exploration qui peuvent être appliqués aux résultats de recherche retournés par ce système. Les utilisateurs se sentent alors perdus devant ces résultats, et se retrouvent dans l'obligation de les filtrer manuellement pour sélectionner l'information pertinente. 
S'ils veulent aller plus loin, ils doivent reformuler et cibler encore plus leurs requêtes de recherche jusqu'à parvenir aux documents qui répondent le mieux à leurs attentes. De cette façon, même si les systèmes parviennent à retrouver davantage des résultats pertinents, leur présentation reste problématique. Afin de cibler la recherche à des besoins d'information plus spécifiques de l'utilisateur et améliorer la pertinence et l'exploration de ses résultats de recherche, les SRIs avancés adoptent différentes techniques de personnalisation de données qui supposent que la recherche actuelle d'un utilisateur est directement liée à son profil et/ou à ses expériences de navigation/recherche antérieures. Cependant, cette hypothèse ne tient pas dans tous les cas, les besoins de l'utilisateur évoluent au fil du temps et peuvent s'éloigner de ses intérêts antérieurs stockés dans son profil.
Dans d'autres cas, le profil de l'utilisateur peut être mal exploité pour extraire ou inférer ses nouveaux besoins en information. Ce problème est beaucoup plus accentué avec les requêtes ambigües. Lorsque plusieurs centres d'intérêt auxquels est liée une requête ambiguë sont identifiés dans le profil de l'utilisateur, le système se voit incapable de sélectionner les données pertinentes depuis ce profil pour répondre à la requête. Ceci a un impact direct sur la qualité des résultats fournis à cet utilisateur. Afin de remédier à quelques-unes de ces limitations, nous nous sommes intéressés dans ce cadre de cette thèse de recherche au développement de techniques destinées principalement à l'amélioration de la pertinence des résultats des SRIs actuels et à faciliter l'exploration de grandes collections de documents. Pour ce faire, nous proposons une solution basée sur un nouveau concept d'indexation et de recherche d'information appelé la projection multi-espaces. Cette proposition repose sur l'exploitation de différentes catégories d'information sémantiques et sociales qui permettent d'enrichir l'univers de représentation des documents et des requêtes de recherche en plusieurs dimensions d'interprétations. L'originalité de cette représentation est de pouvoir distinguer entre les différentes interprétations utilisées pour la description et la recherche des documents. Ceci donne une meilleure visibilité sur les résultats retournés et aide à apporter une meilleure flexibilité de recherche et d'exploration, en donnant à l'utilisateur la possibilité de naviguer une ou plusieurs vues de données qui l'intéressent le plus. En outre, les univers multidimensionnels de représentation proposés pour la description des documents et l'interprétation des requêtes de recherche aident à améliorer la pertinence des résultats de l'utilisateur en offrant une diversité de recherche/exploration qui aide à répondre à ses différents besoins et à ceux des autres différents utilisateurs. 
Cette étude exploite différents aspects liés à la recherche personnalisée et vise à résoudre les problèmes engendrés par l'évolution des besoins en information de l'utilisateur. Ainsi, lorsque le profil de cet utilisateur est utilisé par notre système, une technique est proposée et employée pour identifier les intérêts les plus représentatifs de ses besoins actuels dans son profil. Cette technique se base sur la combinaison de trois facteurs influents, notamment le facteur contextuel, fréquentiel et temporel des données. La capacité des utilisateurs à interagir, à échanger des idées et d'opinions, et à former des réseaux sociaux sur le Web, a amené les systèmes à s'intéresser aux types d'interactions de ces utilisateurs, au niveau d'interaction entre eux ainsi qu'à leurs rôles sociaux dans le système. Ces informations sociales sont abordées et intégrées dans ce travail de recherche. L'impact et la manière de leur intégration dans le processus de RI sont étudiés pour améliorer la pertinence des résultats.
Since its appearance in the early 1990s, the World Wide Web (WWW or Web) has provided universal access to knowledge, and the world of information has witnessed a great revolution (the digital revolution). The Web quickly became very popular, making it the largest and most comprehensive database and knowledge base in existence thanks to the amount and diversity of data it contains. However, the considerable growth and evolution of these data raise important problems for users, in particular for accessing the documents most relevant to their search queries. To cope with this exponential explosion of data volume and facilitate user access, information retrieval systems (IRSs) offer various models for the representation and retrieval of web documents. Traditional IRSs index and retrieve these documents using simple keywords that are not semantically linked. This limits both the relevance of results and the ease of exploring them. To overcome these limitations, existing techniques enrich documents by integrating external keywords from different sources. However, these systems still suffer from limitations related to the way these enrichment sources are exploited. When the different sources are used in such a way that the system cannot distinguish them, this limits the flexibility of the exploration models that can be applied to the search results the system returns. Users then feel lost when faced with these results and are forced to filter them manually to select the relevant information. If they want to go further, they must reformulate and narrow their search queries until they reach the documents that best meet their expectations. In this way, even if the systems manage to retrieve more relevant results, their presentation remains problematic.
In order to target search at more specific user information needs and improve the relevance and exploration of search results, advanced IRSs adopt different data personalization techniques, which assume that a user's current search is directly related to their profile and/or previous browsing and search experiences. However, this assumption does not hold in all cases: the user's needs evolve over time and can move away from the earlier interests stored in the profile. In other cases, the user's profile may be poorly exploited when extracting or inferring new information needs. This problem is much more pronounced with ambiguous queries. When several interests related to an ambiguous query are identified in the user's profile, the system is unable to select the relevant data from that profile to answer the query. This has a direct impact on the quality of the results provided to the user. To overcome some of these limitations, this thesis develops techniques aimed mainly at improving the relevance of the results of current IRSs and facilitating the exploration of large document collections. To do this, we propose a solution based on a new concept and model of indexing and information retrieval called multi-spaces projection. This proposal is based on the exploitation of different categories of semantic and social information that enrich the representation universe of documents and search queries along several dimensions of interpretation. The originality of this representation lies in its ability to distinguish between the different interpretations used for describing and searching documents. This gives better visibility of the results returned and provides greater flexibility of search and exploration, giving users the ability to navigate the one or more views of the data that interest them most. In addition, the proposed multidimensional representation universes for document description and search query interpretation help to improve the relevance of the user's results by providing a diversity of search and exploration that helps meet their various needs and those of other users.
This study exploits different aspects of personalized search and aims to solve the problems caused by the evolution of the user's information needs. Thus, when the user's profile is used by our system, a technique is proposed and applied to identify the interests in the profile that are most representative of the user's current needs. This technique is based on the combination of three influential factors: the contextual, frequency, and temporal factors of the data. The ability of users to interact, exchange ideas and opinions, and form social networks on the Web has led systems to take an interest in the types of interactions between these users, the level of interaction between them, and their social roles in the system. This social information is discussed and integrated into this research work. The impact of this information and the way it is integrated into the IR process are studied in order to improve the relevance of the results.

Theme

Semantisches Umfeld in Indexierung u. Retrieval
Bachfeld, S.: Möglichkeiten und Grenzen linguistischer Verfahren der automatischen Indexierung : Entwurf einer Simulation für den Einsatz im Grundstudium (2003) 0.02
```
0.0159667 = product of:
  0.07451127 = sum of:
    0.029287368 = weight(_text_:elektronische in 2827) [ClassicSimilarity], result of:
      0.029287368 = score(doc=2827,freq=2.0), product of:
        0.14013545 = queryWeight, product of:
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.029633347 = queryNorm
        0.20899329 = fieldWeight in 2827, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.728978 = idf(docFreq=1061, maxDocs=44218)
          0.03125 = fieldNorm(doc=2827)
    0.038233574 = weight(_text_:bibliothek in 2827) [ClassicSimilarity], result of:
      0.038233574 = score(doc=2827,freq=6.0), product of:
        0.121660605 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.029633347 = queryNorm
        0.3142642 = fieldWeight in 2827, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.03125 = fieldNorm(doc=2827)
    0.0069903214 = weight(_text_:information in 2827) [ClassicSimilarity], result of:
      0.0069903214 = score(doc=2827,freq=6.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.1343758 = fieldWeight in 2827, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2827)
  0.21428572 = coord(3/14)
```
Abstract

The thesis drafts a concept for a simulation that, as part of an e-learning module, is intended to illustrate the problems of automatic free-text indexing and the linguistic methods for improving indexing results. The target group of the simulation is students of the Fachbereich Bibliothek und Information at HAW Hamburg who are in their basic studies (Grundstudium). A content concept is developed for how the simulation can present the advantages and disadvantages of rule-based and dictionary-based indexing methods to students at this level. The aim is to show that rule-based methods can lead to numerous indexing errors in a highly inflected, compound-rich language such as German, and that dictionary-based methods yield better index terms. In the second part of the thesis, an information architecture for the simulation is designed and a prototype is programmed that demonstrates free-text indexing and, building on it, a rule-based reduction method. The particular aim here is to show that rule-based indexing methods do not achieve satisfactory results for German and that dictionary-based methods are preferable for German. Against this background, the second part of the thesis designs a prototype for the simulation, which indexes electronic full texts first by the free-text method and then with linguistic methods. An information architecture is developed that aims not only to suit the target group but also to show the advantages and disadvantages of the linguistic indexing methods as clearly as possible. Program code is already written for free-text indexing, the simplest form of automatic indexing, and for the rule-based method. For the rule-based reduction of word forms, the author draws on an existing program that Cornelie Ahlfeld developed in 1995 as part of her diploma thesis. The author attempts to supplement this program with a presentation of the indexing results that makes it useful for teaching.

Footnote

Thesis (Hausarbeit) for the diploma examination at HAW Hamburg, Fachbereich Bibliothek und Information

Imprint

Hamburg : HAW Hamburg, Fachbereich Bibliothek und Information

Artemenko, O.; Shramko, M.: Entwicklung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2005) 0.02

0.015363664 = product of:
  0.053772822 = sum of:
    0.022496238 = weight(_text_:wide in 572) [ClassicSimilarity], result of:
      0.022496238 = score(doc=572,freq=2.0), product of:
        0.1312982 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.029633347 = queryNorm
        0.171337 = fieldWeight in 572, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.02734375 = fieldNorm(doc=572)
    0.017259922 = weight(_text_:web in 572) [ClassicSimilarity], result of:
      0.017259922 = score(doc=572,freq=4.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.17847323 = fieldWeight in 572, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=572)
    0.0035313808 = weight(_text_:information in 572) [ClassicSimilarity], result of:
      0.0035313808 = score(doc=572,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.06788416 = fieldWeight in 572, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02734375 = fieldNorm(doc=572)
    0.010485282 = weight(_text_:retrieval in 572) [ClassicSimilarity], result of:
      0.010485282 = score(doc=572,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.11697317 = fieldWeight in 572, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.02734375 = fieldNorm(doc=572)
  0.2857143 = coord(4/14)
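
Note: the score tree above can be reproduced by hand. The following is a minimal sketch in Python, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are illustrative and not part of the search system. The queryNorm value is copied from the listing as an input, because it depends on all 14 query clauses, which are not shown here.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, max_docs, field_norm, query_norm):
    """Score contribution of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, total_clauses, max_docs, field_norm, query_norm):
    """Sum the term weights, then scale by coord(matching / total clauses)."""
    total = sum(
        term_weight(freq, doc_freq, max_docs, field_norm, query_norm)
        for freq, doc_freq in terms
    )
    coord = len(terms) / total_clauses
    return coord * total


# Values read from the explain tree for doc 572 (assumed inputs).
MAX_DOCS = 44218
QUERY_NORM = 0.029633347
FIELD_NORM_572 = 0.02734375
# (freq, docFreq) for the terms wide, web, information, retrieval
DOC_572_TERMS = [(2.0, 1430), (4.0, 4597), (2.0, 20772), (2.0, 5836)]

score_572 = document_score(
    DOC_572_TERMS, 14, MAX_DOCS, FIELD_NORM_572, QUERY_NORM
)
print(f"doc 572 score: {score_572:.9f}")
```

The result should agree with the top-level value 0.015363664 up to small rounding differences, since the listing was computed in single precision and the sketch uses double precision.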

Abstract: As the Internet spreads, the number of documents available on the World Wide Web grows. Ensuring that Internet users have efficient access to the information they want is becoming a major challenge for the modern information society. A large number of tools are already in use to help users find their way through the growing flood of information. However, the enormous volume of unstructured and distributed information is not the only difficulty to be overcome in developing tools of this kind. The increasing multilingualism of web content creates a need for language identification software that identifies the language or languages of electronic documents for targeted further processing. Such language identifiers can be used effectively in multilingual information retrieval, for example, because automatic indexing processes such as stemming and stop-word extraction build on the language identification results. This thesis presents the new system "LangIdent" for language identification of electronic text documents, intended primarily for teaching and research at the Universität Hildesheim. "LangIdent" contains a selection of common algorithms for monolingual language identification, which the user can select and configure interactively. In addition, a new algorithm was implemented in the system that identifies the languages in which a multilingual document is written. The identification is not limited to listing the languages found; rather, the text is divided into monolingual sections, each labeled with the identified language.

Dinse, S.: ¬Die sachliche Suche im OPAC der Bibliothek des HWWA-Instituts für Wirtschaftsforschung Hamburg : eine kritische Bestandsaufnahme (1994) 0.02

0.01510862 = product of:
  0.105760336 = sum of:
    0.09365275 = weight(_text_:bibliothek in 4216) [ClassicSimilarity], result of:
      0.09365275 = score(doc=4216,freq=4.0), product of:
        0.121660605 = queryWeight, product of:
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.029633347 = queryNorm
        0.76978695 = fieldWeight in 4216, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1055303 = idf(docFreq=1980, maxDocs=44218)
          0.09375 = fieldNorm(doc=4216)
    0.012107591 = weight(_text_:information in 4216) [ClassicSimilarity], result of:
      0.012107591 = score(doc=4216,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.23274569 = fieldWeight in 4216, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.09375 = fieldNorm(doc=4216)
  0.14285715 = coord(2/14)

Imprint: Hamburg : Fachhochschule, Fb Bibliothek und Information

Kara, S.: ¬An ontology-based retrieval system using semantic indexing (2012) 0.01

0.01478128 = product of:
  0.06897931 = sum of:
    0.020922182 = weight(_text_:web in 3829) [ClassicSimilarity], result of:
      0.020922182 = score(doc=3829,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.21634221 = fieldWeight in 3829, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3829)
    0.012107591 = weight(_text_:information in 3829) [ClassicSimilarity], result of:
      0.012107591 = score(doc=3829,freq=8.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.23274569 = fieldWeight in 3829, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3829)
    0.03594954 = weight(_text_:retrieval in 3829) [ClassicSimilarity], result of:
      0.03594954 = score(doc=3829,freq=8.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.40105087 = fieldWeight in 3829, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3829)
  0.21428572 = coord(3/14)

Abstract: In this thesis, we present an ontology-based information extraction and retrieval system and its application to the soccer domain. In general, we deal with three issues in semantic search, namely usability, scalability and retrieval performance. We propose a keyword-based semantic retrieval approach. The performance of the system is improved considerably using domain-specific information extraction, inference and rules. Scalability is achieved by adapting a semantic indexing approach. The system is implemented using state-of-the-art Semantic Web technologies, and its performance is evaluated against traditional systems as well as query expansion methods. Furthermore, a detailed evaluation is provided to observe the performance gain due to domain-specific information extraction and inference. Finally, we show how we use semantic indexing to solve simple structural ambiguities.
Source: Information Systems. 37(2012) no. 4, pp. 294-305
Theme: Semantic Web

Líska, M.: Evaluation of mathematics retrieval (2013) 0.01

0.014527298 = product of:
  0.067794055 = sum of:
    0.024409214 = weight(_text_:web in 1653) [ClassicSimilarity], result of:
      0.024409214 = score(doc=1653,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.25239927 = fieldWeight in 1653, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1653)
    0.0070627616 = weight(_text_:information in 1653) [ClassicSimilarity], result of:
      0.0070627616 = score(doc=1653,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.13576832 = fieldWeight in 1653, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1653)
    0.036322083 = weight(_text_:retrieval in 1653) [ClassicSimilarity], result of:
      0.036322083 = score(doc=1653,freq=6.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.40520695 = fieldWeight in 1653, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1653)
  0.21428572 = coord(3/14)

Abstract: The thesis deals with the evaluation of mathematics information retrieval (IR). It gives an overview of the history of regular IR evaluation, the initiatives engaged in this field of research, and the most common methods and measures used for evaluation. The findings are applied to the specifics of mathematics retrieval. The thesis also summarizes the state of the art of the MIaS math search system, which is already being used in an international web portal. The latest developments towards the second version of the system are described. In addition to participating in the international evaluation conference and workshop, MIaS is tested for effectiveness and efficiency in this work. The measured performance indicators are evaluated and future work is suggested accordingly.
