Search (77 results, page 1 of 4)

  • theme_ss:"Automatisches Indexieren"
  1. Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.02
    0.021928092 = product of:
      0.09867641 = sum of:
        0.043531876 = weight(_text_:open in 2144) [ClassicSimilarity], result of:
          0.043531876 = score(doc=2144,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.2985229 = fieldWeight in 2144, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=2144)
        0.055144534 = weight(_text_:access in 2144) [ClassicSimilarity], result of:
          0.055144534 = score(doc=2144,freq=10.0), product of:
            0.10975764 = queryWeight, product of:
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.03238235 = queryNorm
            0.5024209 = fieldWeight in 2144, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.046875 = fieldNorm(doc=2144)
      0.22222222 = coord(2/9)
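    The score breakdown above is Lucene ClassicSimilarity (tf-idf) "explain" output: tf = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1, queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, and each clause contributes queryWeight · fieldWeight; the clause sum is then scaled by the coord factor (here 2 of 9 query clauses matched). A minimal sketch that reproduces the numbers for result 1 (the formulas are standard ClassicSimilarity and assumed to match this installation):

```python
import math

def clause_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """One clause of a Lucene ClassicSimilarity explanation."""
    tf = math.sqrt(freq)                             # 1.4142135 for freq=2.0
    idf = math.log(max_docs / (doc_freq + 1)) + 1.0  # 4.5032015 for docFreq=1330
    query_weight = idf * query_norm                  # 0.14582425
    field_weight = tf * idf * field_norm             # 0.2985229
    return query_weight * field_weight               # ~0.043531876

QUERY_NORM = 0.03238235
s_open = clause_score(2.0, 1330, 44218, QUERY_NORM, 0.046875)     # "open" clause
s_access = clause_score(10.0, 4053, 44218, QUERY_NORM, 0.046875)  # "access" clause
total = (s_open + s_access) * 2 / 9                               # coord(2/9)
print(total)  # ~0.021928092, the displayed score of result 1
```

    The coord(2/9) factor penalises documents that match only a fraction of the query's nine clauses, which is why the raw clause sum 0.09867641 shrinks to the 0.02 shown in the result list.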
    
    Abstract
    The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g. people, places, and concepts) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications.
  2. Salton, G.: Future prospects for text-based information retrieval (1990) 0.02
    0.021442771 = product of:
      0.19298494 = sum of:
        0.19298494 = weight(_text_:konstanz in 2327) [ClassicSimilarity], result of:
          0.19298494 = score(doc=2327,freq=4.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            1.0570807 = fieldWeight in 2327, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.09375 = fieldNorm(doc=2327)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : Universitätsverlag
    Source
    Pragmatische Aspekte beim Entwurf und Betrieb von Informationssystemen: Proc. des 1. Int. Symposiums für Informationswissenschaft, Universität Konstanz, 17.-19.10.1990. Hrsg.: J. Herget u. R. Kuhlen
  3. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.02
    0.020840919 = product of:
      0.09378413 = sum of:
        0.0820845 = weight(_text_:open in 3581) [ClassicSimilarity], result of:
          0.0820845 = score(doc=3581,freq=4.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.5629002 = fieldWeight in 3581, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
        0.011699631 = product of:
          0.03509889 = sum of:
            0.03509889 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
              0.03509889 = score(doc=3581,freq=2.0), product of:
                0.11339747 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03238235 = queryNorm
                0.30952093 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
          0.33333334 = coord(1/3)
      0.22222222 = coord(2/9)
    
    Abstract
    Lingo is a freely available, open-source system for the automatic indexing of German-language documents. High configurability and flexibility for different deployment scenarios were central goals in lingo's development. The article demonstrates the benefits of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo provides for improving retrieval is presented and illustrated with examples: base-form recognition, compound recognition and decomposition, word relationing, lexical and algorithmic recognition of multi-word groups, and OCR error correction. The open architecture of the system is described, and possible deployment scenarios as well as limits of application are identified.
    Date
    24. 3.2006 12:22:02
  4. Lück, W.; Rittberger, W.; Schwantner, M.: ¬Der Einsatz des Automatischen Indexierungs- und Retrievalsystems (AIR) im Fachinformationszentrum Karlsruhe (1994) 0.02
    0.017868975 = product of:
      0.16082078 = sum of:
        0.16082078 = weight(_text_:konstanz in 8153) [ClassicSimilarity], result of:
          0.16082078 = score(doc=8153,freq=4.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.8809006 = fieldWeight in 8153, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.078125 = fieldNorm(doc=8153)
      0.11111111 = coord(1/9)
    
    Footnote
    Reprinted from: Experimentelles und praktisches Information Retrieval. Hrsg.: R. Kuhlen. Konstanz: Universitätsverlag 1992
    Imprint
    Konstanz : Universitätsbibliothek
  5. Hüther, H.: Selix im DFG-Projekt Kascade (1998) 0.02
    0.017868975 = product of:
      0.16082078 = sum of:
        0.16082078 = weight(_text_:konstanz in 5151) [ClassicSimilarity], result of:
          0.16082078 = score(doc=5151,freq=4.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.8809006 = fieldWeight in 5151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.078125 = fieldNorm(doc=5151)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : UVK Universitätsverlag
    Source
    Knowledge Management und Kommunikationssysteme: Proceedings des 6. Internationalen Symposiums für Informationswissenschaft (ISI '98) Prag, 3.-7. November 1998 / Hochschulverband für Informationswissenschaft (HI) e.V. Konstanz ; Fachrichtung Informationswissenschaft der Universität des Saarlandes, Saarbrücken. Hrsg.: Harald H. Zimmermann u. Volker Schramm
  6. Reimer, U.: Verfahren der automatischen Indexierung : benötigtes Vorwissen und Ansätze zu seiner automatischen Akquisition, ein Überblick (1992) 0.02
    0.015162329 = product of:
      0.13646096 = sum of:
        0.13646096 = weight(_text_:konstanz in 7858) [ClassicSimilarity], result of:
          0.13646096 = score(doc=7858,freq=2.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.74746895 = fieldWeight in 7858, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.09375 = fieldNorm(doc=7858)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : Universitätsverlag
  7. Experimentelles und praktisches Information Retrieval : Festschrift für Gerhard Lustig (1992) 0.01
    0.0107213855 = product of:
      0.09649247 = sum of:
        0.09649247 = weight(_text_:konstanz in 4) [ClassicSimilarity], result of:
          0.09649247 = score(doc=4,freq=4.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.5285404 = fieldWeight in 4, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.046875 = fieldNorm(doc=4)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : Univ.-Verlag Konstanz
  8. Milstead, J.L.: Thesauri in a full-text world (1998) 0.01
    0.009535091 = product of:
      0.04290791 = sum of:
        0.03559564 = weight(_text_:access in 2337) [ClassicSimilarity], result of:
          0.03559564 = score(doc=2337,freq=6.0), product of:
            0.10975764 = queryWeight, product of:
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.03238235 = queryNorm
            0.3243113 = fieldWeight in 2337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.0073122694 = product of:
          0.021936808 = sum of:
            0.021936808 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.021936808 = score(doc=2337,freq=2.0), product of:
                0.11339747 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03238235 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.33333334 = coord(1/3)
      0.22222222 = coord(2/9)
    
    Abstract
    Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  9. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.01
    0.0075811646 = product of:
      0.06823048 = sum of:
        0.06823048 = weight(_text_:konstanz in 5480) [ClassicSimilarity], result of:
          0.06823048 = score(doc=5480,freq=2.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.37373447 = fieldWeight in 5480, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : UVK, Universitätsverlag
  10. Mielke, B.: Wider einige gängige Ansichten zur juristischen Informationserschließung (2002) 0.01
    0.0075811646 = product of:
      0.06823048 = sum of:
        0.06823048 = weight(_text_:konstanz in 2145) [ClassicSimilarity], result of:
          0.06823048 = score(doc=2145,freq=2.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.37373447 = fieldWeight in 2145, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.046875 = fieldNorm(doc=2145)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : UVK
  11. Hirawa, M.: Role of keywords in the network searching era (1998) 0.01
    0.0063281143 = product of:
      0.056953028 = sum of:
        0.056953028 = weight(_text_:access in 3446) [ClassicSimilarity], result of:
          0.056953028 = score(doc=3446,freq=6.0), product of:
            0.10975764 = queryWeight, product of:
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.03238235 = queryNorm
            0.51889807 = fieldWeight in 3446, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.0625 = fieldNorm(doc=3446)
      0.11111111 = coord(1/9)
    
    Abstract
    A survey of Japanese OPACs available on the Internet was conducted relating to use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives from a merely title-based retrieval system. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed
  12. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.01
    0.0063176365 = product of:
      0.05685873 = sum of:
        0.05685873 = weight(_text_:konstanz in 4283) [ClassicSimilarity], result of:
          0.05685873 = score(doc=4283,freq=2.0), product of:
            0.18256405 = queryWeight, product of:
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.03238235 = queryNorm
            0.3114454 = fieldWeight in 4283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.637764 = idf(docFreq=427, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4283)
      0.11111111 = coord(1/9)
    
    Imprint
    Konstanz : Universität / Fachbereich Informatik und Informationswissenschaft
  13. Lassalle, E.: Text retrieval : from a monolingual system to a multilingual system (1993) 0.01
    0.0056430213 = product of:
      0.05078719 = sum of:
        0.05078719 = weight(_text_:open in 7403) [ClassicSimilarity], result of:
          0.05078719 = score(doc=7403,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.3482767 = fieldWeight in 7403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7403)
      0.11111111 = coord(1/9)
    
    Abstract
    Describes the TELMI monolingual text retrieval system and its future extension, a multilingual system. TELMI is designed for medium-sized databases containing short texts. The characteristics of the system are fine-grained natural language processing (NLP); an open domain and a large-scale knowledge base; automated indexing based on conceptual representation of texts; and reusability of the NLP tools. Discusses the French MINITEL service, the MGS information service and the TELMI research system covering the full text system; NLP architecture; the lexical level; the syntactic level; the semantic level and an example of the use of a generic system
  14. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.01
    0.0056430213 = product of:
      0.05078719 = sum of:
        0.05078719 = weight(_text_:open in 4995) [ClassicSimilarity], result of:
          0.05078719 = score(doc=4995,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.3482767 = fieldWeight in 4995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper discusses the goals and results of the research project Perseus-a as an attempt to improve information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies, the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte, art historical terminological resources and classification data, and an open source system for linguistic and statistic automatic indexing called lingo.
  15. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 0.01
    0.005166883 = product of:
      0.04650195 = sum of:
        0.04650195 = weight(_text_:access in 6507) [ClassicSimilarity], result of:
          0.04650195 = score(doc=6507,freq=4.0), product of:
            0.10975764 = queryWeight, product of:
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.03238235 = queryNorm
            0.4236785 = fieldWeight in 6507, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.389428 = idf(docFreq=4053, maxDocs=44218)
              0.0625 = fieldNorm(doc=6507)
      0.11111111 = coord(1/9)
    
    Abstract
    In many practical information retrieval situations, it is necessary to process heterogeneous text databases that vary greatly in scope and coverage and deal with many different subjects. In such an environment it is important to provide flexible access to individual text pieces and to structure the collection so that related text elements are identified and properly linked. Describes methods for the automatic structuring of heterogeneous text collections and the construction of browsing tools and access procedures that facilitate collection use. Illustrates these methods with searches using a large automated encyclopedia
  16. Grün, S.: Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo (2015) 0.00
    0.004836875 = product of:
      0.043531876 = sum of:
        0.043531876 = weight(_text_:open in 1335) [ClassicSimilarity], result of:
          0.043531876 = score(doc=1335,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.2985229 = fieldWeight in 1335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=1335)
      0.11111111 = coord(1/9)
    
    Abstract
    In German, concepts can be expressed both as compounds and as multi-word groups; the latter can in turn be expressed as a compound and thus refer to the same concept. The present study analyses multi-word groups that can also occur as compounds, with the aim of identifying such word sequences by means of patterns. The data analysed were job advertisements from the career portal Placement24 GmbH. Multi-word groups were extracted algorithmically using the open-source software Lingo. Based on extensions and adjustments to the dictionaries and the words tagged in them, candidates of three to five elements were analysed. Compounds were formed from the positively evaluated multi-word groups and compared with the compounds identified in the job advertisements. The comparison showed that a large proportion of the newly generated compounds had not been produced by compound identification.
  17. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.00
    0.004836875 = product of:
      0.043531876 = sum of:
        0.043531876 = weight(_text_:open in 1167) [ClassicSimilarity], result of:
          0.043531876 = score(doc=1167,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.2985229 = fieldWeight in 1167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
      0.11111111 = coord(1/9)
    
  18. Kasprzik, A.: Aufbau eines produktiven Dienstes für die automatisierte Inhaltserschließung an der ZBW : ein Status- und Erfahrungsbericht. (2023) 0.00
    0.00456025 = product of:
      0.04104225 = sum of:
        0.04104225 = weight(_text_:open in 935) [ClassicSimilarity], result of:
          0.04104225 = score(doc=935,freq=4.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.2814501 = fieldWeight in 935, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03125 = fieldNorm(doc=935)
      0.11111111 = coord(1/9)
    
    Abstract
    The ZBW - Leibniz Information Centre for Economics has conducted its own applied machine-learning research since 2016, with the aim of developing practical solutions for automated or machine-assisted subject indexing. In 2020 a team at ZBW began designing and implementing a software architecture that makes it possible to turn these prototype solutions into a production service and to integrate them with the existing catalogue and information systems. Both the applied research and the software development required for this undertaking ("AutoSE") are located directly in ZBW's library division, are continuously advanced in line with the state of the art, and benefit from close exchange with the staff responsible for intellectual subject indexing. This article presents the milestones the AutoSE team has reached within two years with respect to building and integrating the software, and outlines those still outstanding before the end of the pilot phase (2024). The architecture is based on open-source software, and the machine-learning components employed are developed further within an international collaboration, in close exchange with the National Library of Finland (NLF), and packaged for reuse in Annif, the open-source toolkit developed by the NLF. The operating model of the AutoSE service provides for regular reviews both of individual components and of the production workflow as a whole, and allows the architecture to evolve continuously. One of the results to be delivered by the end of the pilot phase is documentation of the requirements for permanent production operation of the service, so that the necessary resources can be secured long-term within a sustainable model. From this practical example it can be deduced which conditions must be met for machine-learning solutions such as those contained in Annif to be deployed successfully for subject indexing at an institution.
  19. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.00
    0.004030729 = product of:
      0.036276564 = sum of:
        0.036276564 = weight(_text_:open in 3667) [ClassicSimilarity], result of:
          0.036276564 = score(doc=3667,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.24876907 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.11111111 = coord(1/9)
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.
  20. Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.00
    0.004030729 = product of:
      0.036276564 = sum of:
        0.036276564 = weight(_text_:open in 3810) [ClassicSimilarity], result of:
          0.036276564 = score(doc=3810,freq=2.0), product of:
            0.14582425 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.03238235 = queryNorm
            0.24876907 = fieldWeight in 3810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
      0.11111111 = coord(1/9)
    
    Abstract
    Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.

Years

Languages

  • e 50
  • d 23
  • a 1
  • ja 1
  • m 1
  • ru 1

Types

  • a 68
  • el 9
  • x 5
  • m 1
  • s 1