Search (61 results, page 1 of 4)

  • theme_ss:"Automatisches Indexieren"
  1. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.08
    0.07630152 = product of:
      0.1907538 = sum of:
        0.12117098 = weight(_text_:thesaurus in 4157) [ClassicSimilarity], result of:
          0.12117098 = score(doc=4157,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5105618 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.06958282 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.06958282 = score(doc=4157,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.4 = coord(2/5)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
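    The explain tree shown for result 1 above is Lucene's ClassicSimilarity (tf-idf) breakdown. As a minimal sketch in plain Python (an illustration of the printed factors, not Lucene itself), the final score 0.07630152 can be reproduced from the listed tf, idf, queryNorm, fieldNorm and coord values:

        import math

        # Each matched term contributes queryWeight * fieldWeight, where
        #   queryWeight = idf * queryNorm
        #   fieldWeight = sqrt(tf) * idf * fieldNorm
        def term_score(tf, idf, query_norm, field_norm):
            return (idf * query_norm) * (math.sqrt(tf) * idf * field_norm)

        query_norm = 0.051357865
        w_thesaurus = term_score(tf=2.0, idf=4.6210785, query_norm=query_norm, field_norm=0.078125)
        w_22        = term_score(tf=2.0, idf=3.5018296, query_norm=query_norm, field_norm=0.078125)

        coord = 2 / 5                                   # 2 of 5 query terms matched
        print(round((w_thesaurus + w_22) * coord, 8))   # ~0.07630152, as in the explain output
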
  2. Siebenkäs, A.; Markscheffel, B.: Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry (2015) 0.04
    0.041552994 = product of:
      0.20776497 = sum of:
        0.20776497 = weight(_text_:thesaurus in 2091) [ClassicSimilarity], result of:
          0.20776497 = score(doc=2091,freq=12.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.8754312 = fieldWeight in 2091, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2091)
      0.2 = coord(1/5)
    
    Abstract
    During the BMWI-funded project "Print-IT", the need for a thesaurus-based, uniform and consistent language for the German printing industry became evident. In this paper we introduce a semi-automatic construction approach for such a thesaurus and present a workflow which supports users in generating thesaurus-typical information structures from relevant digitized resources with the help of common IT tools.
    Object
    MIDOS Thesaurus
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  3. Milstead, J.L.: Thesauri in a full-text world (1998) 0.04
    0.03815076 = product of:
      0.0953769 = sum of:
        0.06058549 = weight(_text_:thesaurus in 2337) [ClassicSimilarity], result of:
          0.06058549 = score(doc=2337,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.2552809 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.03479141 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
          0.03479141 = score(doc=2337,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.19345059 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
      0.4 = coord(2/5)
    
    Date
    22. 9.1997 19:16:05
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  4. Liedloff, V.: Anwendung eines existenten Klassifikationssystems im Bereich der computerunterstützten Inhaltsanalyse (1985) 0.03
    0.029382406 = product of:
      0.14691202 = sum of:
        0.14691202 = weight(_text_:thesaurus in 2921) [ClassicSimilarity], result of:
          0.14691202 = score(doc=2921,freq=6.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.6190234 = fieldWeight in 2921, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2921)
      0.2 = coord(1/5)
    
    Abstract
    The computer-aided text indexing system CTX (Computergestütztes TeXterschließungssystem) was developed in basic university research. It is a dictionary-based method which, building on word- and sentence-level processing of texts, produces formal and content-bearing keywords (base forms, internally called "descriptors") for a German-language text or document. These serve as input for computer-assisted content analysis (CUI). With the help of a thesaurus, the descriptors are grouped into broader terms, and the descriptor list produced by CTX is mapped via a comparison list onto the categories (= broader terms) of the thesaurus. The result is further processed by mathematical-statistical evaluation methods. Further advantages of incorporating a thesaurus are outlined.
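    The mapping step described in result 4 above (descriptors mapped via a comparison list onto thesaurus categories, then evaluated statistically) can be illustrated with a minimal sketch; the dictionary and descriptor values below are hypothetical and not taken from CTX:

        from collections import Counter

        # Hypothetical comparison list: descriptor -> broader term (thesaurus category)
        comparison_list = {
            "Indexierung": "Inhaltserschliessung",
            "Deskriptor":  "Inhaltserschliessung",
            "Retrieval":   "Information Retrieval",
        }

        doc_descriptors = ["Indexierung", "Deskriptor", "Retrieval", "Retrieval"]
        categories = Counter(comparison_list.get(d, "unmapped") for d in doc_descriptors)
        print(categories)   # category frequencies, the input for statistical evaluation
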
  5. Wan, T.-L.; Evens, M.; Wan, Y.-W.; Pao, Y.-Y.: Experiments with automatic indexing and a relational thesaurus in a Chinese information retrieval system (1997) 0.03
    0.029382406 = product of:
      0.14691202 = sum of:
        0.14691202 = weight(_text_:thesaurus in 956) [ClassicSimilarity], result of:
          0.14691202 = score(doc=956,freq=6.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.6190234 = fieldWeight in 956, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=956)
      0.2 = coord(1/5)
    
    Abstract
    This article describes a series of experiments with an interactive Chinese information retrieval system named CIRS and an interactive relational thesaurus. 2 important issues have been explored: whether thesauri enhance the retrieval effectiveness of Chinese documents, and whether automatic indexing can compete with manual indexing in a Chinese information retrieval system. Recall and precision are used to measure and evaluate the effectiveness of the system. Statistical analysis of the recall and precision measures suggests that the use of the relational thesaurus does improve the retrieval effectiveness both in the automatic indexing environment and in the manual indexing environment, and that automatic indexing is at least as good as manual indexing.
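    Recall and precision, as used in the evaluation of result 5 above, are the standard set-based measures; a minimal illustration with hypothetical document IDs:

        retrieved = {"d1", "d2", "d3", "d5"}         # documents returned by the system (hypothetical)
        relevant  = {"d1", "d3", "d4", "d6", "d7"}   # documents judged relevant (hypothetical)

        hits = retrieved & relevant
        precision = len(hits) / len(retrieved)       # 2/4 = 0.5
        recall    = len(hits) / len(relevant)        # 2/5 = 0.4
        print(precision, recall)
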
  6. Spitters, M.J.: Adjust : automatische thesauriele ontsluiting van grote hoeveelheden krantenartikelen (1999) 0.03
    0.029081037 = product of:
      0.14540519 = sum of:
        0.14540519 = weight(_text_:thesaurus in 3938) [ClassicSimilarity], result of:
          0.14540519 = score(doc=3938,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.61267424 = fieldWeight in 3938, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.09375 = fieldNorm(doc=3938)
      0.2 = coord(1/5)
    
    Footnote
    Translated title: Adjust: automatic thesaurus-based indexing of large collections of newspaper articles.
  7. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurence based approach to multilingual retrieval (1997) 0.03
    0.02709466 = product of:
      0.1354733 = sum of:
        0.1354733 = weight(_text_:thesaurus in 4144) [ClassicSimilarity], result of:
          0.1354733 = score(doc=4144,freq=10.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5708255 = fieldWeight in 4144, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4144)
      0.2 = coord(1/5)
    
    Abstract
    Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to select thesaurus descriptors. In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show a high variability with changing parameter values. This indicates that it is very important to empirically adapt the model to the specific situation in which it is used. The overall median rank of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fit to a specific training set but a real adaptation of the model to the setting.
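    The ranking idea of result 7 above can be sketched very roughly as follows; this is only a naive co-occurrence illustration under assumed toy data, not Ferber's actual linear associative model or its tuning procedure:

        from collections import defaultdict

        # Hypothetical training records: (title words, manually assigned descriptors)
        train = [
            (["multilingual", "retrieval", "thesaurus"], ["Multilingual indexing", "Thesauri"]),
            (["automatic", "indexing", "descriptors"],   ["Automatic indexing", "Thesauri"]),
        ]

        # Learn word-descriptor association strengths from co-occurrence counts.
        assoc = defaultdict(float)
        for words, descriptors in train:
            for w in words:
                for d in descriptors:
                    assoc[(w, d)] += 1.0

        def rank_descriptors(title_words):
            scores = defaultdict(float)
            for (w, d), v in assoc.items():
                if w in title_words:
                    scores[d] += v       # linear combination of learned associations
            return sorted(scores, key=scores.get, reverse=True)

        print(rank_descriptors(["automatic", "multilingual", "indexing"]))
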
  8. Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.03
    0.025647065 = product of:
      0.12823533 = sum of:
        0.12823533 = weight(_text_:thesaurus in 1016) [ClassicSimilarity], result of:
          0.12823533 = score(doc=1016,freq=14.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5403279 = fieldWeight in 1016, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
      0.2 = coord(1/5)
    
    Abstract
    Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
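    The core idea of result 8 above, a weighted random walk over thesaurus relations starting from concepts matched in a document, can be sketched as follows; the graph, weights and parameters are invented for illustration, and this is not the authors' exact algorithm or matching step:

        import random

        # Hypothetical thesaurus fragment: concept -> [(related concept, edge weight)]
        edges = {
            "indexing":              [("subject indexing", 2.0), ("thesaurus", 1.0)],
            "thesaurus":             [("controlled vocabulary", 2.0), ("indexing", 1.0)],
            "subject indexing":      [("indexing", 1.0)],
            "controlled vocabulary": [("thesaurus", 1.0)],
        }

        def weighted_random_walk(start_concepts, steps=1000, seed=42):
            rng = random.Random(seed)
            visits = {c: 0 for c in edges}
            node = rng.choice(start_concepts)
            for _ in range(steps):
                visits[node] += 1
                neighbours = edges[node]
                r = rng.uniform(0, sum(w for _, w in neighbours))
                for nxt, w in neighbours:
                    r -= w
                    if r <= 0:
                        node = nxt
                        break
            return sorted(visits.items(), key=lambda kv: kv[1], reverse=True)

        # Concepts matched in the document seed the walk; visit counts rank candidate concepts.
        print(weighted_random_walk(["indexing", "thesaurus"]))
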
  9. Thönssen, B.: Automatische Indexierung und Schnittstellen zu Thesauri (1988) 0.02
    0.024234196 = product of:
      0.12117098 = sum of:
        0.12117098 = weight(_text_:thesaurus in 30) [ClassicSimilarity], result of:
          0.12117098 = score(doc=30,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5105618 = fieldWeight in 30, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.078125 = fieldNorm(doc=30)
      0.2 = coord(1/5)
    
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  10. Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.02
    0.023744568 = product of:
      0.11872284 = sum of:
        0.11872284 = weight(_text_:thesaurus in 3081) [ClassicSimilarity], result of:
          0.11872284 = score(doc=3081,freq=12.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5002464 = fieldWeight in 3081, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
      0.2 = coord(1/5)
    
    Abstract
    The thesis analyses the dynamic development and use of thesaurus terms. In addition, it focuses on the factors that influence the number of index terms per document or journal. MeSH and the corresponding database MEDLINE served as the object of study. The most important findings are: 1. The MeSH thesaurus has grown logarithmically in each of three distinct phases. Such a thesaurus should follow the equation T = 3,076.6 ln(d) - 22,695 + 0.0039 d (T = terms, ln = natural logarithm, d = documents). To construct such a thesaurus, one therefore needs about 1,600 documents from different topics within the thesaurus's domain. The dynamic development of thesauri such as MeSH requires the introduction of one new term for every 256 newly indexed documents. 2. The distribution of thesaurus terms yielded three categories: heavily used, normally used and rarely used headings. The last group is in a test phase, while in the first and second categories the newly added descriptors drive thesaurus growth. 3. There is a logarithmic relationship between the number of index terms per article and its page count for articles of one to twenty-one pages. 4. Journal articles that appear in MEDLINE with abstracts receive almost two additional descriptors. 5. The findability of non-English-language documents in MEDLINE is lower than that of English-language documents. 6. Articles from journals with an impact factor of 0 to 15 do not receive more index terms than those of the other journals covered by MEDLINE. 7. In an indexing system, different journals carry more or less weight with respect to their findability. The distribution of index terms per page showed that MEDLINE contains three categories of publications; in addition, there are a few strongly favoured journals.
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
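    The growth model quoted in result 10 above, T(d) = 3,076.6 ln(d) - 22,695 + 0.0039 d (rewritten here with English decimal notation), can be checked directly; the corpus sizes used below are illustrative only:

        import math

        # T(d): predicted number of thesaurus terms after indexing d documents.
        def terms(d):
            return 3076.6 * math.log(d) - 22695 + 0.0039 * d

        print(round(terms(1500)))   # ≈ -189: still negative
        print(round(terms(1600)))   # ≈ 10: the model turns positive at roughly 1,600 documents

        # Marginal growth dT/dd = 3076.6/d + 0.0039 tends to 0.0039 for large d,
        # i.e. about one new term per 1/0.0039 ≈ 256 newly indexed documents.
        d = 10_000_000
        print(round(1 / (3076.6 / d + 0.0039)))   # ≈ 238, approaching 256
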
  11. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.022266502 = product of:
      0.111332506 = sum of:
        0.111332506 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.111332506 = score(doc=402,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.2 = coord(1/5)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  12. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    0.01948319 = product of:
      0.09741595 = sum of:
        0.09741595 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.09741595 = score(doc=262,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.2 = coord(1/5)
    
    Date
    20.10.2000 12:22:23
  13. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    0.01948319 = product of:
      0.09741595 = sum of:
        0.09741595 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.09741595 = score(doc=6265,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.2 = coord(1/5)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  14. Dow Jones unveils knowledge indexing system (1997) 0.02
    0.019387359 = product of:
      0.09693679 = sum of:
        0.09693679 = weight(_text_:thesaurus in 751) [ClassicSimilarity], result of:
          0.09693679 = score(doc=751,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.40844947 = fieldWeight in 751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0625 = fieldNorm(doc=751)
      0.2 = coord(1/5)
    
    Abstract
    Dow Jones Interactive Publishing has developed a sophisticated automatic knowledge indexing system that will allow searchers of the Dow Jones News/Retrieval service to get highly targeted results from a search in the service's Publications Library. Instead of relying on a thesaurus of company names, the new system uses a combination of that basic algorithm plus unique rules based on the editorial styles of individual publications in the Library. Dow Jones has also announced its acceptance of the definitions of 'selected full text' and 'full text' from Bibliodata's Fulltext Sources Online directory.
  15. Zimmermann, H.H.: Automatische Indexierung und elektronische Thesauri (1996) 0.02
    0.019387359 = product of:
      0.09693679 = sum of:
        0.09693679 = weight(_text_:thesaurus in 2062) [ClassicSimilarity], result of:
          0.09693679 = score(doc=2062,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.40844947 = fieldWeight in 2062, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0625 = fieldNorm(doc=2062)
      0.2 = coord(1/5)
    
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  16. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.02
    0.017136168 = product of:
      0.085680835 = sum of:
        0.085680835 = weight(_text_:thesaurus in 4283) [ClassicSimilarity], result of:
          0.085680835 = score(doc=4283,freq=4.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.36102176 = fieldWeight in 4283, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4283)
      0.2 = coord(1/5)
    
    Abstract
    Automatic indexing has attracted growing interest for several years owing to the rising flood of information. However, there are still reservations about it, compared with intellectual indexing, with regard to quality and the greater effort of system implementation and maintenance. More recent developments from the field of knowledge management, such as methods from artificial intelligence, information extraction, text mining and automatic classification, are intended to upgrade and improve automatic indexing so that a more intelligent and more content-based subject analysis can be achieved. In addition to presenting the foundations and methods of automatic indexing as well as recent developments, this Master's thesis also discusses options for evaluation. The possible application of automatic indexing in the DFG project "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" forms the focus of the thesis. In the portal, the library-style subject analysis of texts is in the foreground. In an extensive test, three German linguistic systems are combined with statistical methods (which, however, are partly already integrated in the systems) and evaluated, albeit only on the basis of the produced index terms. In conclusion, the results, and hence the quality (with respect to the index terms), of intellectual and automatic indexing still differ significantly. The reasons lie in semantic problems that remain to be solved and in the matching with terms from a thesaurus, which an automatic indexing system cannot always reproduce. An enrichment of the content with the index terms, to the benefit of retrieval, can be achieved depending on the system or through the integration of a thesaurus.
  17. Toepfer, M.; Kempf, A.O.: Automatische Indexierung auf Basis von Titeln und Autoren-Keywords : ein Werkstattbericht (2016) 0.02
    0.017136168 = product of:
      0.085680835 = sum of:
        0.085680835 = weight(_text_:thesaurus in 3209) [ClassicSimilarity], result of:
          0.085680835 = score(doc=3209,freq=4.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.36102176 = fieldWeight in 3209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3209)
      0.2 = coord(1/5)
    
    Abstract
    Automatic methods are essential for libraries in order to cope with the subject indexing of constantly growing volumes of data. The Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft has been gathering experience in the field of automatic indexing for some time and is building up its own expertise in this area. Owing to legal restrictions, approaches that work without using the full text are being investigated, among others. This contribution gives an insight into an ongoing subproject which, using titles and author keywords, aims at a re-normalization of the content-describing metadata to the Standard-Thesaurus Wirtschaft (STW). We explain the background of the work, look at the system architecture and present first promising results of a document-oriented approach.
    In the following we first explain the background of the current work. We draw on experience with machine-based methods in general and at the Deutsche Zentralbibliothek für Wirtschaftswissenschaften (ZBW) - Leibniz-Informationszentrum Wirtschaft in particular. We then give a concrete insight into an ongoing subproject in which, in contrast to earlier work, the system architecture uses titles and author keywords together in order to achieve a re-normalization to the Standard-Thesaurus Wirtschaft (STW). Unlike a static link in the sense of a cross-concordance or vocabulary mapping, the approach now pursued is document-oriented and is thus able to make context-dependent assignments. In addition to the system architecture, the article also presents first experimental results, which already show clear improvements compared with title-based predictions.
  18. Warner, A.J.: ¬A linguistic approach to the automated hierarchical organization of phrases (1990) 0.02
    0.01696394 = product of:
      0.0848197 = sum of:
        0.0848197 = weight(_text_:thesaurus in 4902) [ClassicSimilarity], result of:
          0.0848197 = score(doc=4902,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.3573933 = fieldWeight in 4902, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4902)
      0.2 = coord(1/5)
    
    Abstract
    A linguistic analysis was carried out on 8 sets of phrases automatically selected from document surrogates in mathematics. The purpose of this analysis was to derive an algorithm which would automatically generate a hierarchically organised arrangement of phrases for online display to the user. This would replace an alphabetical display and would be particularly useful in online browsing of large numbers of items. It is also the first step toward an automatic thesaurus generator.
  19. Chartron, G.; Dalbin, S.; Monteil, M.-G.; Verillon, M.: Indexation manuelle et indexation automatique : dépasser les oppositions (1989) 0.02
    0.01696394 = product of:
      0.0848197 = sum of:
        0.0848197 = weight(_text_:thesaurus in 3516) [ClassicSimilarity], result of:
          0.0848197 = score(doc=3516,freq=2.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.3573933 = fieldWeight in 3516, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3516)
      0.2 = coord(1/5)
    
    Abstract
    Report of a study comparing 2 methods of indexing: LEXINET, a computerised system for indexing titles and summaries only; and manual indexing of full texts, using the thesaurus developed by French Electricity (EDF). Both systems were applied to a collection of approximately 2,000 documents on artificial intelligence from the EDF database. The results were then analysed to compare quantitative performance (number and range of terms) and qualitative performance (ambiguity of terms, specificity, variability, consistency). Overall, neither system proved ideal: LEXINET was deficient as regards lack of accessibility and excessive ambiguity, while the manual system gave rise to an over-wide variation of terms. The ideal system would appear to be a combination of automatic and manual systems, on the evidence produced here.
  20. Salton, G.: Automatic processing of foreign language documents (1985) 0.02
    0.016789947 = product of:
      0.08394973 = sum of:
        0.08394973 = weight(_text_:thesaurus in 3650) [ClassicSimilarity], result of:
          0.08394973 = score(doc=3650,freq=6.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.35372764 = fieldWeight in 3650, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.2 = coord(1/5)
    
    Abstract
    The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system, he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order. During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English-language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German-language documents against English-language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how is it to be evaluated?
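    The term weighting and ranked document-request matching described in the selection above can be illustrated with a small, self-contained tf-idf and cosine-similarity sketch; the toy collection and query are invented, and this is not the SMART system itself:

        import math
        from collections import Counter

        docs = {   # hypothetical toy collection
            "d1": "automatic indexing with a multilingual thesaurus",
            "d2": "manual indexing of printed documents",
            "d3": "thesaurus based retrieval of german documents",
        }
        tokenized = {k: v.split() for k, v in docs.items()}
        df = Counter(w for words in tokenized.values() for w in set(words))
        idf = {w: math.log(len(docs) / c) for w, c in df.items()}

        def weight(words):
            # tf-idf weights for the terms of one text
            return {w: tf * idf.get(w, 0.0) for w, tf in Counter(words).items()}

        def cosine(a, b):
            dot = sum(v * b.get(w, 0.0) for w, v in a.items())
            na = math.sqrt(sum(x * x for x in a.values()))
            nb = math.sqrt(sum(x * x for x in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        vectors = {k: weight(words) for k, words in tokenized.items()}
        query = weight("multilingual thesaurus retrieval".split())
        print(sorted(docs, key=lambda k: cosine(query, vectors[k]), reverse=True))
        # documents in decreasing order of query-document similarity
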

Languages

  • e 32
  • d 26
  • f 1
  • nl 1
  • ru 1

Types

  • a 53
  • x 5
  • el 4
  • m 2
  • s 1