Search (82 results, page 1 of 5)

Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 0.06

0.060425792 = product of:
  0.27191606 = sum of:
    0.2149742 = weight(_text_:readable in 1168) [ClassicSimilarity], result of:
      0.2149742 = score(doc=1168,freq=2.0), product of:
        0.2262076 = queryWeight, product of:
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.036818076 = queryNorm
        0.9503403 = fieldWeight in 1168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.109375 = fieldNorm(doc=1168)
    0.05694187 = weight(_text_:data in 1168) [ClassicSimilarity], result of:
      0.05694187 = score(doc=1168,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.48910472 = fieldWeight in 1168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.109375 = fieldNorm(doc=1168)
  0.22222222 = coord(2/9)
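
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions that the printed numbers are consistent with: `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`. The helper names `idf` and `term_score` are illustrative and are not part of any library; `queryNorm`, `fieldNorm`, and the coord factor are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the explain output
QUERY_NORM = 0.036818076  # queryNorm, copied from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# doc 1168: "readable" (docFreq=257) and "data" (docFreq=5088), both freq=2.0
readable = term_score(freq=2.0, doc_freq=257, field_norm=0.109375)
data = term_score(freq=2.0, doc_freq=5088, field_norm=0.109375)

# coord(2/9): two of the nine query clauses matched this document
total = (readable + data) * (2 / 9)
print(f"readable={readable:.7f} data={data:.7f} total={total:.7f}")
```

Small differences in the last digits are expected, because Lucene computes in single precision while Python uses doubles.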

Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.02

0.022510225 = product of:
  0.10129601 = sum of:
    0.061032027 = weight(_text_:bibliographic in 2311) [ClassicSimilarity], result of:
      0.061032027 = score(doc=2311,freq=4.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.4258017 = fieldWeight in 2311, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2311)
    0.040263984 = weight(_text_:data in 2311) [ClassicSimilarity], result of:
      0.040263984 = score(doc=2311,freq=4.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.34584928 = fieldWeight in 2311, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2311)
  0.22222222 = coord(2/9)

Abstract: The goal of the study was to determine the state of the art of subject analysis as applied to large bibliographic data bases. The intent was to gather and evaluate information, casting it in a form that could be applied by management. There was no attempt to determine actual costs or trade-offs among costs and possible benefits. Commercial automatic indexing packages were also reviewed. The overall conclusion was that data base producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation

Polity, Y.: Vers une ergonomie linguistique (1994) 0.02

0.018191008 = product of:
  0.08185954 = sum of:
    0.049321324 = weight(_text_:bibliographic in 36) [ClassicSimilarity], result of:
      0.049321324 = score(doc=36,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.34409973 = fieldWeight in 36, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0625 = fieldNorm(doc=36)
    0.032538213 = weight(_text_:data in 36) [ClassicSimilarity], result of:
      0.032538213 = score(doc=36,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.2794884 = fieldWeight in 36, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=36)
  0.22222222 = coord(2/9)

Abstract: Analyzed a special type of man-machine interaction, that of searching an information system with natural language. A model for full text processing for information retrieval was proposed that considered the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks

Hirawa, M.: Role of keywords in the network searching era (1998) 0.02

0.018191008 = product of:
  0.08185954 = sum of:
    0.049321324 = weight(_text_:bibliographic in 3446) [ClassicSimilarity], result of:
      0.049321324 = score(doc=3446,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.34409973 = fieldWeight in 3446, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0625 = fieldNorm(doc=3446)
    0.032538213 = weight(_text_:data in 3446) [ClassicSimilarity], result of:
      0.032538213 = score(doc=3446,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.2794884 = fieldWeight in 3446, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=3446)
  0.22222222 = coord(2/9)

Abstract: A survey of Japanese OPACs available on the Internet was conducted relating to use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives from a merely title-based retrieval system. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed

Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 0.02

0.017061446 = product of:
  0.15355301 = sum of:
    0.15355301 = weight(_text_:readable in 1949) [ClassicSimilarity], result of:
      0.15355301 = score(doc=1949,freq=2.0), product of:
        0.2262076 = queryWeight, product of:
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.036818076 = queryNorm
        0.67881453 = fieldWeight in 1949, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.078125 = fieldNorm(doc=1949)
  0.11111111 = coord(1/9)

Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.02

0.015917132 = product of:
  0.071627095 = sum of:
    0.04315616 = weight(_text_:bibliographic in 5074) [ClassicSimilarity], result of:
      0.04315616 = score(doc=5074,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.30108726 = fieldWeight in 5074, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5074)
    0.028470935 = weight(_text_:data in 5074) [ClassicSimilarity], result of:
      0.028470935 = score(doc=5074,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.24455236 = fieldWeight in 5074, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5074)
  0.22222222 = coord(2/9)

Abstract: The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity and has some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01

0.014580995 = product of:
  0.06561448 = sum of:
    0.040672768 = weight(_text_:data in 2759) [ClassicSimilarity], result of:
      0.040672768 = score(doc=2759,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.34936053 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.024941705 = product of:
      0.04988341 = sum of:
        0.04988341 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.04988341 = score(doc=2759,freq=2.0), product of:
            0.12893063 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036818076 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.22222222 = coord(2/9)
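
This record contains a nested clause: the `22` term sits inside its own sub-query and is scaled by `coord(1/2)` before the outer `coord(2/9)` is applied. A minimal sketch of that two-level computation follows, assuming the same ClassicSimilarity definitions (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`). The names `term_score` and `clause_score` are hypothetical helpers for illustration only.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the explain output
QUERY_NORM = 0.036818076  # queryNorm, copied from the listing


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """queryWeight * fieldWeight for a single matching term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * field_norm)


def clause_score(parts: list[float], matched: int, total: int) -> float:
    """Sum the matching parts, then scale by coord(matched/total)."""
    return sum(parts) * (matched / total)


# doc 2759: the "data" term matches directly in the outer query
data_term = term_score(freq=2.0, doc_freq=5088, field_norm=0.078125)

# the "22" term matches inside a two-clause sub-query, hence coord(1/2)
inner = clause_score([term_score(2.0, 3622, 0.078125)], matched=1, total=2)

# outer query: two of nine clauses matched, hence coord(2/9)
doc_2759 = clause_score([data_term, inner], matched=2, total=9)
print(f"inner={inner:.9f} total={doc_2759:.9f}")
```

The inner coord halves the contribution of the `22` match, which is why a date-stamp hit adds less to the final score than the `data` hit even though its idf is higher.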

Date: 1. 2.2016 18:25:22
Source: Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al

Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.01
```
0.013241293 = product of:
  0.059585817 = sum of:
    0.03082583 = weight(_text_:bibliographic in 5400) [ClassicSimilarity], result of:
      0.03082583 = score(doc=5400,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.21506234 = fieldWeight in 5400, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
    0.028759988 = weight(_text_:data in 5400) [ClassicSimilarity], result of:
      0.028759988 = score(doc=5400,freq=4.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.24703519 = fieldWeight in 5400, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
  0.22222222 = coord(2/9)
```
Abstract

Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.

Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.01

0.008037162 = product of:
  0.07233446 = sum of:
    0.07233446 = weight(_text_:germany in 3105) [ClassicSimilarity], result of:
      0.07233446 = score(doc=3105,freq=2.0), product of:
        0.21956629 = queryWeight, product of:
          5.963546 = idf(docFreq=308, maxDocs=44218)
          0.036818076 = queryNorm
        0.32944247 = fieldWeight in 3105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.963546 = idf(docFreq=308, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3105)
  0.11111111 = coord(1/9)

Source: Proceedings of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) co-located with the 20th International Conference on Theory and Practice of Digital Libraries 2016 (TPDL 2016), Hannover, Germany, September 9, 2016. Ed. by Philipp Mayr et al. [http://ceur-ws.org/Vol-1676/=urn:nbn:de:0074-1676-5]

Milstead, J.L.: Thesauri in a full-text world (1998) 0.01

0.0072904974 = product of:
  0.03280724 = sum of:
    0.020336384 = weight(_text_:data in 2337) [ClassicSimilarity], result of:
      0.020336384 = score(doc=2337,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.17468026 = fieldWeight in 2337, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2337)
    0.012470853 = product of:
      0.024941705 = sum of:
        0.024941705 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
          0.024941705 = score(doc=2337,freq=2.0), product of:
            0.12893063 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036818076 = queryNorm
            0.19345059 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
      0.5 = coord(1/2)
  0.22222222 = coord(2/9)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.01
```
0.0072904974 = product of:
  0.03280724 = sum of:
    0.020336384 = weight(_text_:data in 3780) [ClassicSimilarity], result of:
      0.020336384 = score(doc=3780,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.17468026 = fieldWeight in 3780, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3780)
    0.012470853 = product of:
      0.024941705 = sum of:
        0.024941705 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
          0.024941705 = score(doc=3780,freq=2.0), product of:
            0.12893063 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036818076 = queryNorm
            0.19345059 = fieldWeight in 3780, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3780)
      0.5 = coord(1/2)
  0.22222222 = coord(2/9)
```
Abstract

We live in the 21st century, and much that would have been dismissed as science fiction a hundred or even fifty years ago has since become reality. Space probes fly to Mars, conduct experiments there, and send data back to Earth. Robots are used for routine tasks, for example in industry or in medicine. Digitization, artificial intelligence, and automated processes have become an almost indispensable part of everyday life. Many of these processes are based on learning algorithms. The advancing digital transformation is global and encompasses all areas of life and work: the economy, society, and politics. It opens up new opportunities from which libraries also benefit. The sharp rise in digital publications, which represent an important and proportionally ever-growing part of cultural heritage, should prompt libraries to take up and use these opportunities actively. The ability to analyse digital content, for example through text and data mining (TDM), and the development of technical methods for linking content and relating it semantically, offer scope for rethinking library indexing practices as well. For this reason, the Deutsche Nationalbibliothek (DNB) has for some years been looking at how the processes involved in indexing media works can be improved and supported by machine. In doing so it is in regular collegial exchange with other libraries that are also actively working on this question, and with European national libraries that are in turn interested in the topic and in the DNB's experience. As a national library with extensive holdings of digital publications, the DNB has also built up expertise in digital long-term preservation and is valued as a competent partner within its network.

Date

19. 8.2017 9:24:22

Gross, D.: Maschinelle Bilderkennung mit Big Data und Deep Learning (2017) 0.01
```
0.006261982 = product of:
  0.05635784 = sum of:
    0.05635784 = weight(_text_:data in 3726) [ClassicSimilarity], result of:
      0.05635784 = score(doc=3726,freq=6.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.48408815 = fieldWeight in 3726, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=3726)
  0.11111111 = coord(1/9)
```
Abstract

Working with unstructured data is often held up as a prime example of big data, because current technology makes it possible to store and process large volumes of data and the majority of that data is unstructured. In connection with unstructured data, however, the discussion is mostly about analysing texts and extracting information from them. Image analysis receives far less attention, even though it is still regarded as one of the supreme disciplines of modern computer science.

Source

https://jaxenter.de/big-data-bildanalyse-50313

Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D.: ¬A picture is worth a thousand (coherent) words : building a natural description of images (2014) 0.01
```
0.0059715053 = product of:
  0.05374355 = sum of:
    0.05374355 = weight(_text_:readable in 1874) [ClassicSimilarity], result of:
      0.05374355 = score(doc=1874,freq=2.0), product of:
        0.2262076 = queryWeight, product of:
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.036818076 = queryNorm
        0.23758507 = fieldWeight in 1874, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.1439276 = idf(docFreq=257, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1874)
  0.11111111 = coord(1/9)
```
Content

"People can summarize a complex scene in a few words without thinking twice. It's much more difficult for computers. But we've just gotten a bit closer -- we've developed a machine-learning system that can automatically produce captions (like the three above) to accurately describe images the first time it sees them. This kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images. Recent research has greatly improved object detection, classification, and labeling. But accurately describing a complex scene requires a deeper representation of what's going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language. Many efforts to construct computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead merged recent computer vision and language models into a single jointly trained system, taking an image and directly producing a human readable sequence of words to describe it? This idea comes from recent advances in machine translation between languages, where a Recurrent Neural Network (RNN) transforms, say, a French sentence into a vector representation, and a second RNN uses that vector representation to generate a target sentence in German. Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images? Normally, the CNN's last layer is used in a final Softmax among known classes of objects, assigning a probability that each object might be in the image. 
But if we remove that final layer, we can instead feed the CNN's rich encoding of the image into an RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image.

Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.01
```
0.0058323974 = product of:
  0.026245788 = sum of:
    0.016269106 = weight(_text_:data in 1767) [ClassicSimilarity], result of:
      0.016269106 = score(doc=1767,freq=2.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.1397442 = fieldWeight in 1767, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.03125 = fieldNorm(doc=1767)
    0.009976682 = product of:
      0.019953365 = sum of:
        0.019953365 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
          0.019953365 = score(doc=1767,freq=2.0), product of:
            0.12893063 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036818076 = queryNorm
            0.15476047 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
  0.22222222 = coord(2/9)
```
Date

22. 6.2009 12:46:51

Footnote

In the fifth chapter, "Information Extraction", Nohr addresses a problem that deserves even greater emphasis among specialists: "The steadily rising number of electronic documents makes not only automatic indexing but also automatic extraction of the relevant information from these documents desirable, for example so that it can be transferred into corporate information systems for further processing or analysis." (p. 103) "Indexing and Retrieval Methods", treated as interdependent procedures, are covered in the sixth chapter. The focus here is on relevance ranking and relevance feedback, and on the application of computational-linguistic methods in searching. The "Evaluation of Automatic Indexing" closes the thematic sequence. It deals mainly with the quality of indexing, and with the common retrieval measures used in retrieval tests and how they are applied. It is also worth noting that each chapter opens with a statement of learning objectives, and that a few review questions are provided for each chapter (at the back of the book). The many practical examples, a list of abbreviations, and a subject index increase the book's usefulness. Reading it improved the reviewer's understanding of the connections between the tools of the BID trade, business informatics (in particular data warehousing), and artificial intelligence. "Grundlagen der automatischen Indexierung" should also be required reading in library science degree programmes. Holger Nohr's textbook is also suitable for the BID professional who wants to refresh more or less well-founded knowledge of "automatic indexing" quickly, comprehensibly, and informatively.

SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.01

0.005626014 = product of:
  0.050634123 = sum of:
    0.050634123 = weight(_text_:germany in 6671) [ClassicSimilarity], result of:
      0.050634123 = score(doc=6671,freq=2.0), product of:
        0.21956629 = queryWeight, product of:
          5.963546 = idf(docFreq=308, maxDocs=44218)
          0.036818076 = queryNorm
        0.23060973 = fieldWeight in 6671, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.963546 = idf(docFreq=308, maxDocs=44218)
          0.02734375 = fieldNorm(doc=6671)
  0.11111111 = coord(1/9)

Abstract: The conference was organized by the Royal School of Librarianship in Copenhagen and was held in cooperation with AICA-GLIR (Italy), BCS-IRSG (UK), DD (Denmark), GI (Germany), INRIA (France). It had support from Apple Computer, Denmark. The volume contains the 32 papers and reports on the two panel sessions, moderated by W.B. Croft, and R. Kovetz, respectively

Alexander, M.: Retrieving digital data with fuzzy matching (1997) 0.01
```
0.005112887 = product of:
  0.04601598 = sum of:
    0.04601598 = weight(_text_:data in 151) [ClassicSimilarity], result of:
      0.04601598 = score(doc=151,freq=4.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.3952563 = fieldWeight in 151, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=151)
  0.11111111 = coord(1/9)
```
Abstract

In 1993 the British Library established a programme of activities entitled Initiatives for Access (IFA) to identify and develop computer applications based on the new technologies emerging in the areas of digital and network service. Discusses the problem of the effective retrieval of digital data after its capture focusing on the product Excalibur EFS which looks at the way information is sorted at its fundamental level and identifies patterns in numbers. Looks at the benefits of Excalibur and outlines other experiments in progress as part of the IFA programme

Fox, C.: Lexical analysis and stoplists (1992) 0.01

0.005112887 = product of:
  0.04601598 = sum of:
    0.04601598 = weight(_text_:data in 3502) [ClassicSimilarity], result of:
      0.04601598 = score(doc=3502,freq=4.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.3952563 = fieldWeight in 3502, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=3502)
  0.11111111 = coord(1/9)

Abstract: Lexical analysis is a fundamental operation in both query processing and automatic indexing, and filtering stoplist words is an important step in the automatic indexing process. Presents basic algorithms and data structures for lexical analysis, and shows how stoplist word removal can be efficiently incorporated into lexical analysis
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates

Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.00
```
0.004795129 = product of:
  0.04315616 = sum of:
    0.04315616 = weight(_text_:bibliographic in 2566) [ClassicSimilarity], result of:
      0.04315616 = score(doc=2566,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.30108726 = fieldWeight in 2566, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2566)
  0.11111111 = coord(1/9)
```
Abstract

We present a bibliometric study of a corpus of 839 bibliographic references about automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, the obsolescence and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been a constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors does fit a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.

Golub, K.: Automated subject indexing : an overview (2021) 0.00
```
0.004795129 = product of:
  0.04315616 = sum of:
    0.04315616 = weight(_text_:bibliographic in 718) [ClassicSimilarity], result of:
      0.04315616 = score(doc=718,freq=2.0), product of:
        0.14333439 = queryWeight, product of:
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.036818076 = queryNorm
        0.30108726 = fieldWeight in 718, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.893044 = idf(docFreq=2449, maxDocs=44218)
          0.0546875 = fieldNorm(doc=718)
  0.11111111 = coord(1/9)
```
Abstract

In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.

Gödert, W.; Liebig, M.: Maschinelle Indexierung auf dem Prüfstand : Ergebnisse eines Retrievaltests zum MILOS II Projekt (1997) 0.00
```
0.004473776 = product of:
  0.040263984 = sum of:
    0.040263984 = weight(_text_:data in 1174) [ClassicSimilarity], result of:
      0.040263984 = score(doc=1174,freq=4.0), product of:
        0.11642061 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.036818076 = queryNorm
        0.34584928 = fieldWeight in 1174, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1174)
  0.11111111 = coord(1/9)
```
Abstract

The test ran from Nov 95 to Aug 96 at the Cologne Fachhochschule für Bibliothekswesen (College of Librarianship). The test basis was a database of 190,000 book titles published between 1990-95. MILOS II mechanized indexing methods proved helpful in avoiding or reducing numbers of unsatisfied/no result retrieval searches. Retrieval from mechanised indexing is 3 times more successful than from title keyword data. MILOS II also used a standardized semantic vocabulary. Mechanised indexing demands high quality software and output data
