Search (80 results, page 1 of 4)

  • theme_ss:"Automatisches Indexieren"
  1. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 0.12
    0.119171835 = product of:
      0.29792958 = sum of:
        0.26308668 = weight(_text_:readable in 1168) [ClassicSimilarity], result of:
          0.26308668 = score(doc=1168,freq=2.0), product of:
            0.2768342 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.04505818 = queryNorm
            0.9503403 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.109375 = fieldNorm(doc=1168)
        0.0348429 = product of:
          0.0696858 = sum of:
            0.0696858 = weight(_text_:data in 1168) [ClassicSimilarity], result of:
              0.0696858 = score(doc=1168,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.48910472 = fieldWeight in 1168, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1168)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
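    Each score above is a Lucene "explain" tree for the classic TF-IDF similarity: per term, weight = queryWeight × fieldWeight, with queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = sqrt(freq), and idf = 1 + ln(maxDocs/(docFreq+1)); coord(m/n) then scales a sum by the fraction of query clauses that matched. A minimal Python sketch (constants copied from the tree above; independent of any Lucene API) reproduces result 1's score:

      import math

      def idf(doc_freq: int, max_docs: int) -> float:
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def clause(freq: float, doc_freq: int, field_norm: float,
                 max_docs: int = 44218, query_norm: float = 0.04505818) -> float:
          # weight = queryWeight * fieldWeight
          #        = (idf * queryNorm) * (tf * idf * fieldNorm),  tf = sqrt(freq)
          i = idf(doc_freq, max_docs)
          return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

      readable = clause(2.0, 257, 0.109375)   # ~0.26309, as in the tree above
      data = clause(2.0, 5088, 0.109375)      # ~0.06969
      # coord(1/2) scales the nested clause, coord(2/5) the whole query:
      print(round((readable + data * 0.5) * 0.4, 6))   # ~0.119172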
    
  2. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.04
    0.039731603 = product of:
      0.09932901 = sum of:
        0.074691355 = weight(_text_:bibliographic in 2311) [ClassicSimilarity], result of:
          0.074691355 = score(doc=2311,freq=4.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.4258017 = fieldWeight in 2311, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2311)
        0.024637653 = product of:
          0.049275305 = sum of:
            0.049275305 = weight(_text_:data in 2311) [ClassicSimilarity], result of:
              0.049275305 = score(doc=2311,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.34584928 = fieldWeight in 2311, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2311)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The goal of the study was to determine the state of the art of subject analysis as applied to large bibliographic databases. The intent was to gather and evaluate information, casting it in a form that could be applied by management. There was no attempt to determine actual costs or trade-offs among costs and possible benefits. Commercial automatic indexing packages were also reviewed. The overall conclusion was that database producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward the development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation.
  3. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable texts (1994) 0.04
    0.037583813 = product of:
      0.18791907 = sum of:
        0.18791907 = weight(_text_:readable in 1949) [ClassicSimilarity], result of:
          0.18791907 = score(doc=1949,freq=2.0), product of:
            0.2768342 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.04505818 = queryNorm
            0.67881453 = fieldWeight in 1949, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.078125 = fieldNorm(doc=1949)
      0.2 = coord(1/5)
    
  4. Polity, Y.: Vers une ergonomie linguistique (1994) 0.03
    0.032107983 = product of:
      0.080269955 = sum of:
        0.060359728 = weight(_text_:bibliographic in 36) [ClassicSimilarity], result of:
          0.060359728 = score(doc=36,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.34409973 = fieldWeight in 36, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
        0.01991023 = product of:
          0.03982046 = sum of:
            0.03982046 = weight(_text_:data in 36) [ClassicSimilarity], result of:
              0.03982046 = score(doc=36,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2794884 = fieldWeight in 36, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=36)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Analyzed a special type of man-machine interaction, that of searching an information system with natural language. A model for full-text processing for information retrieval was proposed that considered the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer-assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks.
  5. Hirawa, M.: Role of keywords in the network searching era (1998) 0.03
    0.032107983 = product of:
      0.080269955 = sum of:
        0.060359728 = weight(_text_:bibliographic in 3446) [ClassicSimilarity], result of:
          0.060359728 = score(doc=3446,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.34409973 = fieldWeight in 3446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=3446)
        0.01991023 = product of:
          0.03982046 = sum of:
            0.03982046 = weight(_text_:data in 3446) [ClassicSimilarity], result of:
              0.03982046 = score(doc=3446,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2794884 = fieldWeight in 3446, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3446)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    A survey of Japanese OPACs available on the Internet was conducted concerning the use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives from a merely title-based retrieval system. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed.
  6. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.03
    0.028094485 = product of:
      0.07023621 = sum of:
        0.052814763 = weight(_text_:bibliographic in 5074) [ClassicSimilarity], result of:
          0.052814763 = score(doc=5074,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.30108726 = fieldWeight in 5074, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5074)
        0.01742145 = product of:
          0.0348429 = sum of:
            0.0348429 = weight(_text_:data in 5074) [ClassicSimilarity], result of:
              0.0348429 = score(doc=5074,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.24455236 = fieldWeight in 5074, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5074)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity, along with such shortcomings as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
  7. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.02216464 = product of:
      0.11082319 = sum of:
        0.11082319 = sum of:
          0.049775574 = weight(_text_:data in 2759) [ClassicSimilarity], result of:
            0.049775574 = score(doc=2759,freq=2.0), product of:
              0.14247625 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.04505818 = queryNorm
              0.34936053 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
          0.061047617 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
            0.061047617 = score(doc=2759,freq=2.0), product of:
              0.15778607 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04505818 = queryNorm
              0.38690117 = fieldWeight in 2759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=2759)
      0.2 = coord(1/5)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  8. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.02
    0.022129262 = product of:
      0.055323154 = sum of:
        0.03772483 = weight(_text_:bibliographic in 5400) [ClassicSimilarity], result of:
          0.03772483 = score(doc=5400,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.21506234 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.017598324 = product of:
          0.035196647 = sum of:
            0.035196647 = weight(_text_:data in 5400) [ClassicSimilarity], result of:
              0.035196647 = score(doc=5400,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.24703519 = fieldWeight in 5400, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
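    The two-step approach the abstract outlines (embed every entity type into one semantic space, then pick the most similar subjects) can be sketched briefly. The subject names and random vectors below are stand-ins, not the authors' model:

      import numpy as np

      def cosine(a: np.ndarray, b: np.ndarray) -> float:
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      def predict_subjects(doc_vec, subject_vecs, k=3):
          # Step 2 of "embed first, then predict": rank subject labels, already
          # embedded in the same space as the document, by semantic similarity.
          ranked = sorted(subject_vecs.items(),
                          key=lambda kv: cosine(doc_vec, kv[1]), reverse=True)
          return ranked[:k]

      # Hypothetical pre-trained embeddings sharing one 64-dimensional space:
      rng = np.random.default_rng(0)
      subjects = {name: rng.standard_normal(64)
                  for name in ("automatic indexing", "clustering", "thesauri")}
      print([name for name, _ in predict_subjects(rng.standard_normal(64), subjects)])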
  9. Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D.: ¬A picture is worth a thousand (coherent) words : building a natural description of images (2014) 0.01
    0.013154334 = product of:
      0.06577167 = sum of:
        0.06577167 = weight(_text_:readable in 1874) [ClassicSimilarity], result of:
          0.06577167 = score(doc=1874,freq=2.0), product of:
            0.2768342 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.04505818 = queryNorm
            0.23758507 = fieldWeight in 1874, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1874)
      0.2 = coord(1/5)
    
    Content
    "People can summarize a complex scene in a few words without thinking twice. It's much more difficult for computers. But we've just gotten a bit closer -- we've developed a machine-learning system that can automatically produce captions (like the three above) to accurately describe images the first time it sees them. This kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images. Recent research has greatly improved object detection, classification, and labeling. But accurately describing a complex scene requires a deeper representation of what's going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language. Many efforts to construct computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead merged recent computer vision and language models into a single jointly trained system, taking an image and directly producing a human readable sequence of words to describe it? This idea comes from recent advances in machine translation between languages, where a Recurrent Neural Network (RNN) transforms, say, a French sentence into a vector representation, and a second RNN uses that vector representation to generate a target sentence in German. Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images? Normally, the CNN's last layer is used in a final Softmax among known classes of objects, assigning a probability that each object might be in the image. But if we remove that final layer, we can instead feed the CNN's rich encoding of the image into a RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image.
  10. Milstead, J.L.: Thesauri in a full-text world (1998) 0.01
    0.01108232 = product of:
      0.055411596 = sum of:
        0.055411596 = sum of:
          0.024887787 = weight(_text_:data in 2337) [ClassicSimilarity], result of:
            0.024887787 = score(doc=2337,freq=2.0), product of:
              0.14247625 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.04505818 = queryNorm
              0.17468026 = fieldWeight in 2337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2337)
          0.030523809 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
            0.030523809 = score(doc=2337,freq=2.0), product of:
              0.15778607 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04505818 = queryNorm
              0.19345059 = fieldWeight in 2337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2337)
      0.2 = coord(1/5)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  11. Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.01
    0.01108232 = product of:
      0.055411596 = sum of:
        0.055411596 = sum of:
          0.024887787 = weight(_text_:data in 3780) [ClassicSimilarity], result of:
            0.024887787 = score(doc=3780,freq=2.0), product of:
              0.14247625 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.04505818 = queryNorm
              0.17468026 = fieldWeight in 3780, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3780)
          0.030523809 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
            0.030523809 = score(doc=3780,freq=2.0), product of:
              0.15778607 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04505818 = queryNorm
              0.19345059 = fieldWeight in 3780, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3780)
      0.2 = coord(1/5)
    
    Abstract
    We live in the 21st century, and much that would have been dismissed as science fiction a hundred or even fifty years ago is now reality. Space probes fly to Mars, conduct experiments there, and send data back to Earth. Robots take on routine tasks, for example in industry or in medicine. Digitization, artificial intelligence, and automated processes have become an inseparable part of everyday life. Learning algorithms underlie many of these processes. The advancing digital transformation is global and encompasses all areas of life and work: the economy, society, and politics. It opens up new possibilities from which libraries can also benefit. The sharp rise in digital publications, which form an important and proportionally ever-growing part of our cultural heritage, should prompt libraries to take up and apply these possibilities actively. The analyzability of digital content, for example through text and data mining (TDM), and the development of technical methods for interlinking content and relating it semantically create room to rethink library indexing procedures as well. The German National Library (DNB) has therefore been investigating for several years how the processes for the subject indexing of media works can be improved and supported by machines. It maintains a regular collegial exchange with other libraries that are also actively engaged with this question, as well as with European national libraries that are in turn interested in the topic and in the DNB's experience. As a national library with extensive holdings of digital publications, the DNB has also built up expertise in digital long-term preservation and is valued as a competent partner within its network.
    Date
    19. 8.2017 9:24:22
  12. Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.01
    0.010562953 = product of:
      0.052814763 = sum of:
        0.052814763 = weight(_text_:bibliographic in 2566) [ClassicSimilarity], result of:
          0.052814763 = score(doc=2566,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.30108726 = fieldWeight in 2566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2566)
      0.2 = coord(1/5)
    
    Abstract
    We present a bibliometric study of a corpus of 839 bibliographic references about automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, the obsolescence of the literature and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors fits a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.
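    For reference, Lotka's law predicts that the number of authors with n publications falls off roughly as 1/n^2. The sketch below shows the kind of Kolmogorov-Smirnov comparison behind a reported Dmax; the paper's exact fitting procedure is not given here, so this is indicative only:

      import numpy as np

      def lotka_ks(pubs_per_author, alpha=2.0):
          # K-S distance between observed author productivity and a Lotka
          # distribution f(n) proportional to 1 / n**alpha.
          pubs = np.asarray(pubs_per_author)
          n_max = int(pubs.max())
          theo = 1.0 / np.arange(1, n_max + 1) ** alpha
          theo /= theo.sum()
          obs = np.bincount(pubs, minlength=n_max + 1)[1:].astype(float)
          obs /= obs.sum()
          return float(np.abs(np.cumsum(obs) - np.cumsum(theo)).max())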
  13. Golub, K.: Automated subject indexing : an overview (2021) 0.01
    0.010562953 = product of:
      0.052814763 = sum of:
        0.052814763 = weight(_text_:bibliographic in 718) [ClassicSimilarity], result of:
          0.052814763 = score(doc=718,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.30108726 = fieldWeight in 718, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=718)
      0.2 = coord(1/5)
    
    Abstract
    In the face of ever-increasing document volumes, libraries around the globe are increasingly exploring (semi-)automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are still lacking. This article aims to provide an overview of the basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, and related challenges calling for further research.
  14. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01
    0.009767618 = product of:
      0.04883809 = sum of:
        0.04883809 = product of:
          0.09767618 = sum of:
            0.09767618 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.09767618 = score(doc=402,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  15. Humphrey, S.M.: Automatic indexing of documents from journal descriptors : a preliminary investigation (1999) 0.01
    0.0090539595 = product of:
      0.045269795 = sum of:
        0.045269795 = weight(_text_:bibliographic in 3769) [ClassicSimilarity], result of:
          0.045269795 = score(doc=3769,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.2580748 = fieldWeight in 3769, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=3769)
      0.2 = coord(1/5)
    
    Abstract
    A new, fully automated approach to indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing takes the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals, for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
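    A minimal sketch of the association the abstract describes: count which textwords co-occur with which journal descriptors in the training citations, then rank JDs for a new document by those counts. The input format and scoring are hypothetical simplifications of the paper's method:

      from collections import Counter, defaultdict

      def train_jd_profiles(citations):
          # citations: iterable of (tokens, journal_descriptors) pairs built from
          # bibliographic records (hypothetical input format). Each textword is
          # credited to the JDs of the journal the citation appeared in.
          profiles = defaultdict(Counter)
          for tokens, jds in citations:
              for jd in jds:
                  profiles[jd].update(tokens)
          return profiles

      def rank_jds(tokens, profiles, top=5):
          # Score each JD by how strongly the document's words figure in its profile.
          scores = {jd: sum(prof[t] for t in tokens) for jd, prof in profiles.items()}
          return sorted(scores.items(), key=lambda kv: -kv[1])[:top]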
  16. Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.01
    0.008865855 = product of:
      0.044329274 = sum of:
        0.044329274 = sum of:
          0.01991023 = weight(_text_:data in 1767) [ClassicSimilarity], result of:
            0.01991023 = score(doc=1767,freq=2.0), product of:
              0.14247625 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.04505818 = queryNorm
              0.1397442 = fieldWeight in 1767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.03125 = fieldNorm(doc=1767)
          0.024419045 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
            0.024419045 = score(doc=1767,freq=2.0), product of:
              0.15778607 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04505818 = queryNorm
              0.15476047 = fieldWeight in 1767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1767)
      0.2 = coord(1/5)
    
    Date
    22. 6.2009 12:46:51
    Footnote
    In the fifth chapter, "Information Extraction", Nohr addresses a problem that deserves even stronger emphasis in the field: "The steadily growing number of electronic documents makes it desirable not only to index them automatically but also to extract the relevant information from these documents automatically, e.g. so that it can be transferred into operational information systems for further processing or analysis." (p. 103) "Indexing and retrieval methods", as mutually dependent procedures, are treated in the sixth chapter. Here the focus is on relevance ranking and relevance feedback as well as on the application of information-linguistic methods in searching. "Evaluation of automatic indexing" forms the thematic conclusion, dealing above all with the quality of an indexing process, with common retrieval measures in retrieval tests, and with their use. It is also worth noting that each chapter opens with stated learning objectives, and that review questions for the chapters are provided in the back of the book. The numerous examples from practice, a list of abbreviations, and a subject index increase the book's usefulness. For this reviewer, reading it furthered an understanding of the connections between the library and information science toolkit, business informatics (especially data warehousing), and artificial intelligence. "Grundlagen der automatischen Indexierung" should be required reading in library science programmes as well. Holger Nohr's textbook is also suitable for the LIS professional who wants to refresh more or less well-founded knowledge of automatic indexing quickly, in an easily understandable and informative way.
  17. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01
    0.008546666 = product of:
      0.04273333 = sum of:
        0.04273333 = product of:
          0.08546666 = sum of:
            0.08546666 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
              0.08546666 = score(doc=262,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.5416616 = fieldWeight in 262, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=262)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    20.10.2000 12:22:23
  18. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.01
    0.008546666 = product of:
      0.04273333 = sum of:
        0.04273333 = product of:
          0.08546666 = sum of:
            0.08546666 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.08546666 = score(doc=6265,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  19. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurrence based approach to multilingual retrieval (1997) 0.01
    0.007544966 = product of:
      0.03772483 = sum of:
        0.03772483 = weight(_text_:bibliographic in 4144) [ClassicSimilarity], result of:
          0.03772483 = score(doc=4144,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.21506234 = fieldWeight in 4144, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4144)
      0.2 = coord(1/5)
    
    Abstract
    Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to selecting thesaurus descriptors. In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show high variability with changing parameter values. This indicates that it is very important to empirically adapt the model to the specific situation it is used in. The overall median rank of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fitting to a specific training set but a real adaptation of the model to the setting.
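    A sketch of such a linear associative ranking, assuming a term-by-descriptor similarity matrix sim has already been extracted from the manually indexed corpus (names and shapes are illustrative):

      import numpy as np

      def rank_descriptors(title_term_ids, sim, labels, top=10):
          # sim[i, j] = association of term i with descriptor j, learned from a
          # manually indexed corpus (here assumed given). The linear associative
          # ranking simply sums each title term's contribution per descriptor.
          scores = sim[title_term_ids].sum(axis=0)
          order = np.argsort(-scores)[:top]
          return [(labels[j], float(scores[j])) for j in order]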
  20. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01
    0.0073257135 = product of:
      0.036628567 = sum of:
        0.036628567 = product of:
          0.07325713 = sum of:
            0.07325713 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.07325713 = score(doc=58,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    14. 6.2015 22:12:44

Languages

  • e 54
  • d 22
  • f 2
  • ja 1
  • ru 1

Types

  • a 72
  • el 12
  • x 4
  • m 1
  • s 1