Search (85 results, page 1 of 5)

  • Filter: theme_ss:"Automatisches Indexieren"
  1. Plaunt, C.; Norgard, B.A.: An association-based method for automatic indexing with a controlled vocabulary (1998) 0.05
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial-match information retrieval problem: we consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
    Date
    11. 9.2000 19:53:22
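The two-stage procedure in the abstract above can be sketched in a few lines. This is a minimal illustration only: it scores token-heading associations with Dunning's log-likelihood ratio (one common choice of likelihood ratio statistic; the authors' exact measure and data structures are not specified here), and the deployment stage ranks headings for a new document by summing its tokens' associations. All data and names are invented.

```python
import math
from collections import Counter, defaultdict

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table."""
    def entropy(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (entropy(k11, k12, k21, k22)
                - entropy(k11 + k12, k21 + k22)
                - entropy(k11 + k21, k12 + k22))

def build_dictionary(records):
    """records: list of (tokens, headings) pairs from indexed documents.
    Returns {(token, heading): association score}."""
    n = len(records)
    tok_df, head_df, pair_df = Counter(), Counter(), Counter()
    for tokens, headings in records:
        toks, heads = set(tokens), set(headings)
        tok_df.update(toks)
        head_df.update(heads)
        pair_df.update((t, h) for t in toks for h in heads)
    assoc = {}
    for (t, h), k11 in pair_df.items():
        k12 = tok_df[t] - k11      # docs with token but not heading
        k21 = head_df[h] - k11     # docs with heading but not token
        k22 = n - k11 - k12 - k21  # docs with neither
        assoc[(t, h)] = llr(k11, k12, k21, k22)
    return assoc

def predict_headings(assoc, tokens, top=3):
    """Deployment stage: rank headings by summed association scores."""
    token_set = set(tokens)
    scores = defaultdict(float)
    for (t, h), s in assoc.items():
        if t in token_set:
            scores[h] += s
    return [h for h, _ in sorted(scores.items(), key=lambda x: -x[1])[:top]]
```

A usage sketch: train on a handful of (title tokens, assigned headings) pairs, then call `predict_headings` on the tokens of an unseen record.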
  2. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.04
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  3. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.04
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO) to the problem of document indexing, referring to a system which incorporates NLP techniques to determine the subject of document texts and to associate them with relevant semantic indexes. Briefly describes the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  4. Milstead, J.L.: Thesauri in a full-text world (1998) 0.03
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  5. Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.03
    Abstract
    Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloger's knowledge of the specific topics contained in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book regardless of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, roughly 10-15 times faster than a human indexer. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by training the model on a bigger corpus and applying more careful preprocessing techniques.
    Footnote
    Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
  6. Micco, M.; Popp, R.: Improving library subject access (ILSA) : a theory of clustering based in classification (1994) 0.03
    Abstract
    The ILSA prototype was developed using an object-oriented multimedia user interface on six NeXT workstations with two databases: the first with 100,000 MARC records and the second with 20,000 additional records enhanced with table-of-contents data. The items are grouped into subject clusters consisting of the classification number and the first subject heading assigned. Every other distinct keyword in the MARC record is linked to the subject cluster in an automated natural language mapping scheme, which leads the user from the term entered to the controlled vocabulary of the subject clusters in which the term appeared. The use of a hierarchical classification number (Dewey) makes it possible to broaden or narrow a search at will.
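The clustering idea described in this abstract can be sketched as a simple inverted mapping: each cluster key is the pair (classification number, first subject heading), and every other keyword in the record links to it. This is a hedged illustration under invented field names, not the ILSA implementation.

```python
from collections import defaultdict

def build_clusters(marc_records):
    """marc_records: list of dicts with hypothetical keys 'class_no',
    'subjects', and 'keywords'. Cluster key = (classification number,
    first subject heading); every keyword links to its record's cluster."""
    keyword_index = defaultdict(set)
    for rec in marc_records:
        cluster = (rec["class_no"], rec["subjects"][0])
        for kw in rec["keywords"]:
            keyword_index[kw.lower()].add(cluster)
    return keyword_index

records = [
    {"class_no": "025.3", "subjects": ["Cataloging"],
     "keywords": ["indexing", "MARC"]},
    {"class_no": "006.3", "subjects": ["Artificial intelligence"],
     "keywords": ["indexing", "neural"]},
]
index = build_clusters(records)
# An entered term leads the user to the controlled clusters it appears in
print(sorted(index["indexing"]))
```

Because the cluster key carries a hierarchical Dewey number, broadening or narrowing a search amounts to truncating or extending that number.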
  7. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.03
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both about the likely timeframe and about the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
  8. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  9. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.02
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
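Two of the weighting methods compared in this abstract, IDF as a direct weight and cosine over bag-of-words representations, can be sketched roughly as follows. This is a hedged, minimal reading of those methods on invented data, not the authors' implementation.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency of a term over a collection;
    docs is a list of term sets. A direct, quick weight estimate."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def weight_descriptors(doc_tokens, descriptors):
    """Weight each assigned descriptor by the cosine between its own
    bag-of-words and the document's bag-of-words."""
    doc_bow = Counter(t.lower() for t in doc_tokens)
    return {d: cosine(doc_bow, Counter(d.lower().split()))
            for d in descriptors}

doc = "subject indexing improves subject access".split()
print(weight_descriptors(doc, ["Subject indexing", "Information retrieval"]))
```

A descriptor whose terms actually occur in the text receives a high weight; an assigned descriptor with no textual support falls to zero, which is the kind of importance distinction the study argues for.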
  10. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally it has been used for intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results, and its problems are sketched in the paper.
    Content
    Paper for the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. See: http://www.nlib.ee/index.php?id=17763.
  11. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.02
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
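As one hedged illustration of the keyword-extraction step mentioned above (not NIU's actual pipeline), a document's terms can be ranked by tf-idf against a background corpus; the top-ranked terms are candidate subject keywords. Data below is invented.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top=3):
    """Rank a document's terms by tf-idf; corpus is a list of term sets
    serving as the background collection. A minimal stand-in for the
    keyword-extraction step, with smoothed idf."""
    n = len(corpus)
    tf = Counter(doc_tokens)
    scored = {}
    for term, f in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n) / (1 + df)) + 1  # smoothed idf
        scored[term] = f * idf
    return [t for t, _ in sorted(scored.items(), key=lambda x: -x[1])[:top]]

corpus = [{"the", "town"}, {"the", "river"}, {"outlaw"}]
print(tfidf_keywords(["outlaw", "outlaw", "frontier", "the"], corpus, top=2))
```

Frequent but corpus-wide terms ("the") score low; terms distinctive to the document ("outlaw", "frontier") rise to the top.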
  12. Golub, K.: Automated subject indexing : an overview (2021) 0.02
    Abstract
    In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
    Footnote
    Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
  13. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.02
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
  14. Hirawa, M.: Role of keywords in the network searching era (1998) 0.02
    Abstract
    A survey of Japanese OPACs available on the Internet was conducted concerning the use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives merely from title-based retrieval. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed.
  15. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    Date
    20.10.2000 12:22:23
  16. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    Source
    Information outlook. 9(2005) no.8, S.22-23
  17. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.02
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
  18. Golub, K.: Automatic subject indexing of text (2019) 0.02
    Abstract
    Automatic subject indexing addresses problems of scale and sustainability and can at the same time be used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing, such as thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification reuses the intellectual effort invested into creating a KOS for subject indexing, and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
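The string-matching flavour of document classification mentioned in this abstract can be illustrated in a few lines: a heading is assigned when any of its entry terms (preferred, equivalent, or otherwise related) occurs in the text. This is a naive sketch over an invented miniature KOS; real systems tokenize and disambiguate rather than substring-match.

```python
def assign_from_kos(text, kos):
    """kos: {preferred heading: [entry terms, incl. synonyms]}.
    Assign a heading when any of its entry terms occurs in the text.
    Plain substring matching, so it will over-match in practice."""
    lowered = text.lower()
    assigned = []
    for heading, terms in kos.items():
        if any(term.lower() in lowered for term in terms):
            assigned.append(heading)
    return assigned

kos = {
    "Automatic indexing": ["automatic indexing", "machine indexing"],
    "Thesauri": ["thesaurus", "thesauri", "controlled vocabulary"],
}
print(assign_from_kos("A thesaurus supports machine indexing.", kos))
```

The synonym lists are exactly what makes such simple matching workable: one concept is reachable through several different surface terms.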
  19. Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.02
    Abstract
    There are several ways to employ machine learning for automating subject indexing. One popular strategy is to utilize a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject matter using a standard vocabulary. The resulting model can then predict the subject of new and previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., from Agrovoc). Next, the dataset (obtained from Agris) can be divided into (i) a training dataset and (ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
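The supervised workflow described above (split into training and test sets, train, predict descriptors) can be sketched with a deliberately tiny model: count which vocabulary descriptors co-occur with which tokens, then let a new document's tokens vote. This is an invented toy, far simpler than Annif's actual backends, shown only to make the pipeline steps concrete.

```python
import random
from collections import Counter, defaultdict

def split(dataset, test_ratio=0.2, seed=42):
    """Divide (tokens, descriptors) pairs into training and test sets."""
    data = dataset[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]

def train(train_set):
    """Count which descriptors co-occur with which tokens."""
    model = defaultdict(Counter)
    for tokens, descriptors in train_set:
        for t in tokens:
            model[t].update(descriptors)
    return model

def predict(model, tokens, top=2):
    """Let each token of a new document vote for descriptors."""
    votes = Counter()
    for t in tokens:
        votes.update(model.get(t, Counter()))
    return [d for d, _ in votes.most_common(top)]
```

In the real setup the training pairs would come from Agris records, the descriptor space from Agrovoc, and the test split would drive precision/recall evaluation of the model.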
  20. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.02
    Abstract
    The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity and has some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
