Search (62 results, page 2 of 4)

  • language_ss:"e"
  • theme_ss:"Automatisches Indexieren"
  1. Chung, E.-K.; Miksa, S.; Hastings, S.K.: A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.02
    0.018191395 = product of:
      0.03638279 = sum of:
        0.03638279 = product of:
          0.07276558 = sum of:
            0.07276558 = weight(_text_:subject in 3434) [ClassicSimilarity], result of:
              0.07276558 = score(doc=3434,freq=16.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4470745 = fieldWeight in 3434, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
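    The score breakdowns throughout this list are Lucene "explain" output for the ClassicSimilarity (TF-IDF) ranking. As a reading aid, here is a minimal Python sketch that reproduces the arithmetic of the tree above from the values it reports; the idf and tf formulas are Lucene's classic ones, while queryNorm and fieldNorm are taken directly from the listing.

      import math

      idf = 1 + math.log(44218 / (3361 + 1))  # ~3.576596 = idf(docFreq=3361, maxDocs=44218)
      query_norm = 0.04550679                 # queryNorm, as reported in the listing
      tf = math.sqrt(16.0)                    # 4.0 = tf(freq=16.0)
      field_norm = 0.03125                    # fieldNorm(doc=3434), the index-time length norm

      query_weight = idf * query_norm         # ~0.16275941 = queryWeight
      field_weight = tf * idf * field_norm    # ~0.4470745  = fieldWeight
      weight = query_weight * field_weight    # ~0.07276558 = weight(_text_:subject in 3434)
      score = weight * 0.5 * 0.5              # the two coord(1/2) factors
      print(score)                            # ~0.018191395, the entry's displayed score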
    
    Abstract
    The purpose of this study is to examine whether an understanding of the subject-indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject-indexing approaches or conceptions together with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using the F-measure, the experimental results showed that cited works, source title, and title were as effective as the full text, while keywords were found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text; the content-oriented and document-oriented indexing approaches in particular were found to be more effective than the full text. Among the three indexing conception-based approaches, the content-oriented and document-oriented approaches were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and the authors' intentions were more useful for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the conclusion that incorporating human indexers' indexing approaches or conceptions, in conjunction with semantic sources, has a positive impact on the effectiveness of automatic subject term assignment.
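    The TC setup described here (a support vector machine over alternative document fields, compared by F-measure) can be sketched as follows. This is a hedged illustration only: it uses scikit-learn with toy data, whereas the study used the SVM implementation in WEKA on a scientific journal article dataset.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Toy stand-ins for the evidence sources compared in the study.
      fields = {
          "title":    ["gene expression in yeast", "query expansion for retrieval",
                       "protein folding pathways", "relevance feedback in search"],
          "keywords": ["genetics yeast expression", "information retrieval query",
                       "proteins folding biology", "retrieval feedback relevance"],
      }
      labels = ["biology", "ir", "biology", "ir"]

      # Train the same classifier on each field and compare macro-averaged F1.
      for name, texts in fields.items():
          model = make_pipeline(TfidfVectorizer(), LinearSVC())
          f1 = cross_val_score(model, texts, labels, cv=2, scoring="f1_macro")
          print(f"{name}: macro-F1 = {f1.mean():.2f}")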
  2. Oliver, C.: Leveraging KOS to extend our reach with automated processes (2021) 0.02
    0.018191395 = product of:
      0.03638279 = sum of:
        0.03638279 = product of:
          0.07276558 = sum of:
            0.07276558 = weight(_text_:subject in 722) [ClassicSimilarity], result of:
              0.07276558 = score(doc=722,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4470745 = fieldWeight in 722, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=722)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article provides a conclusion to the special issue on Artificial Intelligence (AI) and Automated Processes for Subject Access. The authors who contributed to this special issue have raised interesting questions and drawn attention to important issues. This concluding article looks at common themes and highlights some of the questions raised.
    Footnote
    Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
  3. Roberts, D.; Souter, C.: The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.02
    0.016709864 = product of:
      0.03341973 = sum of:
        0.03341973 = product of:
          0.06683946 = sum of:
            0.06683946 = weight(_text_:subject in 711) [ClassicSimilarity], result of:
              0.06683946 = score(doc=711,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.41066417 = fieldWeight in 711, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=711)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon the generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass various components: extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature; use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors; and coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
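    A rough sketch of the extraction-and-matching steps outlined above, with a toy dictionary standing in for UMLS and plain word n-grams standing in for linguistically derived 'concept strings' (the phrases and identifiers below are illustrative, not actual UMLS content):

      # Toy stand-in for a UMLS-style controlled vocabulary.
      vocabulary = {
          "low back pain": "descriptor-1",
          "physical therapy": "descriptor-2",
          "randomized trial": "descriptor-3",
      }

      def candidate_phrases(text, max_words=3):
          """Yield word n-grams from a title or abstract, longest first."""
          words = text.lower().split()
          for n in range(max_words, 0, -1):
              for i in range(len(words) - n + 1):
                  yield " ".join(words[i:i + n])

      abstract = "A randomized trial of physical therapy for chronic low back pain"
      descriptors = {vocabulary[p] for p in candidate_phrases(abstract) if p in vocabulary}
      print(sorted(descriptors))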
  4. Olsgaard, J.N.; Evans, E.J.: Improving keyword indexing (1981) 0.02
    0.016079074 = product of:
      0.032158148 = sum of:
        0.032158148 = product of:
          0.064316295 = sum of:
            0.064316295 = weight(_text_:subject in 4996) [ClassicSimilarity], result of:
              0.064316295 = score(doc=4996,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.39516178 = fieldWeight in 4996, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4996)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This communication examines some of the most frequently cited criticisms of keyword indexing. These criticisms include (1) absence of general subject headings, (2) limited entry points, and (3) irrelevant indexing. Some solutions are suggested to meet these criticisms.
  5. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.02
    0.01591747 = product of:
      0.03183494 = sum of:
        0.03183494 = product of:
          0.06366988 = sum of:
            0.06366988 = weight(_text_:subject in 2311) [ClassicSimilarity], result of:
              0.06366988 = score(doc=2311,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.3911902 = fieldWeight in 2311, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2311)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The goal of the study was to determine the state of the art of subject analysis as applied to large bibliographic data bases. The intent was to gather and evaluate information, casting it in a form that could be applied by management. There was no attempt to determine actual costs or trade-offs among costs and possible benefits. Commercial automatic indexing packages were also reviewed. The overall conclusion was that data base producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation
  6. Losee, R.M.: A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 0.02
    0.01591747 = product of:
      0.03183494 = sum of:
        0.03183494 = product of:
          0.06366988 = sum of:
            0.06366988 = weight(_text_:subject in 2335) [ClassicSimilarity], result of:
              0.06366988 = score(doc=2335,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.3911902 = fieldWeight in 2335, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2335)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. The requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases can often not be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed and used to evaluate experimental results comparing the distance between subject headings assigned to documents under the proposed system and under the Library of Congress Classification (LCC) system.
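    The Gray code principle can be illustrated in a few lines: if each document is reduced to a binary vector of subject features and that vector is read as a binary-reflected Gray codeword, sorting by the decoded rank yields a shelf order in which neighbouring documents differ in as little as one feature. A sketch of the principle only, not the proposed system:

      def gray_rank(bits):
          """Decode a binary-reflected Gray codeword into its integer rank."""
          b = 0
          rank = 0
          for g in bits:
              b ^= g                    # binary digit = running XOR of Gray digits
              rank = (rank << 1) | b
          return rank

      # Toy documents described by three binary subject features.
      docs = {"doc-A": (0, 1, 1), "doc-B": (0, 0, 1), "doc-C": (1, 1, 0), "doc-D": (0, 1, 0)}
      for name in sorted(docs, key=lambda d: gray_rank(docs[d])):
          print(name, docs[name])       # adjacent vectors in this order differ in one bit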
  7. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.02
    0.01591747 = product of:
      0.03183494 = sum of:
        0.03183494 = product of:
          0.06366988 = sum of:
            0.06366988 = weight(_text_:subject in 6750) [ClassicSimilarity], result of:
              0.06366988 = score(doc=6750,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.3911902 = fieldWeight in 6750, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6750)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, owing to the increasing amount of time users will have to spend finding needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.
  8. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.06165542 = score(doc=1952,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 8.1998 12:51:22
  9. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.06165542 = score(doc=4157,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill
  10. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06165542 = score(doc=2759,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  11. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.01
    0.0139248865 = product of:
      0.027849773 = sum of:
        0.027849773 = product of:
          0.055699546 = sum of:
            0.055699546 = weight(_text_:subject in 658) [ClassicSimilarity], result of:
              0.055699546 = score(doc=658,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.34222013 = fieldWeight in 658, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=658)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments were performed using the open source Annif toolkit for automated subject indexing and classification, but the findings should also generalize to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform the baseline methods in text classification, particularly for Finnish and Swedish text, but not for English, where the baseline methods are most effective. The differences between the lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
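    The kind of comparison run in the paper can be sketched as follows; NLTK stemmers and a lowercase baseline stand in here for the analyzers actually compared, which are Annif's own analyzer modules:

      # Requires: pip install nltk (no corpus downloads needed for stemmers).
      from nltk.stem import PorterStemmer, SnowballStemmer

      tokens = ["indexing", "indexes", "indexed", "classification", "classifying"]
      analyzers = {
          "baseline (lowercase)": str.lower,
          "porter": PorterStemmer().stem,
          "snowball-en": SnowballStemmer("english").stem,
      }
      # The same tokens normalized by each analyzer before classification.
      for name, analyze in analyzers.items():
          print(name, [analyze(t) for t in tokens])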
  12. Schuegraf, E.J.; Bommel, M.F.van: ¬An automatic document indexing system based on cooperating expert systems : design and development (1993) 0.01
    0.012863259 = product of:
      0.025726518 = sum of:
        0.025726518 = product of:
          0.051453035 = sum of:
            0.051453035 = weight(_text_:subject in 6504) [ClassicSimilarity], result of:
              0.051453035 = score(doc=6504,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.31612942 = fieldWeight in 6504, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6504)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Discusses the design of an automatic indexing system based on two cooperating expert systems and the investigation related to its development. The design combines statistical and artificial intelligence techniques. Examines choice of content indicators, the effect of stemming and the identification of characteristic vocabularies for given subject areas. Presents experimental results. Discusses the application of machine learning algorithms to the identification of vocabularies
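    The identification of characteristic vocabularies can be sketched as a frequency-ratio comparison between a subject-area corpus and a background corpus. A hedged toy version only; the system itself combines such statistical indicators with expert-system rules:

      from collections import Counter

      def characteristic_terms(subject_docs, background_docs, k=3):
          """Rank terms by how much more frequent they are in the subject corpus."""
          subj = Counter(w for d in subject_docs for w in d.lower().split())
          back = Counter(w for d in background_docs for w in d.lower().split())
          n_subj, n_back = sum(subj.values()), sum(back.values())
          # Ratio of relative frequencies, add-one smoothed for unseen terms.
          ratio = {w: (subj[w] / n_subj) / ((back[w] + 1) / (n_back + 1)) for w in subj}
          return sorted(ratio, key=ratio.get, reverse=True)[:k]

      subject = ["stemming reduces index terms", "index terms and stemming rules"]
      background = ["the weather is mild", "rules of the road"]
      print(characteristic_terms(subject, background))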
  13. Abdul, H.; Khoo, C.: Automatic indexing of medical literature using phrase matching : an exploratory study 0.01
    0.012863259 = product of:
      0.025726518 = sum of:
        0.025726518 = product of:
          0.051453035 = sum of:
            0.051453035 = weight(_text_:subject in 3601) [ClassicSimilarity], result of:
              0.051453035 = score(doc=3601,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.31612942 = fieldWeight in 3601, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3601)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Reports the first part of a study applying the technique of phrase matching to the automatic assignment of MeSH subject headings and subheadings to abstracts of periodical articles.
  14. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.01
    0.012863259 = product of:
      0.025726518 = sum of:
        0.025726518 = product of:
          0.051453035 = sum of:
            0.051453035 = weight(_text_:subject in 5189) [ClassicSimilarity], result of:
              0.051453035 = score(doc=5189,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.31612942 = fieldWeight in 5189, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5189)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, grammar formalism, lexicon, and the frame-based knowledge representation used for semantic representation. The semantic representation is based on the Ranganathan school of thought, especially the Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The paper attempts to demonstrate the various steps in indexing by providing an illustration.
  15. Willis, C.; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.01
    0.012863259 = product of:
      0.025726518 = sum of:
        0.025726518 = product of:
          0.051453035 = sum of:
            0.051453035 = weight(_text_:subject in 1016) [ClassicSimilarity], result of:
              0.051453035 = score(doc=1016,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.31612942 = fieldWeight in 1016, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1016)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with four different thesauri (AGROVOC [UN Food and Agriculture Organization], the high-energy physics taxonomy [HEP], the National Agricultural Library Thesaurus [NALT], and the Medical Subject Headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the four thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri, and the manual indexing associated with it, is characterized using the methods developed here.
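    The core mechanism can be sketched as follows: weight spreads from thesaurus terms matched directly in a document to related terms via a weighted random walk with restarts. The graph, edge weights, and restart parameter below are illustrative, not the authors' exact model:

      import random

      # Toy thesaurus fragment: term -> [(related term, relationship weight)].
      thesaurus = {
          "soil erosion": [("soil", 1.0), ("land degradation", 0.5)],
          "soil": [("soil erosion", 1.0), ("agriculture", 0.5)],
          "land degradation": [("soil erosion", 0.5)],
          "agriculture": [("soil", 0.5)],
      }

      def walk_scores(seed_terms, steps=20000, restart=0.3, rng=random.Random(0)):
          """Estimate term importance by visit frequency of a restarting walk."""
          counts = {}
          node = rng.choice(seed_terms)
          for _ in range(steps):
              counts[node] = counts.get(node, 0) + 1
              neighbours = thesaurus.get(node, [])
              if not neighbours or rng.random() < restart:
                  node = rng.choice(seed_terms)      # restart at a matched term
              else:
                  terms, weights = zip(*neighbours)
                  node = rng.choices(terms, weights=weights)[0]
          total = sum(counts.values())
          return {t: c / total for t, c in counts.items()}

      # Terms matched directly in a document; the walk promotes related concepts.
      print(walk_scores(["soil erosion"]))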
  16. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01
    0.012331083 = product of:
      0.024662167 = sum of:
        0.024662167 = product of:
          0.049324334 = sum of:
            0.049324334 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
              0.049324334 = score(doc=4709,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.30952093 = fieldWeight in 4709, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4709)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  17. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.012331083 = product of:
      0.024662167 = sum of:
        0.024662167 = product of:
          0.049324334 = sum of:
            0.049324334 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.049324334 = score(doc=6752,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    6. 3.1997 16:22:15
  18. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 2918) [ClassicSimilarity], result of:
              0.04547849 = score(doc=2918,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 2918, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2918)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC), an established knowledge organization system (KOS), to enhance social tagging, with the ultimate purpose of improving subject indexing and information retrieval.
    Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also comprised mappings from the Library of Congress Subject Headings.
    Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: they help produce ideas of which tags to use, make it easier to find focus for the tagging, ensure consistency, and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to depend on the quality of the suggestions, both as to their conceptual relevance to the user and as to the appropriateness of the terminology.
    Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging only with social tagging enhanced by the suggestions. This paper is a final reflection on all aspects of the study.
  19. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 3311) [ClassicSimilarity], result of:
              0.04547849 = score(doc=3311,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 3311, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
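    The most direct of the evaluation approaches reviewed, comparison with a gold standard, reduces to set overlap between automatically assigned and manually assigned terms; a minimal sketch:

      def precision_recall_f1(assigned, gold):
          """Set-based precision, recall, and F1 for one document's subject terms."""
          assigned, gold = set(assigned), set(gold)
          true_pos = len(assigned & gold)
          p = true_pos / len(assigned) if assigned else 0.0
          r = true_pos / len(gold) if gold else 0.0
          f1 = 2 * p * r / (p + r) if (p + r) else 0.0
          return p, r, f1

      auto = ["automatic indexing", "thesauri", "evaluation"]
      manual = ["automatic indexing", "information retrieval", "evaluation"]
      print(precision_recall_f1(auto, manual))   # 2 of 3 terms agree on each side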
  20. Villaespesa, E.; Crider, S.: A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 341) [ClassicSimilarity], result of:
              0.04547849 = score(doc=341,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 341, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=341)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keyword tags assigned by the museum and those produced by three computer vision systems.
    Design/methodology/approach - This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags.
    Findings - This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art-historical one. However, the study also surfaces challenges concerning the accuracy and lack of context within the computer vision results.
    Practical implications - These findings have important implications for how machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. Consequently, the museum needs to consider the selection process for choosing which computer vision system to apply to its collection. Furthermore, it also needs to think critically about the kind of tags it wishes to use, such as colors, materials or objects.
    Originality/value - The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations on aspects to consider before selecting and implementing these technologies.

Types

  • a 61
  • el 5
  • s 1