Search (354 results, page 18 of 18)

Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 1842) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=1842,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 1842, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.

Martins, E.F.; Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: On cold start for associative tag recommendation (2016) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 2494) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=2494,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 2494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2494)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: Journal of the Association for Information Science and Technology. 67(2016) no.1, S.83-105

Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 3232) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=3232,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 3232, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3232)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: Journal of the Association for Information Science and Technology. 67(2016) no.12, S.3073-3091

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 3627) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=3627,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).

Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 3667) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=3667,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Series: Communications in computer and information science; 544

Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 4309) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=4309,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 4309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4309)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.

Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 5045) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=5045,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: Information processing and management. 54(2018) no.6, S.1345-1358

Golub, K.: Automatic subject indexing of text (2019) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 5268) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=5268,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 5268, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collec-tions, and enhance consistency of the metadata. In this work, au-tomatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are dis-cussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most wide-spread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for sub-ject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equiv-alent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.

Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 5400) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=5400,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Footnote: Beitrag eines Special Issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.

Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 5816) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=5816,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 5816, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: Journal of the Association for Information Science and Technology. 71(2020) no.5, S.553-567

Villaespesa, E.; Crider, S.: ¬A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.00
```
1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 341) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=341,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 341, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=341)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Purpose Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keywords tags assigned by the museum and those produced by three computer vision systems. Design/methodology/approach This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags. Findings This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art historical one. However, the study also surfaces challenges about the accuracy and lack of context within the computer vision results. Practical implications This finding has important implications on how these machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. In consequence, the museum needs to consider the selection process for choosing which computer vision system to apply to their collection. Furthermore, they also need to think critically about the kind of tags they wish to use, such as colors, materials or objects. Originality/value The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations of aspects to consider before selecting and implementing these technologies.

Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.00

1.6444239E-4 = product of:
  0.0024666358 = sum of:
    0.0024666358 = product of:
      0.0049332716 = sum of:
        0.0049332716 = weight(_text_:information in 1024) [ClassicSimilarity], result of:
          0.0049332716 = score(doc=1024,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.09697737 = fieldWeight in 1024, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1024)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: ¬SRELS Journal of Information Management. 60(2023) no.2, S.85-95

Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.00
```
1.6278966E-4 = product of:
  0.0024418447 = sum of:
    0.0024418447 = product of:
      0.0048836893 = sum of:
        0.0048836893 = weight(_text_:information in 3645) [ClassicSimilarity], result of:
          0.0048836893 = score(doc=3645,freq=4.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.0960027 = fieldWeight in 3645, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

The selection that follows was chosen as it represents "a very early paper an the possibilities allowed by computers an documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms, which when substituted, one for another, could be used to improve retrieval performance? One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England.t The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic naturai language processing. Based an the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification. The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true eco nomic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to libraryl information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.

Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.00

1.3155391E-4 = product of:
  0.0019733086 = sum of:
    0.0019733086 = product of:
      0.0039466172 = sum of:
        0.0039466172 = weight(_text_:information in 3434) [ClassicSimilarity], result of:
          0.0039466172 = score(doc=3434,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.0775819 = fieldWeight in 3434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

Source: Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.688-699

Search (354 results, page 18 of 18)

Authors

Years

Languages

Types

Themes

Subjects

Classifications