Search (35 results, page 1 of 2)

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.10
```
0.09524199 = product of:
  0.14286299 = sum of:
    0.08966068 = weight(_text_:systematic in 3311) [ClassicSimilarity], result of:
      0.08966068 = score(doc=3311,freq=2.0), product of:
        0.28397155 = queryWeight, product of:
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.049684696 = queryNorm
        0.31573826 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.715473 = idf(docFreq=395, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.053202312 = product of:
      0.106404625 = sum of:
        0.106404625 = weight(_text_:indexing in 3311) [ClassicSimilarity], result of:
          0.106404625 = score(doc=3311,freq=14.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.55947536 = fieldWeight in 3311, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.05

0.04925008 = product of:
  0.14775024 = sum of:
    0.14775024 = sum of:
      0.08043434 = weight(_text_:indexing in 2759) [ClassicSimilarity], result of:
        0.08043434 = score(doc=2759,freq=2.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.42292362 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
      0.06731591 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
        0.06731591 = score(doc=2759,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.38690117 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
  0.33333334 = coord(1/3)

Date: 1. 2.2016 18:25:22

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing: : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.03
```
0.027550971 = product of:
  0.08265291 = sum of:
    0.08265291 = sum of:
      0.055726547 = weight(_text_:indexing in 1442) [ClassicSimilarity], result of:
        0.055726547 = score(doc=1442,freq=6.0), product of:
          0.19018644 = queryWeight, product of:
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.049684696 = queryNorm
          0.2930101 = fieldWeight in 1442, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            3.8278677 = idf(docFreq=2614, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
      0.026926363 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
        0.026926363 = score(doc=1442,freq=2.0), product of:
          0.17398734 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049684696 = queryNorm
          0.15476047 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
  0.33333334 = coord(1/3)
```
Abstract

The main objective of this research was to analyze whether there was a characteristic distribution behavior of relevant terms over a scientific text that could contribute as a criterion for their process of automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts were considered a total of 98 doctoral theses of the eight areas of knowledge in a same university. Initially, 20 full noun phrases were automatically extracted from each text as candidates to be the most relevant terms, and each author of each text assigned a relevance value 0-6 (not relevant and highly relevant, respectively) for each of the 20 noun phrases sent. Only, 22.1 % of noun phrases were considered not relevant. A relevance values of the terms assigned by the authors were associated with their positions in the text. Each full noun phrases found in the text was considered as a valid linear position. The results that were obtained showed values resulting from this distribution by considering two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, as well as all areas of knowledge related to Social Sciences showed the same characteristic behavior of distribution, but distinct from the Natural Sciences. The difference of the distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in future as criteria for automatic indexing. Until the present date this work has become inedited of for two reasons: to present a method for characterizing the distribution of relevant terms in a scientific text, and also, through this method, pointing out a quantitative trait difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.02
```
0.024130303 = product of:
  0.07239091 = sum of:
    0.07239091 = product of:
      0.14478181 = sum of:
        0.14478181 = weight(_text_:indexing in 3622) [ClassicSimilarity], result of:
          0.14478181 = score(doc=3622,freq=18.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.76126254 = fieldWeight in 3622, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=3622)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Indexing is contextualized and a brief description is provided of some of the most used automatic indexing systems. We describe SISA, a system which uses location heuristics rules, statistical rules like term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value of certain parts of the articles for indexing such as titles, abstracts, keywords, headings, first paragraph, conclusions and references and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and f-measure. Automatic indexing of the articles with location heuristics rules provided the best results with the evaluation measures.
Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.02
```
0.022230878 = product of:
  0.066692635 = sum of:
    0.066692635 = product of:
      0.13338527 = sum of:
        0.13338527 = weight(_text_:indexing in 4005) [ClassicSimilarity], result of:
          0.13338527 = score(doc=4005,freq=22.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.70133954 = fieldWeight in 4005, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.02
```
0.020983277 = product of:
  0.06294983 = sum of:
    0.06294983 = product of:
      0.12589966 = sum of:
        0.12589966 = weight(_text_:indexing in 2629) [ClassicSimilarity], result of:
          0.12589966 = score(doc=2629,freq=10.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.6619802 = fieldWeight in 2629, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism-both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.02
```
0.019702308 = product of:
  0.059106924 = sum of:
    0.059106924 = product of:
      0.11821385 = sum of:
        0.11821385 = weight(_text_:indexing in 4311) [ClassicSimilarity], result of:
          0.11821385 = score(doc=4311,freq=12.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.6215682 = fieldWeight in 4311, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of their catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series Band H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatical since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or in other words to which degree can the automatic indexing replace people without a signi cant drop in regards to quality?

Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.02

0.018575516 = product of:
  0.055726547 = sum of:
    0.055726547 = product of:
      0.11145309 = sum of:
        0.11145309 = weight(_text_:indexing in 3440) [ClassicSimilarity], result of:
          0.11145309 = score(doc=3440,freq=6.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5860202 = fieldWeight in 3440, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title derivative automatic indexing, its development and implementation, and current status.

Golub, K.: Automatic subject indexing of text (2019) 0.02
```
0.017734105 = product of:
  0.053202312 = sum of:
    0.053202312 = product of:
      0.106404625 = sum of:
        0.106404625 = weight(_text_:indexing in 5268) [ClassicSimilarity], result of:
          0.106404625 = score(doc=5268,freq=14.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.55947536 = fieldWeight in 5268, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collec-tions, and enhance consistency of the metadata. In this work, au-tomatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are dis-cussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most wide-spread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for sub-ject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equiv-alent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.02
```
0.016957048 = product of:
  0.050871145 = sum of:
    0.050871145 = product of:
      0.10174229 = sum of:
        0.10174229 = weight(_text_:indexing in 3434) [ClassicSimilarity], result of:
          0.10174229 = score(doc=3434,freq=20.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5349608 = fieldWeight in 3434, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The purpose of this study is to examine whether the understandings of subject-indexing processes conducted by human indexers have a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text while a keyword was found more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. The content-oriented and the document-oriented indexing approaches especially were found more effective than the full text. Among three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more desirable for automatic subject term assignment via TC than the possible users' needs. The findings of this study support that incorporation of human indexers' indexing approaches or conception in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.02
```
0.016086869 = product of:
  0.048260607 = sum of:
    0.048260607 = product of:
      0.09652121 = sum of:
        0.09652121 = weight(_text_:indexing in 2721) [ClassicSimilarity], result of:
          0.09652121 = score(doc=2721,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.5075084 = fieldWeight in 2721, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.02

0.015166845 = product of:
  0.045500536 = sum of:
    0.045500536 = product of:
      0.09100107 = sum of:
        0.09100107 = weight(_text_:indexing in 466) [ClassicSimilarity], result of:
          0.09100107 = score(doc=466,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.47848347 = fieldWeight in 466, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: We present an approach for detecting multiword phrases in mathematical text corpora. The method used is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo which allows to identify words by means of previously defined dictionaries for specific word classes as adjectives, personal names or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval and conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures are discussed.

Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.02
```
0.015166845 = product of:
  0.045500536 = sum of:
    0.045500536 = product of:
      0.09100107 = sum of:
        0.09100107 = weight(_text_:indexing in 1016) [ClassicSimilarity], result of:
          0.09100107 = score(doc=1016,freq=16.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.47848347 = fieldWeight in 1016, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.01
```
0.014988055 = product of:
  0.044964164 = sum of:
    0.044964164 = product of:
      0.08992833 = sum of:
        0.08992833 = weight(_text_:indexing in 4292) [ClassicSimilarity], result of:
          0.08992833 = score(doc=4292,freq=10.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.47284302 = fieldWeight in 4292, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.

Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.01

0.0134631805 = product of:
  0.04038954 = sum of:
    0.04038954 = product of:
      0.08077908 = sum of:
        0.08077908 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
          0.08077908 = score(doc=5629,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.46428138 = fieldWeight in 5629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5629)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: B.I.T.online. 22(2019) H.2, S.163-166

Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.01
```
0.0134057235 = product of:
  0.04021717 = sum of:
    0.04021717 = product of:
      0.08043434 = sum of:
        0.08043434 = weight(_text_:indexing in 3810) [ClassicSimilarity], result of:
          0.08043434 = score(doc=3810,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.42292362 = fieldWeight in 3810, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.

Object

Latent Semantic Indexing

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.01

0.013270989 = product of:
  0.039812967 = sum of:
    0.039812967 = product of:
      0.079625934 = sum of:
        0.079625934 = weight(_text_:indexing in 1717) [ClassicSimilarity], result of:
          0.079625934 = score(doc=1717,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.41867304 = fieldWeight in 1717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents of all subjects. Traditionally used for intellectual subject cataloguing primarily of books the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developping and implementing procedures for automated assignment of subject headings for online publications. This project, its results and problems are sketched in the paper.

Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.01

0.013270989 = product of:
  0.039812967 = sum of:
    0.039812967 = product of:
      0.079625934 = sum of:
        0.079625934 = weight(_text_:indexing in 1969) [ClassicSimilarity], result of:
          0.079625934 = score(doc=1969,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.41867304 = fieldWeight in 1969, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: The German Integrated Authority File (Gemeinsame Normdatei, GND), provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging primarily for books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.

Lichtenstein, A.; Plank, M.; Neumann, J.: TIB's portal for audiovisual media : combining manual and automatic indexing (2014) 0.01
```
0.013270989 = product of:
  0.039812967 = sum of:
    0.039812967 = product of:
      0.079625934 = sum of:
        0.079625934 = weight(_text_:indexing in 1981) [ClassicSimilarity], result of:
          0.079625934 = score(doc=1981,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.41867304 = fieldWeight in 1981, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The German National Library of Science and Technology (TIB) developed a Web-based platform for audiovisual media. The audiovisual portal optimizes access to scientific videos such as computer animations and lecture and conference recordings. TIB's AV-Portal combines traditional cataloging and automatic indexing of audiovisual media. The article describes metadata standards for audiovisual media and introduces the TIB's metadata schema in comparison to other metadata standards for non-textual materials. Additionally, we give an overview of multimedia retrieval technologies used for the Portal and present the AV-Portal in detail as well as the additional value for libraries and their users.

Souza, R.R.; Gil-Leiva, I.: Automatic indexing of scientific texts : a methodological comparison (2016) 0.01

0.01072458 = product of:
  0.032173738 = sum of:
    0.032173738 = product of:
      0.064347476 = sum of:
        0.064347476 = weight(_text_:indexing in 4913) [ClassicSimilarity], result of:
          0.064347476 = score(doc=4913,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3383389 = fieldWeight in 4913, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0625 = fieldNorm(doc=4913)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Search (35 results, page 1 of 2)

Authors

Languages

Types

Themes