Search (77 results, page 2 of 4)

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 2721) [ClassicSimilarity], result of:
          0.009076704 = score(doc=2721,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 2721, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Type

a
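
The explain tree for doc 2721 above can be reproduced by hand. The following is a minimal sketch, assuming the formulas implied by the labels and values in the tree: `tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`, `queryWeight = idf * queryNorm`, and `fieldWeight = tf * idf * fieldNorm`. The function name and its parameter names are hypothetical; only the numbers come from the listing.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm,
                             coords=(0.5, 0.5)):
    """Recompute one explain tree from its leaf values.

    Assumption: the formulas are inferred from the explain output shown in
    this listing, not taken from any library source.
    """
    tf = math.sqrt(freq)                               # 3.1622777 for freq=10
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 1.153047 in the listing
    query_weight = idf * query_norm                    # 0.053105544
    field_weight = tf * idf * field_norm               # 0.1709182
    score = query_weight * field_weight                # 0.009076704
    for coord in coords:                               # two coord(1/2) factors
        score *= coord
    return score


# Leaf values copied from the explain tree for doc 2721.
score_2721 = classic_similarity_score(
    freq=10.0, doc_freq=37942, max_docs=44218,
    query_norm=0.046056706, field_norm=0.046875,
)
print(round(score_2721, 9))
```

The result agrees with the listed 0.002269176 to within single-precision rounding. Because the matched term occurs in 37942 of 44218 documents, its idf is close to 1, which is why every hit on this page rounds to 0.00.
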
Wiesenmüller, H.: DNB-Sacherschließung : Neues für die Reihen A und B (2019) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 5212) [ClassicSimilarity], result of:
          0.009076704 = score(doc=5212,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 5212, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5212)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

"Every few years, the library community is confronted with changes to subject indexing by the Deutsche Nationalbibliothek. Many will certainly still remember the cuts of 2014 for series A: since then, self-help guides, language dictionaries, travel guides, and cookbooks, among others, are no longer indexed with subject headings (cf. the DNB concept of 2014). The year 2017 brought the introduction of machine indexing for series B and H, together with the loss of in-depth DDC indexing (cf. DNB information from 2017). Since then, the pressing question has been what would happen to series A. For a few days now, the answer can be read on the DNB website. (Incidentally: it is to be feared that many links in this blog post will stop working in the foreseeable future, as a relaunch of the DNB website has been announced. As last time, there will presumably be no redirects from the old URLs to the new ones.)"

Source

https://www.basiswissen-rda.de/dnb-sacherschliessung-reihen-a-und-b/

Type

a
Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.00
```
0.0022374375 = product of:
  0.004474875 = sum of:
    0.004474875 = product of:
      0.00894975 = sum of:
        0.00894975 = weight(_text_:a in 3151) [ClassicSimilarity], result of:
          0.00894975 = score(doc=3151,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1685276 = fieldWeight in 3151, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Type

a
Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
```
0.0022374375 = product of:
  0.004474875 = sum of:
    0.004474875 = product of:
      0.00894975 = sum of:
        0.00894975 = weight(_text_:a in 3311) [ClassicSimilarity], result of:
          0.00894975 = score(doc=3311,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1685276 = fieldWeight in 3311, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Type

a
Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.00
```
0.0022374375 = product of:
  0.004474875 = sum of:
    0.004474875 = product of:
      0.00894975 = sum of:
        0.00894975 = weight(_text_:a in 3667) [ClassicSimilarity], result of:
          0.00894975 = score(doc=3667,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1685276 = fieldWeight in 3667, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.

Type

a
Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 2895) [ClassicSimilarity], result of:
          0.008285859 = score(doc=2895,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 2895, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.

Type

a
Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 5400) [ClassicSimilarity], result of:
          0.008285859 = score(doc=5400,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 5400, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.

Type

a
Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.00
```
0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 1969) [ClassicSimilarity], result of:
          0.008202582 = score(doc=1969,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 1969, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. It has traditionally been used for intellectual subject cataloging, primarily of books; the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.

Footnote

Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - Contains contributions from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.

Type

a
Lichtenstein, A.; Plank, M.; Neumann, J.: TIB's portal for audiovisual media : combining manual and automatic indexing (2014) 0.00
```
0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 1981) [ClassicSimilarity], result of:
          0.008202582 = score(doc=1981,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 1981, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The German National Library of Science and Technology (TIB) developed a Web-based platform for audiovisual media. The audiovisual portal optimizes access to scientific videos such as computer animations and lecture and conference recordings. TIB's AV-Portal combines traditional cataloging and automatic indexing of audiovisual media. The article describes metadata standards for audiovisual media and introduces the TIB's metadata schema in comparison to other metadata standards for non-textual materials. Additionally, we give an overview of multimedia retrieval technologies used for the Portal and present the AV-Portal in detail as well as the additional value for libraries and their users.

Type

a
Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.00
```
0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 2629) [ClassicSimilarity], result of:
          0.008202582 = score(doc=2629,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 2629, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.

Type

a
Benson, A.C.: Image descriptions and their relational expressions : a review of the literature and the issues (2015) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 1867) [ClassicSimilarity], result of:
          0.008118451 = score(doc=1867,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 1867, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The purpose of this paper is to survey the treatment of relationships, relationship expressions and the ways in which they manifest themselves in image descriptions. Design/methodology/approach - The term "relationship" is construed in the broadest possible way to include spatial relationships ("to the right of"), temporal ("in 1936," "at noon"), meronymic ("part of"), and attributive ("has color," "has dimension"). The intentions of these vaguely delimited categories with image information, image creation, and description in libraries and archives is complex and in need of explanation. Findings - The review brings into question many generally held beliefs about the relationship problem such as the belief that the semantics of relationships are somehow embedded in the relationship term itself and that image search and retrieval solutions can be found through refinement of word-matching systems. Originality/value - This review has no hope of systematically examining all evidence in all disciplines pertaining to this topic. It instead focusses on a general description of a theoretical treatment in Library and Information Science.

Type

a
Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 2144) [ClassicSimilarity], result of:
          0.008118451 = score(doc=2144,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 2144, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2144)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (i.e. people, places, concepts, etc.) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications.

Type

a
Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 3187) [ClassicSimilarity], result of:
          0.008118451 = score(doc=3187,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 3187, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.

Type

a
Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 4311) [ClassicSimilarity], result of:
          0.008118451 = score(doc=4311,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 4311, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of their catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work, or in other words, to what degree can the automatic indexing replace people without a significant drop in quality?

Type

a
Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 1016) [ClassicSimilarity], result of:
          0.008118451 = score(doc=1016,freq=18.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 1016, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.

Type

a
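
Doc 1016 above has freq=18.0, yet it scores exactly the same as the freq=8.0 hits earlier on this page (for example doc 1867). The sketch below shows why, again assuming the formula `fieldWeight = sqrt(freq) * idf * fieldNorm` inferred from the explain output: the higher term frequency is cancelled by the smaller fieldNorm. The helper name `field_weight` is hypothetical.

```python
import math

IDF = 1.153047  # idf value shown in every explain tree on this page


def field_weight(freq, field_norm, idf=IDF):
    # Assumed formula, inferred from the explain trees in this listing.
    return math.sqrt(freq) * idf * field_norm


w_1016 = field_weight(18.0, 0.03125)    # doc 1016: more matches, smaller fieldNorm
w_1867 = field_weight(8.0, 0.046875)    # doc 1867: fewer matches, larger fieldNorm

# sqrt(18)/32 and 3*sqrt(8)/64 both equal 3*sqrt(2)/32, so the weights coincide.
print(math.isclose(w_1016, w_1867, rel_tol=1e-9))
```

Both weights come out at about 0.15287387, the fieldWeight printed in the two explain trees.
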
Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.00
```
0.001913537 = product of:
  0.003827074 = sum of:
    0.003827074 = product of:
      0.007654148 = sum of:
        0.007654148 = weight(_text_:a in 3434) [ClassicSimilarity], result of:
          0.007654148 = score(doc=3434,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14413087 = fieldWeight in 3434, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The purpose of this study is to examine whether the understandings of subject-indexing processes conducted by human indexers have a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text while a keyword was found more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. The content-oriented and the document-oriented indexing approaches especially were found more effective than the full text. Among three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more desirable for automatic subject term assignment via TC than the possible users' needs. The findings of this study support that incorporation of human indexers' indexing approaches or conception in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.

Type

a

Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.00

```
0.001913537 = product of:
  0.003827074 = sum of:
    0.003827074 = product of:
      0.007654148 = sum of:
        0.007654148 = weight(_text_:a in 3440) [ClassicSimilarity], result of:
          0.007654148 = score(doc=3440,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14413087 = fieldWeight in 3440, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Abstract

The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title derivative automatic indexing, its development and implementation, and current status.

Type

a

Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.00
```
0.001913537 = product of:
  0.003827074 = sum of:
    0.003827074 = product of:
      0.007654148 = sum of:
        0.007654148 = weight(_text_:a in 466) [ClassicSimilarity], result of:
          0.007654148 = score(doc=466,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14413087 = fieldWeight in 466, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
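The score breakdown above can be recomputed by hand. The following is a minimal sketch, assuming the formulas commonly documented for Lucene's ClassicSimilarity (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); these reproduce the displayed intermediate values, but the exact Lucene version behind this listing is an assumption. The function name `classic_similarity_score` and its parameters are hypothetical and only mirror the labels in the explain output.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, field_norm, query_norm, coords=()):
    """Recompute a single-term score from the values shown in an explain block.

    Assumes tf = sqrt(freq) and idf = 1 + ln(max_docs / (doc_freq + 1)),
    which match the displayed tf and idf values (an assumption, not a
    confirmed description of the search engine's configuration).
    """
    tf = math.sqrt(freq)                                # 2.0 for freq=4.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))     # about 1.153047
    query_weight = idf * query_norm                     # "queryWeight" line
    field_weight = tf * idf * field_norm                # "fieldWeight" line
    score = query_weight * field_weight                 # "weight(...)" line
    for coord in coords:                                # each "coord(1/2)" factor
        score *= coord
    return score


# Values copied from the explain block for doc=466.
score = classic_similarity_score(
    freq=4.0,
    doc_freq=37942,
    max_docs=44218,
    field_norm=0.0625,
    query_norm=0.046056706,
    coords=(0.5, 0.5),
)
print(f"{score:.9f}")
```

The result agrees with the displayed 0.001913537 up to floating-point rounding; Lucene computes in single precision, so the last digits may differ. Because the query term, idf, and queryNorm are identical across the records on this page, differences in score come only from the term frequency and the fieldNorm of each document.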
Abstract

We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, as are conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.

Type

a

Keller, A.: Einstellung zur (automatischen) Sacherschließung in deutsch- und englischsprachigen Ländern (2015) 0.00

```
0.001913537 = product of:
  0.003827074 = sum of:
    0.003827074 = product of:
      0.007654148 = sum of:
        0.007654148 = weight(_text_:a in 2571) [ClassicSimilarity], result of:
          0.007654148 = score(doc=2571,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14413087 = fieldWeight in 2571, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2571)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type

a

Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.00
```
0.0018909799 = product of:
  0.0037819599 = sum of:
    0.0037819599 = product of:
      0.0075639198 = sum of:
        0.0075639198 = weight(_text_:a in 2918) [ClassicSimilarity], result of:
          0.0075639198 = score(doc=2918,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14243183 = fieldWeight in 2918, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also included mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency, and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on their quality, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No previous research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.

Type

a
