Search (146 results, page 1 of 8)

  • year_i:[2020 TO 2030}
  1. Tay, A.: The next generation discovery citation indexes : a review of the landscape in 2020 (2020) 0.05
    0.053932853 = product of:
      0.107865706 = sum of:
        0.107865706 = sum of:
          0.05872144 = weight(_text_:indexing in 40) [ClassicSimilarity], result of:
            0.05872144 = score(doc=40,freq=2.0), product of:
              0.19835205 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.051817898 = queryNorm
              0.29604656 = fieldWeight in 40, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0546875 = fieldNorm(doc=40)
          0.049144268 = weight(_text_:22 in 40) [ClassicSimilarity], result of:
            0.049144268 = score(doc=40,freq=2.0), product of:
              0.18145745 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051817898 = queryNorm
              0.2708308 = fieldWeight in 40, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=40)
      0.5 = coord(1/2)
    
    Date
    17.11.2020 12:22:59
    Theme
    Citation indexing
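    The breakdown above is Lucene ClassicSimilarity explain output: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf(freq) × idf × fieldNorm with tf(freq) = sqrt(freq), and the summed contributions are scaled by the coordination factor. A minimal Python sketch recomputing this entry's score from the values shown above:

        import math

        def term_score(freq, idf, query_norm, field_norm):
            # queryWeight = idf * queryNorm; fieldWeight = sqrt(freq) * idf * fieldNorm,
            # exactly the factors listed in the explain tree above.
            return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

        QUERY_NORM = 0.051817898  # queryNorm, constant across this result list

        indexing = term_score(freq=2.0, idf=3.8278677, query_norm=QUERY_NORM, field_norm=0.0546875)
        term_22 = term_score(freq=2.0, idf=3.5018296, query_norm=QUERY_NORM, field_norm=0.0546875)

        score = 0.5 * (indexing + term_22)  # coord(1/2), as reported above
        print(round(indexing, 8), round(term_22, 9), round(score, 9))
        # 0.05872144 0.049144268 0.053932853 (up to floating-point rounding)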
  2. Rae, A.R.; Mork, J.G.; Demner-Fushman, D.: The National Library of Medicine indexer assignment dataset : a new large-scale dataset for reviewer assignment research (2023) 0.05
    0.047210332 = 0.5 (coord) × [_text_:indexing 0.059317615 (freq=4) + _text_:22 0.03510305 (freq=2)]
    
    Abstract
    MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers. High quality and timely indexing is only possible when articles are assigned to indexers with suitable expertise. This article introduces the NLM indexer assignment dataset: a large dataset of 4.2 million indexer article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer article assignment is also found to be useful domain-adaptive pre-training for the closely related task of reviewer assignment.
    Date
    22. 1.2023 18:49:49
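    The expert-matching task described here, assigning each article to the indexer whose expertise fits it best, can be framed as a bipartite assignment problem. A minimal sketch, not the NLM pipeline, with an invented match-quality matrix and SciPy's Hungarian-algorithm solver:

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        # Hypothetical scores: rows = articles, columns = indexers;
        # entry [i, j] says how well indexer j's expertise matches article i.
        match_quality = np.array([
            [0.9, 0.2, 0.4],
            [0.1, 0.8, 0.3],
            [0.5, 0.6, 0.7],
        ])

        # linear_sum_assignment minimizes total cost, so negate to maximize quality.
        rows, cols = linear_sum_assignment(-match_quality)
        for article, indexer in zip(rows, cols):
            print(f"article {article} -> indexer {indexer} "
                  f"(match {match_quality[article, indexer]:.1f})")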
  3. Asubiaro, T.V.; Onaolapo, S.: A comparative study of the coverage of African journals in Web of Science, Scopus, and CrossRef (2023) 0.05
    0.047210332 = 0.5 (coord) × [_text_:indexing 0.059317615 (freq=4) + _text_:22 0.03510305 (freq=2)]
    
    Abstract
    This is the first study to evaluate the coverage of African journals in Web of Science, Scopus, and CrossRef. A list of active journals published in each of the 55 African countries was compiled from Ulrich's periodicals directory and the African Journals Online (AJOL) website. The journal master lists of Web of Science, Scopus, and CrossRef were searched for the African journals. A total of 2,229 unique active African journals were identified from Ulrich (N = 2,117, 95.0%) and AJOL (N = 243, 10.9%) after removing duplicates. The share of African journals covered in Web of Science and Scopus is 7.4% (N = 166) and 7.8% (N = 174), respectively, compared to 45.6% (N = 1,017) in CrossRef. While making up only 17% of all the African journals, South African journals had the best coverage in the two most authoritative databases, accounting for 73.5% and 62.1% of all the African journals in Web of Science and Scopus, respectively. In contrast, Nigeria published 44.5% of all the African journals. The distribution of the African journals in the three databases is biased in favor of the Medical, Life and Health Sciences and the Humanities and the Arts. The low representation of African journals in CrossRef, a free indexing infrastructure that could be harnessed for building an African-centric research indexing database, is concerning.
    Date
    22. 6.2023 14:09:06
  4. Cheti, A.; Viti, E.: Functionality and merits of a faceted thesaurus : the case of the Nuovo soggettario (2023) 0.05
    0.046228163 = 0.5 (coord) × [_text_:indexing 0.050332665 (freq=2) + _text_:22 0.042123657 (freq=2)]
    
    Abstract
    The Nuovo soggettario, the official Italian subject indexing system edited by the National Central Library of Florence, is made up of interactive components, the core of which is a general thesaurus and some rules of a conventional syntax for subject string construction. The Nuovo soggettario Thesaurus complies with ISO 25964 (2011-2013), IFLA LRM, and the FAIR principles (findability, accessibility, interoperability, and reusability). Its open data are available in Zthes, MARC21, and SKOS formats and allow for interoperability with library, archive, and museum databases. The Thesaurus's macrostructure is organized into four fundamental macro-categories, thirteen categories, and facets. The facets allow for the orderly development of hierarchies, thereby limiting polyhierarchies and promoting the grouping of homogeneous concepts. This paper addresses the main features and peculiarities which have characterized the consistent development of this categorical structure and its effects on the syntactic sphere in a predominantly pre-coordinated usage context.
    Date
    26.11.2023 18:59:22
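    Because the Thesaurus is published as SKOS open data, its facet hierarchies can be traversed with any RDF toolkit. A minimal sketch using rdflib, assuming a hypothetical local Turtle export of the vocabulary:

        from rdflib import Graph
        from rdflib.namespace import SKOS

        g = Graph()
        g.parse("nuovo_soggettario.ttl", format="turtle")  # hypothetical local file

        # Walk broader/narrower links, the relations a faceted thesaurus organizes.
        for concept, narrower in g.subject_objects(SKOS.narrower):
            print(g.value(concept, SKOS.prefLabel), ">", g.value(narrower, SKOS.prefLabel))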
  5. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.04
    0.04115028 = 0.5 × 0.33333334 (coord) × [_text_:3a 0.24690168 (freq=2)]
    
    Source
    https://arxiv.org/abs/2212.06721
  6. Hjoerland, B.: Table of contents (ToC) (2022) 0.04
    0.038523465 = 0.5 (coord) × [_text_:indexing 0.041943885 (freq=2) + _text_:22 0.03510305 (freq=2)]
    
    Abstract
    A table of contents (ToC) is a kind of document representation as well as a paratext and a kind of finding device for the document it represents. ToCs are very common in books and some other kinds of documents, but not in all kinds. This article discusses the definition and functions of ToCs, normative guidelines for their design, and the history and forms of ToCs in different kinds of documents and media. A main part of the article concerns the role of ToCs in information searching, in current awareness services, and as items added to bibliographic records. The introduction and the conclusion focus on the core theoretical issues concerning ToCs: should they be document-oriented or request-oriented, neutral or policy-oriented, objective or subjective? It is concluded that, because of the special functions of ToCs, the arguments for the request-oriented (policy-oriented, subjective) view are weaker than they are in relation to indexing and knowledge organization in general. Apart from the level of granularity, the evaluation of a ToC is difficult to separate from the evaluation of the structuring and naming of the elements of the structure of the document it represents.
    Date
    18.11.2023 13:47:22
  7. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.03
    0.0342919 = 0.5 × 0.33333334 (coord) × [_text_:3a 0.2057514 (freq=2)]
    
    Content
    "Die Englischsprachige Wikipedia verfügt jetzt über mehr als 6 Millionen Artikel. An zweiter Stelle kommt die deutschsprachige Wikipedia mit 2.3 Millionen Artikeln, an dritter Stelle steht die französischsprachige Wikipedia mit 2.1 Millionen Artikeln (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> und Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog s.a.: Angesichts der Veröffentlichung des 6-millionsten Artikels vergangene Woche in der englischsprachigen Wikipedia hat die Community-Zeitungsseite "Wikipedia Signpost" ein Moratorium bei der Veröffentlichung von Unternehmensartikeln gefordert. Das sei kein Vorwurf gegen die Wikimedia Foundation, aber die derzeitigen Maßnahmen, um die Enzyklopädie gegen missbräuchliches undeklariertes Paid Editing zu schützen, funktionierten ganz klar nicht. *"Da die ehrenamtlichen Autoren derzeit von Werbung in Gestalt von Wikipedia-Artikeln überwältigt werden, und da die WMF nicht in der Lage zu sein scheint, dem irgendetwas entgegenzusetzen, wäre der einzige gangbare Weg für die Autoren, fürs erste die Neuanlage von Artikeln über Unternehmen zu untersagen"*, schreibt der Benutzer Smallbones in seinem Editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> zur heutigen Ausgabe."
  8. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.03
    0.0342919 = 0.5 × 0.33333334 (coord) × [_text_:3a 0.2057514 (freq=2)]
    
    Content
    Master thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. Cf.: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. See also the presentation at: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
  9. Bedford, D.: Knowledge architectures : structures and semantics (2021) 0.03
    0.030818773 = 0.5 (coord) × [_text_:indexing 0.03355511 (freq=2) + _text_:22 0.028082438 (freq=2)]
    
    Content
    Section 1 Context and purpose of knowledge architecture -- 1 Making the case for knowledge architecture -- 2 The landscape of knowledge assets -- 3 Knowledge architecture and design -- 4 Knowledge architecture reference model -- 5 Knowledge architecture segments -- Section 2 Designing for availability -- 6 Knowledge object modeling -- 7 Knowledge structures for encoding, formatting, and packaging -- 8 Functional architecture for identification and distinction -- 9 Functional architectures for knowledge asset disposition and destruction -- 10 Functional architecture designs for knowledge preservation and conservation -- Section 3 Designing for accessibility -- 11 Functional architectures for knowledge seeking and discovery -- 12 Functional architecture for knowledge search -- 13 Functional architecture for knowledge categorization -- 14 Functional architectures for indexing and keywording -- 15 Functional architecture for knowledge semantics -- 16 Functional architecture for knowledge abstraction and surrogation -- Section 4 Functional architectures to support knowledge consumption -- 17 Functional architecture for knowledge augmentation, derivation, and synthesis -- 18 Functional architecture to manage risk and harm -- 19 Functional architectures for knowledge authentication and provenance -- 20 Functional architectures for securing knowledge assets -- 21 Functional architectures for authorization and asset management -- Section 5 Pulling it all together - the big picture knowledge architecture -- 22 Functional architecture for knowledge metadata and metainformation -- 23 The whole knowledge architecture - pulling it all together
  10. Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03
    0.02936072 = 0.5 × 0.5 (coord) × [_text_:indexing 0.11744288 (freq=8)]
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural Language Processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection, by suggesting Library of Congress Subject Headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
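    A minimal sketch of the underlying idea, not the authors' exact pipeline: embed a text and candidate subject headings with a BERT-family sentence encoder and rank the headings by cosine similarity. The model name and the headings are illustrative only:

        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative BERT-family encoder

        book_text = "A treatise on the cultivation of roses and the design of gardens."
        candidate_headings = ["Roses", "Gardens--Design", "Naval history"]  # tiny LCSH-style slice

        book_emb = model.encode(book_text, convert_to_tensor=True)
        heading_embs = model.encode(candidate_headings, convert_to_tensor=True)

        # Rank headings by cosine similarity to the book description.
        scores = util.cos_sim(book_emb, heading_embs)[0]
        for heading, score in sorted(zip(candidate_headings, scores.tolist()),
                                     key=lambda pair: -pair[1]):
            print(f"{score:.3f}  {heading}")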
  11. Manzoni, L.: Nuovo Soggettario and semantic indexing of cartographic resources in Italy : an exploratory study (2022) 0.03
    0.029059576 = 0.5 × 0.5 (coord) × [_text_:indexing 0.1162383 (freq=6)]
    
    Abstract
    The paper focuses on the potential use of Nuovo soggettario, the semantic indexing tool adopted by the National Central Library of Florence (Biblioteca nazionale centrale di Firenze), for indexing cartographic resources. Particular attention is paid to the treatment of place names, the use of formal subjects, and the different ways of constructing subject strings for general and thematic maps.
  12. Golub, K.: Automated subject indexing : an overview (2021) 0.03
    0.025427131 = 0.5 × 0.5 (coord) × [_text_:indexing 0.101708524 (freq=6)]
    
    Abstract
    In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
  13. Der Student aus dem Computer (2023) 0.02
    0.024572134 = 0.5 × 0.5 (coord) × [_text_:22 0.098288536 (freq=2)]
    
    Date
    27. 1.2023 16:22:55
  14. Ali, C.B.; Haddad, H.; Slimani, Y.: Multi-word terms selection for information retrieval (2022) 0.02
    0.023447346 = 0.5 × 0.5 (coord) × [_text_:indexing 0.093789384 (freq=10)]
    
    Abstract
    Purpose: A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of these approaches suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple term indexing to propose a framework for multi-word term (MWT) filtering and indexing.
    Design/methodology/approach: The authors rely on ranking MWT to filter them, keeping the most effective ones for the indexing process. The proposed model is based on filtering MWT according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those that achieve the greatest association degree. The experiments are carried out with English- and French-language data sets.
    Findings: The results indicate that this approach achieved precision enhancements at low recall and performed better than more advanced models based on term dependencies.
    Originality/value: Different association measures are used and tested to select the MWT that best describe the documents, enhancing precision among the first retrieved documents.
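    The core move, scoring candidate multi-word terms with an association measure and keeping only the top-ranked ones for indexing, can be sketched with NLTK's collocation tools; PMI here merely stands in for whichever association measures the authors tested:

        from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

        tokens = ("information retrieval systems rank documents so information "
                  "retrieval research keeps improving retrieval systems").split()

        measures = BigramAssocMeasures()
        finder = BigramCollocationFinder.from_words(tokens)
        finder.apply_freq_filter(2)  # drop candidate bigrams seen only once

        # Rank the surviving multi-word candidates by pointwise mutual information.
        for bigram, pmi in finder.score_ngrams(measures.pmi):
            print(f"{pmi:.2f}  {' '.join(bigram)}")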
  15. Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.02
    0.02179468 = 0.5 × 0.5 (coord) × [_text_:indexing 0.08717872 (freq=6)]
    
    Abstract
    Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects depends on the cataloger's knowledge of the specific topics contained in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book regardless of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, outperforming human indexers by a factor of 10-15. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.
  16. Jaeger, L.: Wissenschaftler versus Wissenschaft (2020) 0.02
    0.021061828 = 0.5 × 0.5 (coord) × [_text_:22 0.08424731 (freq=2)]
    
    Date
    2. 3.2020 14:08:22
  17. Ibrahim, G.M.; Taylor, M.: Krebszellen manipulieren Neurone : Gliome (2023) 0.02
    0.021061828 = 0.5 × 0.5 (coord) × [_text_:22 0.08424731 (freq=2)]
    
    Source
    Spektrum der Wissenschaft. 2023, H.10, S.22-24
  18. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.02
    0.020971943 = 0.5 × 0.5 (coord) × [_text_:indexing 0.08388777 (freq=8)]
    
    Abstract
    The study reported here explores the possibilities of an AI/ML-based semi-automated indexing system for handling large volumes of documents in a library setup. It uses a Python virtual environment to install and configure an open-source AI environment (named Annif) and feeds it the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, where it undergoes many steps to produce a bibliographic record set suitable for training Annif. After training, the framework has been tested with a bibliographic dataset to measure indexing efficiency, and finally the automated indexing framework is integrated with data-wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
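    The TF-IDF backend idea named above, vectorizing cataloger-indexed records and suggesting the subject headings of the most similar ones, can be sketched with scikit-learn; Annif's actual implementation differs in detail, and the records here are invented:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Tiny invented training set: text plus the headings assigned by catalogers.
        train_texts = [
            "naval battles of the eighteenth century",
            "rose gardening and horticulture handbook",
            "warships and maritime strategy",
        ]
        train_subjects = [["Naval history"], ["Gardening"], ["Naval history", "Strategy"]]

        vectorizer = TfidfVectorizer()
        train_matrix = vectorizer.fit_transform(train_texts)

        def suggest(text, top_n=2):
            """Suggest subjects taken from the most similar training records."""
            sims = cosine_similarity(vectorizer.transform([text]), train_matrix)[0]
            suggestions = []
            for i in sims.argsort()[::-1][:top_n]:
                for subject in train_subjects[i]:
                    if subject not in suggestions:
                        suggestions.append(subject)
            return suggestions

        print(suggest("a history of battles between warships"))
        # expected: ['Naval history', 'Strategy']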
  19. Golub, K.; Tyrkkö, J.; Hansson, J.; Ahlström, I.: Subject indexing in humanities : a comparison between a local university repository and an international bibliographic service (2020) 0.02
    0.018162236 = 0.5 × 0.5 (coord) × [_text_:indexing 0.07264894 (freq=6)]
    
    Abstract
    As the humanities develop in the realm of increasingly pronounced digital scholarship, it is important to provide quality subject access to a vast range of heterogeneous information objects in digital services. The study aims to paint a representative picture of the current use of subject index terms in humanities journal articles, with particular reference to the well-established subject access needs of humanities researchers, in order to identify which improvements are needed in this context.
    Design/methodology/approach: Subject metadata for a sample of 649 peer-reviewed journal articles from across the humanities is compared between a university repository and Scopus, the former reflecting local and national policies and the latter being the most comprehensive international abstract and citation database of research output.
    Findings: The study shows that established bibliographic objectives to ensure subject access for humanities journal articles are supported neither in Scopus, the world's largest commercial abstract and citation database, nor in the local repository of a public university in Sweden. The indexing policies in the two services do not seem to address the needs of humanities scholars for highly granular subject index terms with appropriate facets; no controlled vocabularies for any humanities discipline are used whatsoever.
    Originality/value: In all, not much has changed since the 1990s, when indexing for the humanities was shown to lag behind the sciences. The community of researchers and information professionals, today working together on digital humanities projects, as well as interdisciplinary research teams, should demand that their subject access needs be fulfilled, especially in commercial services like Scopus and in discovery services.
  20. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.02
    0.018162236 = 0.5 × 0.5 (coord) × [_text_:indexing 0.07264894 (freq=6)]
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open-source Annif toolkit for automated subject indexing and classification, but should also generalize to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification, particularly for Finnish and Swedish text, but not for English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
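    The normalization step being compared is easy to illustrate: a rule-based stemmer truncates surface forms, while a dictionary-based lemmatizer maps them to dictionary headwords. A minimal sketch with NLTK, one possible toolkit rather than the paper's exact setup:

        from nltk.stem import WordNetLemmatizer
        from nltk.stem.snowball import SnowballStemmer

        # First run requires: import nltk; nltk.download("wordnet")
        stemmer = SnowballStemmer("english")
        lemmatizer = WordNetLemmatizer()

        for word in ["studies", "indexing", "libraries", "better"]:
            print(f"{word:10} stem={stemmer.stem(word):10} lemma={lemmatizer.lemmatize(word)}")
        # e.g. "studies" stems to "studi" but lemmatizes to "study";
        # "libraries" stems to "librari" but lemmatizes to "library".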

Languages

  • e 116
  • d 29

Types

  • a 139
  • el 23
  • m 3
  • p 2
  • x 1