Search (28 results, page 1 of 2)

  • × year_i:[2010 TO 2020}
  • × theme_ss:"Automatisches Indexieren"
  1. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.03
    0.025167733 = product of:
      0.050335467 = sum of:
        0.050335467 = product of:
          0.10067093 = sum of:
            0.10067093 = weight(_text_:subject in 2629) [ClassicSimilarity], result of:
              0.10067093 = score(doc=2629,freq=10.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.61852604 = fieldWeight in 2629, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2629)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
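    Example
    The score breakdown above is Lucene's ClassicSimilarity explanation. As a minimal sketch, the top score can be recomputed from the quantities shown in the tree, assuming only the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), two coord(1/2) factors); the helper function name below is ours, not Lucene's:

      import math

      def classic_similarity(freq, doc_freq, max_docs, query_norm, field_norm, coord):
          tf = math.sqrt(freq)                               # 3.1622777 for freq=10
          idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 3.576596
          query_weight = idf * query_norm                    # 0.16275941
          field_weight = tf * idf * field_norm               # 0.61852604
          return coord * query_weight * field_weight

      # weight(_text_:subject in 2629), with the two coord(1/2) factors applied
      print(classic_similarity(freq=10.0, doc_freq=3361, max_docs=44218,
                               query_norm=0.04550679, field_norm=0.0546875,
                               coord=0.25))                  # ~0.025167733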
  2. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.02
    0.02411861 = product of:
      0.04823722 = sum of:
        0.04823722 = product of:
          0.09647444 = sum of:
            0.09647444 = weight(_text_:subject in 4292) [ClassicSimilarity], result of:
              0.09647444 = score(doc=4292,freq=18.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5927426 = fieldWeight in 4292, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata, and automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest that the performance of different weighting methods depends on whether it is an abstract or a full-text environment. Mutual information with bag-of-words representation shows the best average performance in the full-text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up on Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
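    Example
    Purely as an illustration of one of the weighting signals compared in the abstract (not the authors' implementation), cosine over bag-of-words vectors can score how central a subject descriptor is to a document; the strings below are invented toy data:

      import math
      from collections import Counter

      def cosine(a: Counter, b: Counter) -> float:
          # cosine similarity between two bag-of-words vectors
          dot = sum(a[t] * b[t] for t in a)
          na = math.sqrt(sum(v * v for v in a.values()))
          nb = math.sqrt(sum(v * v for v in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      doc = Counter("automatic subject indexing assigns subject descriptors to documents".split())
      descriptor = Counter("subject indexing".split())
      print(cosine(descriptor, doc))  # higher value -> descriptor weighted higher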
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    0.022510704 = product of:
      0.045021407 = sum of:
        0.045021407 = product of:
          0.090042815 = sum of:
            0.090042815 = weight(_text_:subject in 1717) [ClassicSimilarity], result of:
              0.090042815 = score(doc=1717,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5532265 = fieldWeight in 1717, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1717)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally it has been used for intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results and its problems are sketched in the paper.
    Content
    Paper for the conference "Beyond libraries - subject metadata in the digital environment and semantic web", IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. See: http://www.nlib.ee/index.php?id=17763.
  4. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.02
    0.022510704 = product of:
      0.045021407 = sum of:
        0.045021407 = product of:
          0.090042815 = sum of:
            0.090042815 = weight(_text_:subject in 5481) [ClassicSimilarity], result of:
              0.090042815 = score(doc=5481,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5532265 = fieldWeight in 5481, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5481)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
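    Example
    Of the techniques listed, topic modeling is easy to sketch; a toy run with scikit-learn's LatentDirichletAllocation on invented dime-novel-style snippets (our data, not NIU's corpus) yields candidate subject vocabulary per topic:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      texts = ["the detective chased the outlaw across the frontier",
               "a brave heroine foils the villain and his railroad plot",
               "miners strike gold and bandits raid the mountain camp"]

      vec = CountVectorizer(stop_words="english")
      X = vec.fit_transform(texts)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
      terms = vec.get_feature_names_out()
      for k, topic in enumerate(lda.components_):
          top = [terms[i] for i in topic.argsort()[-4:]]   # strongest terms per topic
          print(f"topic {k}: {top}")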
  5. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.02
    0.019692764 = product of:
      0.039385527 = sum of:
        0.039385527 = product of:
          0.078771055 = sum of:
            0.078771055 = weight(_text_:subject in 4005) [ClassicSimilarity], result of:
              0.078771055 = score(doc=4005,freq=12.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.48397237 = fieldWeight in 4005, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
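    Example
    A toy rendering of the core idea, deriving a weight for a manually assigned descriptor from how strongly its vocabulary occurs in the document text; the entry terms and abstract below are hypothetical, and the actual study mines MeSH-document associations far more carefully:

      def descriptor_weight(doc_text: str, entry_terms: list[str]) -> float:
          # share of document tokens matching the descriptor's entry vocabulary
          tokens = doc_text.lower().split()
          hits = sum(tokens.count(t.lower()) for t in entry_terms)
          return hits / max(len(tokens), 1)

      neoplasms_terms = ["neoplasms", "tumor", "cancer"]  # hypothetical entry terms
      abstract = "we study tumor growth and cancer progression in murine models"
      print(descriptor_weight(abstract, neoplasms_terms))  # 0.2 -> weight of 'Neoplasms'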
  6. Golub, K.: Automatic subject indexing of text (2019) 0.02
    0.019692764 = product of:
      0.039385527 = sum of:
        0.039385527 = product of:
          0.078771055 = sum of:
            0.078771055 = weight(_text_:subject in 5268) [ClassicSimilarity], result of:
              0.078771055 = score(doc=5268,freq=12.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.48397237 = fieldWeight in 5268, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5268)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification reuses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
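    Example
    The string-matching route the abstract mentions can be sketched in a few lines: assign a KOS concept whenever one of its equivalent or related terms occurs in the text. The miniature thesaurus is invented for illustration:

      THESAURUS = {
          "Automatic indexing": {"automatic indexing", "machine indexing",
                                 "automated subject assignment"},
          "Thesauri": {"thesaurus", "thesauri", "controlled vocabulary"},
      }

      def match_kos(text: str) -> list[str]:
          # document classification by simple string matching against a KOS
          t = text.lower()
          return [concept for concept, terms in THESAURUS.items()
                  if any(term in t for term in terms)]

      print(match_kos("Machine indexing with a controlled vocabulary"))
      # ['Automatic indexing', 'Thesauri']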
  7. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.02
    0.019494843 = product of:
      0.038989685 = sum of:
        0.038989685 = product of:
          0.07797937 = sum of:
            0.07797937 = weight(_text_:subject in 1969) [ClassicSimilarity], result of:
              0.07797937 = score(doc=1969,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4791082 = fieldWeight in 1969, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1969)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging, primarily for books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
    Footnote
    Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - Contains the papers of the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
  8. Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02
    0.018496625 = product of:
      0.03699325 = sum of:
        0.03699325 = product of:
          0.0739865 = sum of:
            0.0739865 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
              0.0739865 = score(doc=5629,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.46428138 = fieldWeight in 5629, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5629)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    B.I.T.online. 22(2019) H.2, S.163-166
  9. Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.02
    0.018191395 = product of:
      0.03638279 = sum of:
        0.03638279 = product of:
          0.07276558 = sum of:
            0.07276558 = weight(_text_:subject in 3434) [ClassicSimilarity], result of:
              0.07276558 = score(doc=3434,freq=16.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4470745 = fieldWeight in 3434, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The purpose of this study is to examine whether an understanding of the subject-indexing processes carried out by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject-indexing approaches or conceptions together with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experiment results showed that cited works, source title, and title were as effective as the full text, while keywords were found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text; the content-oriented and the document-oriented indexing approaches in particular were found to be more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and the authors' intentions were more useful for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the claim that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
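    Example
    The study ran a Support Vector Machine in WEKA; the same setup can be sketched in scikit-learn (a substitution, not the authors' code), scored with macro-averaged F-measure as in the experiments. The training snippets stand in for fields such as title or keywords:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics import f1_score
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      X_train = ["protein folding simulation", "galaxy cluster survey",
                 "enzyme kinetics assay", "dark matter halo model"]
      y_train = ["biology", "astronomy", "biology", "astronomy"]
      X_test = ["stellar halo dynamics", "protein enzyme structure"]
      y_test = ["astronomy", "biology"]

      clf = make_pipeline(TfidfVectorizer(), LinearSVC())  # SVM text categorizer
      clf.fit(X_train, y_train)
      print(f1_score(y_test, clf.predict(X_test), average="macro"))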
  10. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06165542 = score(doc=2759,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  11. Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.01
    0.012863259 = product of:
      0.025726518 = sum of:
        0.025726518 = product of:
          0.051453035 = sum of:
            0.051453035 = weight(_text_:subject in 1016) [ClassicSimilarity], result of:
              0.051453035 = score(doc=1016,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.31612942 = fieldWeight in 1016, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1016)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
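    Example
    A compact sketch of the paper's central device: a weighted random walk over thesaurus relationships, where frequently visited terms become important candidate concepts. The four-term graph and its weights are invented:

      import random

      GRAPH = {  # term -> weighted broader/narrower/related neighbours
          "indexing": {"subject indexing": 2.0, "cataloguing": 1.0},
          "subject indexing": {"indexing": 2.0, "thesauri": 1.0},
          "cataloguing": {"indexing": 1.0},
          "thesauri": {"subject indexing": 1.0},
      }

      def weighted_walk(start: str, steps: int = 10_000) -> dict:
          visits, node, rng = {}, start, random.Random(0)
          for _ in range(steps):
              visits[node] = visits.get(node, 0) + 1
              nbrs = GRAPH[node]
              node = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
          return {t: round(c / steps, 3) for t, c in visits.items()}

      print(weighted_walk("indexing"))  # visit frequency as term importance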
  12. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.01
    0.012331083 = product of:
      0.024662167 = sum of:
        0.024662167 = product of:
          0.049324334 = sum of:
            0.049324334 = weight(_text_:22 in 401) [ClassicSimilarity], result of:
              0.049324334 = score(doc=401,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.30952093 = fieldWeight in 401, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=401)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    11. 9.2012 19:43:22
  13. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 2918) [ClassicSimilarity], result of:
              0.04547849 = score(doc=2918,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 2918, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2918)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, comprising also mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging only with social tagging enhanced by the suggestions. This paper is a final reflection on all aspects of the study.
  14. Moreno, J.M.T.: Automatic text summarization (2014) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 1518) [ClassicSimilarity], result of:
              0.04547849 = score(doc=1518,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 1518, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1518)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This new textbook examines the motivations for and the different algorithms of automatic document summarization (ADS), and surveys the recent state of the art. The book presents the main problems of ADS, the difficulties involved, and the solutions provided by the community, as well as recent advances, current applications and trends. The approaches covered are statistical, linguistic and symbolic, and several examples are included to clarify the theoretical concepts. The books currently available in the area of automatic document summarization are not recent, while powerful algorithms and several new applications of ADS have been developed in recent years. Recent technological developments, the massive use of social networks, and new forms of technology require the adaptation of classical text-summarization methods. This textbook on automatic text summarization, based on teaching materials used in one- or two-semester courses, presents an extensive state of the art and describes the new systems in the field. Previous automatic summarization books have been either collections of specialized papers or authored books with only a chapter or two devoted to the field as a whole; moreover, the classic books on the subject are not recent.
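    Example
    The statistical approach the book covers can be reduced to a classic baseline: score each sentence by the summed corpus frequency of its words and extract the top-scoring ones. A minimal sketch on invented text:

      import re
      from collections import Counter

      def summarize(text: str, n: int = 1) -> list[str]:
          sentences = re.split(r"(?<=\.)\s+", text.strip())
          freqs = Counter(re.findall(r"\w+", text.lower()))
          ranked = sorted(sentences, reverse=True,
                          key=lambda s: sum(freqs[w] for w in re.findall(r"\w+", s.lower())))
          return ranked[:n]  # the n highest-scoring sentences form the summary

      doc = ("Automatic summarization condenses documents. "
             "Statistical methods score sentences by word frequency. "
             "Frequent content words mark the important sentences.")
      print(summarize(doc))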
  15. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.011369622 = product of:
      0.022739245 = sum of:
        0.022739245 = product of:
          0.04547849 = sum of:
            0.04547849 = weight(_text_:subject in 3311) [ClassicSimilarity], result of:
              0.04547849 = score(doc=3311,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27942157 = fieldWeight in 3311, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
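    Example
    The simplest of the reviewed approaches, comparison against a single gold standard, fits in a few lines; the term sets are invented, and the article's point is precisely that one such measure is not enough on its own:

      def prf(automatic: set, gold: set) -> tuple:
          # precision, recall and F1 of automatic terms vs. a manually indexed record
          tp = len(automatic & gold)
          p = tp / len(automatic) if automatic else 0.0
          r = tp / len(gold) if gold else 0.0
          f = 2 * p * r / (p + r) if p + r else 0.0
          return p, r, f

      print(prf({"indexing", "thesauri", "libraries"},
                {"indexing", "thesauri", "metadata"}))  # ~(0.667, 0.667, 0.667)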
  16. Wolfe, EW.: a case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.01
    0.011255352 = product of:
      0.022510704 = sum of:
        0.022510704 = product of:
          0.045021407 = sum of:
            0.045021407 = weight(_text_:subject in 5236) [ClassicSimilarity], result of:
              0.045021407 = score(doc=5236,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.27661324 = fieldWeight in 5236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5236)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
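    Example
    One standard baseline for the word sense disambiguation step mentioned above is the Lesk algorithm as shipped with NLTK; this is a generic illustration, not necessarily the procedure the KU librarians built:

      # pip install nltk; then run nltk.download("wordnet") once
      from nltk.wsd import lesk

      context = "the steamboat docked at the river bank at dawn".split()
      sense = lesk(context, "bank")  # picks the WordNet sense overlapping the context
      print(sense, "-", sense.definition() if sense else "no sense found")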
  17. Kasprzik, A.: Voraussetzungen und Anwendungspotentiale einer präzisen Sacherschließung aus Sicht der Wissenschaft (2018) 0.01
    0.010789698 = product of:
      0.021579396 = sum of:
        0.021579396 = product of:
          0.043158792 = sum of:
            0.043158792 = weight(_text_:22 in 5195) [ClassicSimilarity], result of:
              0.043158792 = score(doc=5195,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.2708308 = fieldWeight in 5195, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5195)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A great deal of attention is currently directed at the potential of automated methods in subject indexing and at their possibilities for interaction with intellectual methods. In this context, the present paper addresses the following questions: What are the requirements for library metadata from the perspective of research? What is needed to serve the information needs of the scholarly communities? And what does that imply for the automation of metadata creation and maintenance? This paper summarizes the position taken by the author in an impulse talk and the panel discussion at the workshop of the FAG "Erschließung und Informationsvermittlung" of the GBV. The workshop took place within the framework of the 22nd Verbundkonferenz of the GBV.
  18. Franke-Maier, M.: Anforderungen an die Qualität der Inhaltserschließung im Spannungsfeld von intellektuell und automatisch erzeugten Metadaten (2018) 0.01
    0.010789698 = product of:
      0.021579396 = sum of:
        0.021579396 = product of:
          0.043158792 = sum of:
            0.043158792 = weight(_text_:22 in 5344) [ClassicSimilarity], result of:
              0.043158792 = score(doc=5344,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.2708308 = fieldWeight in 5344, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5344)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Since the Deutscher Bibliothekartag 2018 at the latest, the discussion of the Deutsche Nationalbibliothek's automatic subject indexing procedures has turned from a politically driven debate into a debate about quality. The following paper deals with questions of the quality of subject indexing in digital times, when heterogeneous products of different procedures meet, and attempts to define important requirements for quality. This conference paper summarizes the ideas the author presented as impulses at the workshop of the FAG "Erschließung und Informationsvermittlung" of the GBV on 29 August 2018 in Kiel. The workshop took place within the framework of the 22nd Verbundkonferenz of the GBV.
  19. Schöneberg, U.; Gödert, W.: Erschließung mathematischer Publikationen mittels linguistischer Verfahren (2012) 0.01
    0.009647444 = product of:
      0.019294888 = sum of:
        0.019294888 = product of:
          0.038589776 = sum of:
            0.038589776 = weight(_text_:subject in 1055) [ClassicSimilarity], result of:
              0.038589776 = score(doc=1055,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.23709705 = fieldWeight in 1055, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The number of mathematics-related publications grows from year to year. Reviewing services such as Zentralblatt MATH and Mathematical Reviews record the bibliographic data, index the works by subject, and make them searchable for users, today via databases, formerly in printed form. Keywords are an essential component of the subject indexing of these publications, and they are usually not single words but multi-word phrases, which suggests the application of linguistic methods and procedures. The software 'Lingo', developed at the FH Köln, was adapted to the special requirements of mathematical texts and used both to build a controlled vocabulary and to extract keywords from mathematical publications. The plan is to develop and test methods for automatic classification for the reviewing service Zentralblatt MATH by linking the controlled vocabulary with the Mathematical Subject Classification.
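    Example
    A crude stand-in for the kind of multi-word keyword candidates 'Lingo' works with: collect adjacent word pairs containing no stopwords and rank them by frequency ('Lingo' itself uses dictionaries and morphological analysis, which this sketch omits):

      import re
      from collections import Counter

      STOP = {"the", "of", "and", "a", "in", "for", "to", "is"}

      def multiword_candidates(text: str, n: int = 2):
          words = re.findall(r"[a-zäöüß]+", text.lower())
          pairs = zip(words, words[1:])
          # keep only stopword-free bigrams as keyword candidates
          return Counter(" ".join(p) for p in pairs
                         if not (set(p) & STOP)).most_common(n)

      print(multiword_candidates(
          "partial differential equations and stochastic differential equations"))
      # [('differential equations', 2), ('partial differential', 1)]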
  20. Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.01
    0.009647444 = product of:
      0.019294888 = sum of:
        0.019294888 = product of:
          0.038589776 = sum of:
            0.038589776 = weight(_text_:subject in 4311) [ClassicSimilarity], result of:
              0.038589776 = score(doc=4311,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.23709705 = fieldWeight in 4311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4311)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications are no longer indexed by people. This raises the question: what is the quality of the automatic indexing compared to the manual work, or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?