Search (39 results, page 1 of 2)

  • year_i:[2010 TO 2020}
  • theme_ss:"Automatisches Indexieren"
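
The year filter above uses mixed brackets: in Lucene/Solr range syntax a square bracket marks an inclusive bound and a curly brace an exclusive one, so `[2010 TO 2020}` should match 2010 through 2019. A minimal sketch of that rule follows; the helper names `parse_range` and `matches` are hypothetical and only handle plain integer bounds.

```python
import re


def parse_range(expr):
    """Parse a range such as '[2010 TO 2020}' into (low, high, low_inclusive, high_inclusive)."""
    m = re.fullmatch(r"([\[{])\s*(\d+)\s+TO\s+(\d+)\s*([\]}])", expr)
    if m is None:
        raise ValueError(f"not a numeric range: {expr!r}")
    open_bracket, low, high, close_bracket = m.groups()
    # '[' and ']' are inclusive bounds; '{' and '}' are exclusive bounds.
    return int(low), int(high), open_bracket == "[", close_bracket == "]"


def matches(expr, value):
    """Return True if value falls inside the range described by expr."""
    low, high, low_inclusive, high_inclusive = parse_range(expr)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below


for year in (2009, 2010, 2019, 2020):
    print(year, matches("[2010 TO 2020}", year))
```

This is consistent with the listing, in which every result is dated between 2012 and 2019.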
  1. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.03
    0.03143732 = product of:
      0.110030614 = sum of:
        0.016822865 = weight(_text_:classification in 5045) [ClassicSimilarity], result of:
          0.016822865 = score(doc=5045,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
        0.04113413 = product of:
          0.08226826 = sum of:
            0.08226826 = weight(_text_:schemes in 5045) [ClassicSimilarity], result of:
              0.08226826 = score(doc=5045,freq=6.0), product of:
                0.16067243 = queryWeight, product of:
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.03002521 = queryNorm
                0.51202476 = fieldWeight in 5045, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5045)
          0.5 = coord(1/2)
        0.016822865 = weight(_text_:classification in 5045) [ClassicSimilarity], result of:
          0.016822865 = score(doc=5045,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
        0.035250753 = product of:
          0.07050151 = sum of:
            0.07050151 = weight(_text_:texts in 5045) [ClassicSimilarity], result of:
              0.07050151 = score(doc=5045,freq=4.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.42830306 = fieldWeight in 5045, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5045)
          0.5 = coord(1/2)
      0.2857143 = coord(4/14)
    
    Abstract
    Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we aim to investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weight, we further suggest a combination form of EW (CEW) with two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on 8 real world data sets. Experimental results show that weighting words can effectively improve the topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
  2. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.03
    0.029701097 = product of:
      0.13860512 = sum of:
        0.03296595 = weight(_text_:classification in 1441) [ClassicSimilarity], result of:
          0.03296595 = score(doc=1441,freq=12.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3447546 = fieldWeight in 1441, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.03296595 = weight(_text_:classification in 1441) [ClassicSimilarity], result of:
          0.03296595 = score(doc=1441,freq=12.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3447546 = fieldWeight in 1441, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.07267321 = sum of:
          0.056401204 = weight(_text_:texts in 1441) [ClassicSimilarity], result of:
            0.056401204 = score(doc=1441,freq=4.0), product of:
              0.16460659 = queryWeight, product of:
                5.4822793 = idf(docFreq=499, maxDocs=44218)
                0.03002521 = queryNorm
              0.34264246 = fieldWeight in 1441, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4822793 = idf(docFreq=499, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
          0.016272005 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
            0.016272005 = score(doc=1441,freq=2.0), product of:
              0.10514317 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03002521 = queryNorm
              0.15476047 = fieldWeight in 1441, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
      0.21428572 = coord(3/14)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NP) applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as a semantic aggregator to reduce the number of words used by the classification system without losing semantic coverage, thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then PERL scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hyperonyms for them; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated its effectiveness. We compared the structural similarity of the documents before and after all the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  3. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.03
    0.027005369 = product of:
      0.12602505 = sum of:
        0.059409913 = weight(_text_:subject in 5481) [ClassicSimilarity], result of:
          0.059409913 = score(doc=5481,freq=8.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.5532265 = fieldWeight in 5481, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.033307575 = weight(_text_:classification in 5481) [ClassicSimilarity], result of:
          0.033307575 = score(doc=5481,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.34832728 = fieldWeight in 5481, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.033307575 = weight(_text_:classification in 5481) [ClassicSimilarity], result of:
          0.033307575 = score(doc=5481,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.34832728 = fieldWeight in 5481, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
      0.21428572 = coord(3/14)
    
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
    Source
    Cataloging and classification quarterly. 57(2019) no.5, S.315-336
  4. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.03
    0.0268396 = product of:
      0.0939386 = sum of:
        0.021217827 = weight(_text_:subject in 5400) [ClassicSimilarity], result of:
          0.021217827 = score(doc=5400,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.023791125 = weight(_text_:classification in 5400) [ClassicSimilarity], result of:
          0.023791125 = score(doc=5400,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.02513852 = weight(_text_:bibliographic in 5400) [ClassicSimilarity], result of:
          0.02513852 = score(doc=5400,freq=2.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.21506234 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.023791125 = weight(_text_:classification in 5400) [ClassicSimilarity], result of:
          0.023791125 = score(doc=5400,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
      0.2857143 = coord(4/14)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
    Footnote
    Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.
  5. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.02
    0.024327071 = product of:
      0.11352633 = sum of:
        0.066422306 = weight(_text_:subject in 2629) [ClassicSimilarity], result of:
          0.066422306 = score(doc=2629,freq=10.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.61852604 = fieldWeight in 2629, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
        0.023552012 = weight(_text_:classification in 2629) [ClassicSimilarity], result of:
          0.023552012 = score(doc=2629,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 2629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
        0.023552012 = weight(_text_:classification in 2629) [ClassicSimilarity], result of:
          0.023552012 = score(doc=2629,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 2629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
      0.21428572 = coord(3/14)
    
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
    Source
    Cataloging and classification quarterly. 53(2015) no.8, S.895-904
  6. Golub, K.: Automatic subject indexing of text (2019) 0.02
    0.02362478 = product of:
      0.11024897 = sum of:
        0.051972847 = weight(_text_:subject in 5268) [ClassicSimilarity], result of:
          0.051972847 = score(doc=5268,freq=12.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.48397237 = fieldWeight in 5268, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
        0.029138058 = weight(_text_:classification in 5268) [ClassicSimilarity], result of:
          0.029138058 = score(doc=5268,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 5268, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
        0.029138058 = weight(_text_:classification in 5268) [ClassicSimilarity], result of:
          0.029138058 = score(doc=5268,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 5268, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
      0.21428572 = coord(3/14)
    
    Abstract
    Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification reuses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
  7. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    0.022824414 = product of:
      0.10651393 = sum of:
        0.059409913 = weight(_text_:subject in 1717) [ClassicSimilarity], result of:
          0.059409913 = score(doc=1717,freq=8.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.5532265 = fieldWeight in 1717, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.023552012 = weight(_text_:classification in 1717) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1717,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.023552012 = weight(_text_:classification in 1717) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1717,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
      0.21428572 = coord(3/14)
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents of all subjects. Traditionally used for intellectual subject cataloguing, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results and problems are sketched in the paper.
    Content
    Contribution to the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://www.nlib.ee/index.php?id=17763.
    Source
    Cataloguing & Classification Quarterly 52(2014) no.1, S.102-109
  8. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.02
    0.021118827 = product of:
      0.09855452 = sum of:
        0.0514505 = weight(_text_:subject in 1969) [ClassicSimilarity], result of:
          0.0514505 = score(doc=1969,freq=6.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.4791082 = fieldWeight in 1969, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.023552012 = weight(_text_:classification in 1969) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1969,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1969, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.023552012 = weight(_text_:classification in 1969) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1969,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1969, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.21428572 = coord(3/14)
    
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND), provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally used for intellectual subject cataloging primarily for books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
    Footnote
    Contribution to the special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web", which contains papers from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
    Source
    Cataloging and classification quarterly. 52(2014) no.1, S.102-109
  9. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.02
    0.019760894 = product of:
      0.092217505 = sum of:
        0.03364573 = weight(_text_:classification in 3627) [ClassicSimilarity], result of:
          0.03364573 = score(doc=3627,freq=8.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.35186368 = fieldWeight in 3627, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
        0.03364573 = weight(_text_:classification in 3627) [ClassicSimilarity], result of:
          0.03364573 = score(doc=3627,freq=8.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.35186368 = fieldWeight in 3627, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
        0.024926046 = product of:
          0.04985209 = sum of:
            0.04985209 = weight(_text_:texts in 3627) [ClassicSimilarity], result of:
              0.04985209 = score(doc=3627,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.302856 = fieldWeight in 3627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3627)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  10. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.02
    0.017829034 = product of:
      0.08320216 = sum of:
        0.029138058 = weight(_text_:classification in 2161) [ClassicSimilarity], result of:
          0.029138058 = score(doc=2161,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 2161, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.029138058 = weight(_text_:classification in 2161) [ClassicSimilarity], result of:
          0.029138058 = score(doc=2161,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 2161, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.024926046 = product of:
          0.04985209 = sum of:
            0.04985209 = weight(_text_:texts in 2161) [ClassicSimilarity], result of:
              0.04985209 = score(doc=2161,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.302856 = fieldWeight in 2161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2161)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.
  11. Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.02
    0.017034415 = product of:
      0.07949394 = sum of:
        0.021217827 = weight(_text_:subject in 3151) [ClassicSimilarity], result of:
          0.021217827 = score(doc=3151,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 3151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
        0.029138058 = weight(_text_:classification in 3151) [ClassicSimilarity], result of:
          0.029138058 = score(doc=3151,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 3151, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
        0.029138058 = weight(_text_:classification in 3151) [ClassicSimilarity], result of:
          0.029138058 = score(doc=3151,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3047229 = fieldWeight in 3151, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.21428572 = coord(3/14)
    
    Abstract
    Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.
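    The score breakdown shown for entry 11 (doc 3151) can be checked by hand. The sketch below assumes the classic Lucene TF-IDF formulas that the [ClassicSimilarity] label suggests: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are illustrative only. queryNorm, fieldNorm, and the coord factor are copied from the listing rather than derived, because they depend on the full query and on index-time data that the page does not show.

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the listing
QUERY_NORM = 0.03002521   # queryNorm, copied from the listing (not derived here)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the ClassicSimilarity form."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    """Score of one term clause: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Entry 11 (doc 3151): "subject" matched by one clause,
# "classification" matched by two identical clauses.
subject = term_score(freq=2.0, doc_freq=3361, field_norm=0.0390625)
classification = term_score(freq=6.0, doc_freq=4974, field_norm=0.0390625)

clause_sum = subject + 2 * classification
total = clause_sum * (3 / 14)   # coord(3/14): 3 of 14 query clauses matched

print(f"subject={subject:.6f} classification={classification:.6f} total={total:.6f}")
```

    Up to single-precision rounding, the three values should agree with the 0.021217827, 0.029138058, and 0.017034415 reported in the listing.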
  12. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.02
    0.01662617 = product of:
      0.07758879 = sum of:
        0.030006537 = weight(_text_:subject in 2918) [ClassicSimilarity], result of:
          0.030006537 = score(doc=2918,freq=4.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.27942157 = fieldWeight in 2918, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
        0.023791125 = weight(_text_:classification in 2918) [ClassicSimilarity], result of:
          0.023791125 = score(doc=2918,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 2918, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
        0.023791125 = weight(_text_:classification in 2918) [ClassicSimilarity], result of:
          0.023791125 = score(doc=2918,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 2918, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
      0.21428572 = coord(3/14)
    
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also included mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
  13. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.01
    0.014758593 = product of:
      0.068873435 = sum of:
        0.02018744 = weight(_text_:classification in 2721) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2721,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2721, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.02849856 = product of:
          0.05699712 = sum of:
            0.05699712 = weight(_text_:schemes in 2721) [ClassicSimilarity], result of:
              0.05699712 = score(doc=2721,freq=2.0), product of:
                0.16067243 = queryWeight, product of:
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.03002521 = queryNorm
                0.35474116 = fieldWeight in 2721, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
        0.02018744 = weight(_text_:classification in 2721) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2721,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2721, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.21428572 = coord(3/14)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
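    In the score breakdown for entry 13 (doc 2721), the "schemes" clause sits inside a nested group with its own coord(1/2), which halves its contribution before the outer sum. A minimal sketch of that nesting, again assuming the classic Lucene TF-IDF formulas and copying queryNorm, fieldNorm, and both coord factors from the listing (the helper name is hypothetical):

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the listing
QUERY_NORM = 0.03002521   # queryNorm, copied from the listing


def term_score(freq, doc_freq, field_norm):
    """queryWeight * fieldWeight for one term, ClassicSimilarity form assumed."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * field_norm)


# Entry 13 (doc 2721): two "classification" clauses at the top level.
classification = term_score(freq=2.0, doc_freq=4974, field_norm=0.046875)

# Nested group: only 1 of its 2 sub-clauses ("schemes") matched, so coord(1/2).
schemes_group = term_score(freq=2.0, doc_freq=569, field_norm=0.046875) * (1 / 2)

# Outer level: sum the three matching clauses, then apply coord(3/14).
total = (2 * classification + schemes_group) * (3 / 14)

print(f"classification={classification:.6f} schemes_group={schemes_group:.6f} total={total:.6f}")
```

    The inner coord is applied before the outer sum, so the rarer term "schemes" (docFreq=569) still contributes more than either "classification" clause despite being halved.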
  14. Schöneberg, U.; Gödert, W.: Erschließung mathematischer Publikationen mittels linguistischer Verfahren (2012) 0.01
    0.014107771 = product of:
      0.065836266 = sum of:
        0.02546139 = weight(_text_:subject in 1055) [ClassicSimilarity], result of:
          0.02546139 = score(doc=1055,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.23709705 = fieldWeight in 1055, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=1055)
        0.02018744 = weight(_text_:classification in 1055) [ClassicSimilarity], result of:
          0.02018744 = score(doc=1055,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 1055, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=1055)
        0.02018744 = weight(_text_:classification in 1055) [ClassicSimilarity], result of:
          0.02018744 = score(doc=1055,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 1055, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=1055)
      0.21428572 = coord(3/14)
    
    Abstract
    The number of mathematics-related publications grows from year to year. Abstracting services such as Zentralblatt MATH and Mathematical Reviews record the bibliographic data, index the works by content, and make them searchable for users, today through databases and formerly in print. Keywords are an essential component of the subject indexing of these publications. Keywords are usually not single words but multi-word phrases, which suggests applying linguistic methods and procedures. The software 'Lingo', developed at FH Köln, was adapted to the specific requirements of mathematical texts and used both to build a controlled vocabulary and to extract keywords from mathematical publications. The plan is to link the controlled vocabulary with the Mathematical Subject Classification in order to develop and test methods of automatic classification for the abstracting service Zentralblatt MATH.
  15. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.013639772 = product of:
      0.06365227 = sum of:
        0.030006537 = weight(_text_:subject in 3311) [ClassicSimilarity], result of:
          0.030006537 = score(doc=3311,freq=4.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.27942157 = fieldWeight in 3311, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
        0.016822865 = weight(_text_:classification in 3311) [ClassicSimilarity], result of:
          0.016822865 = score(doc=3311,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
        0.016822865 = weight(_text_:classification in 3311) [ClassicSimilarity], result of:
          0.016822865 = score(doc=3311,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.21428572 = coord(3/14)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
  16. Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.01
    0.008156957 = product of:
      0.057098698 = sum of:
        0.028549349 = weight(_text_:classification in 3320) [ClassicSimilarity], result of:
          0.028549349 = score(doc=3320,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 3320, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
        0.028549349 = weight(_text_:classification in 3320) [ClassicSimilarity], result of:
          0.028549349 = score(doc=3320,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 3320, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
      0.14285715 = coord(2/14)
    
    Abstract
    The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs.
  17. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.01
    0.0067291465 = product of:
      0.047104023 = sum of:
        0.023552012 = weight(_text_:classification in 4995) [ClassicSimilarity], result of:
          0.023552012 = score(doc=4995,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 4995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
        0.023552012 = weight(_text_:classification in 4995) [ClassicSimilarity], result of:
          0.023552012 = score(doc=4995,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 4995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
      0.14285715 = coord(2/14)
    
    Abstract
    This paper discusses the goals and results of the research project Perseus-a as an attempt to improve information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies, the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte, art historical terminological resources and classification data, and an open source system for linguistic and statistic automatic indexing called lingo.
  18. Lichtenstein, A.; Plank, M.; Neumann, J.: TIB's portal for audiovisual media : combining manual and automatic indexing (2014) 0.01
    0.0067291465 = product of:
      0.047104023 = sum of:
        0.023552012 = weight(_text_:classification in 1981) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1981,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1981, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
        0.023552012 = weight(_text_:classification in 1981) [ClassicSimilarity], result of:
          0.023552012 = score(doc=1981,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 1981, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
      0.14285715 = coord(2/14)
    
    Source
    Cataloging and classification quarterly. 52(2014) no.5, pp.562-577
  19. Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: A typology of semantic relations dedicated to scientific literature analysis (2016) 0.01
    0.0067291465 = product of:
      0.047104023 = sum of:
        0.023552012 = weight(_text_:classification in 2933) [ClassicSimilarity], result of:
          0.023552012 = score(doc=2933,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 2933, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
        0.023552012 = weight(_text_:classification in 2933) [ClassicSimilarity], result of:
          0.023552012 = score(doc=2933,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24630459 = fieldWeight in 2933, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
      0.14285715 = coord(2/14)
    
    Abstract
    We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.
  20. Kasprzik, A.: Automatisierte und semiautomatisierte Klassifizierung : eine Analyse aktueller Projekte (2014) 0.01
    0.00576784 = product of:
      0.04037488 = sum of:
        0.02018744 = weight(_text_:classification in 2470) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2470,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2470, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2470)
        0.02018744 = weight(_text_:classification in 2470) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2470,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2470, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2470)
      0.14285715 = coord(2/14)
    
    Abstract
    The rapid growth in the volume of digitally available documents, coupled with shortages of time and staff at academic libraries, suggests the use of semi-automatic or fully automatic methods for verbal and classificatory subject indexing. After a brief general introduction to the common methodology, this article examines a number of automated classification projects from the period 2007-2012 in the German-speaking countries. Most of the projects presented use machine learning methods from artificial intelligence, mostly work with adapted versions of commercial software, and generally refer to the Dewey Decimal Classification (DDC). The underlying data are metadata records, abstracts, tables of contents, and full texts in various data formats. The concluding analysis arranges the projects according to a number of different criteria and summarizes the current situation and the greatest challenges for automated classification methods.

Languages

  • e 30
  • d 8
  • m 1

Types

  • a 35
  • el 6
  • m 1
  • p 1
  • x 1

Classifications