Search (38 results, page 1 of 2)

  • theme_ss:"Automatisches Indexieren"
  1. Daudaravicius, V.: ¬A framework for keyphrase extraction from scientific journals (2016) 0.05
    0.049816553 = product of:
      0.099633105 = sum of:
        0.099633105 = product of:
          0.19926621 = sum of:
            0.19926621 = weight(_text_:journals in 2930) [ClassicSimilarity], result of:
              0.19926621 = score(doc=2930,freq=8.0), product of:
                0.25656942 = queryWeight, product of:
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.05109862 = queryNorm
                0.77665615 = fieldWeight in 2930, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2930)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.
  2. Humphrey, S.M.: Automatic indexing of documents from journal descriptors : a preliminary investigation (1999) 0.04
    0.036979202 = product of:
      0.073958404 = sum of:
        0.073958404 = product of:
          0.14791681 = sum of:
            0.14791681 = weight(_text_:journals in 3769) [ClassicSimilarity], result of:
              0.14791681 = score(doc=3769,freq=6.0), product of:
                0.25656942 = queryWeight, product of:
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.05109862 = queryNorm
                0.5765177 = fieldWeight in 3769, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3769)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A new, fully automated approach for indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
  3. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.04
    0.035225622 = product of:
      0.070451245 = sum of:
        0.070451245 = product of:
          0.14090249 = sum of:
            0.14090249 = weight(_text_:journals in 5074) [ClassicSimilarity], result of:
              0.14090249 = score(doc=5074,freq=4.0), product of:
                0.25656942 = queryWeight, product of:
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.05109862 = queryNorm
                0.54917884 = fieldWeight in 5074, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5074)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals in disciplines presents lower specificity, and some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
  4. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.027692629 = product of:
      0.055385258 = sum of:
        0.055385258 = product of:
          0.110770516 = sum of:
            0.110770516 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.110770516 = score(doc=402,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  5. Bonzi, S.: Representation of concepts in text : a comparison of within-document frequency, anaphora, and synonymy (1991) 0.02
    0.024908276 = product of:
      0.049816553 = sum of:
        0.049816553 = product of:
          0.099633105 = sum of:
            0.099633105 = weight(_text_:journals in 4933) [ClassicSimilarity], result of:
              0.099633105 = score(doc=4933,freq=2.0), product of:
                0.25656942 = queryWeight, product of:
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.05109862 = queryNorm
                0.38832808 = fieldWeight in 4933, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4933)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Investigates the 3 major ways by which a concept may be represented in text: within-document frequency, anaphoric reference, and synonyms, in order to determine which provides the optimal means of representation. Analyses a sample of 60 abstracts, drawn at random from the abstracting journals of 4 disciplines. Results show that, in general, initial within-document frequency is higher for keyword terms. Additionally, frequency of keyword terms referenced anaphorically or with intellectually related terms is higher than that of other keyword terms. It appears that initial document length influences both the number and impact of both anaphoric resolutions and intellectually related terms.
  6. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    0.02423105 = product of:
      0.0484621 = sum of:
        0.0484621 = product of:
          0.0969242 = sum of:
            0.0969242 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
              0.0969242 = score(doc=262,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.5416616 = fieldWeight in 262, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=262)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20.10.2000 12:22:23
  7. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    0.02423105 = product of:
      0.0484621 = sum of:
        0.0484621 = product of:
          0.0969242 = sum of:
            0.0969242 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.0969242 = score(doc=6265,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  8. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.020769471 = product of:
      0.041538943 = sum of:
        0.041538943 = product of:
          0.083077885 = sum of:
            0.083077885 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.083077885 = score(doc=58,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:44
  9. Hauer, M.: Automatische Indexierung (2000) 0.02
    0.020769471 = product of:
      0.041538943 = sum of:
        0.041538943 = product of:
          0.083077885 = sum of:
            0.083077885 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
              0.083077885 = score(doc=5887,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.46428138 = fieldWeight in 5887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5887)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt
  10. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.020769471 = product of:
      0.041538943 = sum of:
        0.041538943 = product of:
          0.083077885 = sum of:
            0.083077885 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.083077885 = score(doc=2051,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:56
  11. Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02
    0.020769471 = product of:
      0.041538943 = sum of:
        0.041538943 = product of:
          0.083077885 = sum of:
            0.083077885 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
              0.083077885 = score(doc=5629,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.46428138 = fieldWeight in 5629, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5629)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    B.I.T.online. 22(2019) H.2, S.163-166
  12. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.02
    0.017791625 = product of:
      0.03558325 = sum of:
        0.03558325 = product of:
          0.0711665 = sum of:
            0.0711665 = weight(_text_:journals in 3300) [ClassicSimilarity], result of:
              0.0711665 = score(doc=3300,freq=2.0), product of:
                0.25656942 = queryWeight, product of:
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.05109862 = queryNorm
                0.2773772 = fieldWeight in 3300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.021064 = idf(docFreq=792, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3300)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated showing they are complementary to one another.
  13. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    0.017307894 = product of:
      0.03461579 = sum of:
        0.03461579 = product of:
          0.06923158 = sum of:
            0.06923158 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.06923158 = score(doc=1952,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 8.1998 12:51:22
  14. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.017307894 = product of:
      0.03461579 = sum of:
        0.03461579 = product of:
          0.06923158 = sum of:
            0.06923158 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.06923158 = score(doc=4157,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
  15. Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.02
    0.017307894 = product of:
      0.03461579 = sum of:
        0.03461579 = product of:
          0.06923158 = sum of:
            0.06923158 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
              0.06923158 = score(doc=374,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.38690117 = fieldWeight in 374, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=374)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 4.2002 10:22:41
  16. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.017307894 = product of:
      0.03461579 = sum of:
        0.03461579 = product of:
          0.06923158 = sum of:
            0.06923158 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06923158 = score(doc=2759,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  17. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01
    0.0138463145 = product of:
      0.027692629 = sum of:
        0.027692629 = product of:
          0.055385258 = sum of:
            0.055385258 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
              0.055385258 = score(doc=4709,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.30952093 = fieldWeight in 4709, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4709)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  18. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.0138463145 = product of:
      0.027692629 = sum of:
        0.027692629 = product of:
          0.055385258 = sum of:
            0.055385258 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.055385258 = score(doc=6752,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    6. 3.1997 16:22:15
  19. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01
    0.0138463145 = product of:
      0.027692629 = sum of:
        0.027692629 = product of:
          0.055385258 = sum of:
            0.055385258 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
              0.055385258 = score(doc=3581,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.30952093 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    24. 3.2006 12:22:02
  20. Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.01
    0.0138463145 = product of:
      0.027692629 = sum of:
        0.027692629 = product of:
          0.055385258 = sum of:
            0.055385258 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
              0.055385258 = score(doc=1755,freq=2.0), product of:
                0.17893866 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05109862 = queryNorm
                0.30952093 = fieldWeight in 1755, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1755)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2008 12:35:19
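
The score explanations in the listing above all follow the same pattern, so a small worked example may help when reading them. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and plugs in the values shown for result 1 (doc 2930). The function names `classic_idf` and `classic_term_score` are hypothetical helpers for illustration, not part of any library, and `queryNorm` and `fieldNorm` are simply copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one single-term score tree from its leaf values.

    coords holds the coord(...) factors applied on the way up the tree.
    """
    tf = math.sqrt(freq)                    # tf(freq)
    idf = classic_idf(doc_freq, max_docs)   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm         # queryWeight
    field_weight = tf * idf * field_norm    # fieldWeight
    score = query_weight * field_weight     # weight(_text_:term in doc)
    for c in coords:
        score *= c                          # each coord(1/2) halves the score
    return score


# Leaf values taken from result 1 (term "journals", doc 2930).
score = classic_term_score(
    freq=8.0,
    doc_freq=792,
    max_docs=44218,
    query_norm=0.05109862,
    field_norm=0.0546875,
    coords=(0.5, 0.5),
)
print(f"{score:.6f}")
```

Under these assumptions the result agrees with the 0.049816553 reported for result 1 to within rounding; small differences in the last digits are expected because Lucene computes in single precision. The same function applies to the entries that match on the term "22" by substituting docFreq=3622 and the fieldNorm shown for that document.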