Search (259 results, page 1 of 13)

  • Active filter: theme_ss:"Automatisches Indexieren"
  1. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    
    Date
    16. 8.1998 12:51:22
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  2. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
  3. Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.02
    
    Date
    1. 4.2002 10:22:41
    Footnote
    Translation of the title: Algorithms for selection of positive and negative descriptors from text and automated text indexing
  4. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    
    Date
    1. 2.2016 18:25:22
  5. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.02
    
    Abstract
    We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
    Date
    22. 7.2006 17:32:00
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
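    The mixture decomposition described in entry 5 can be illustrated with a small topic-model example. The sketch below uses scikit-learn's LatentDirichletAllocation as a stand-in for the authors' probabilistic decomposition; the four toy "articles", the choice of two topics, and all parameter values are illustrative assumptions, not the Gazette data or the authors' actual model.

        # Minimal topic-decomposition sketch (illustrative; not the authors' code or data).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        docs = [
            "ship arrived port cargo tobacco sold merchant",
            "runaway servant reward subscriber county sheriff",
            "assembly governor act province law passed session",
            "ship sailed captain cargo rum sugar port merchant",
        ]

        # Bag-of-words counts, then a mixture decomposition into two topics.
        vectorizer = CountVectorizer()
        counts = vectorizer.fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=2, random_state=0)
        doc_topic = lda.fit_transform(counts)      # per-document topic proportions

        # Top words per topic, and topic prevalence per document.
        terms = vectorizer.get_feature_names_out()
        for k, weights in enumerate(lda.components_):
            top = [terms[i] for i in weights.argsort()[::-1][:5]]
            print(f"topic {k}:", ", ".join(top))
        print(doc_topic.round(2))

    Tracking how the per-document topic proportions change when the documents are ordered by publication year corresponds, in spirit, to the "prevalence over time" analysis the abstract describes.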
  6. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.02
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO) to the problem of document indexing: the system incorporates NLP techniques to determine the subject of document texts and to associate them with relevant semantic indexes. Briefly describes the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  7. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.02
    
    Abstract
    Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus based techniques. The algorithm involves a gradual approximation which works to converge linguistic knowledge gradually towards desirable results. The 1st experiment revealed the characteristic of this algorithm and the others proved the effectiveness of this algorithm for a real corpus
    Date
    31. 7.1996 9:22:19
  8. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.02
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
    Date
    6. 3.1997 16:22:15
  9. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  10. Ward, M.L.: ¬The future of the human indexer (1996) 0.02
    
    Abstract
    Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and at what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low-grade texts (should they be wanted in the database).
    Date
    9. 2.1997 18:44:22
    Source
    Journal of librarianship and information science. 28(1996) no.4, S.217-225
  11. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.02
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
  12. Milstead, J.L.: Thesauri in a full-text world (1998) 0.02
    
    Abstract
    Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  13. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  14. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.02
    
    Abstract
    The main objective of this research was to analyze whether there is a characteristic distribution behavior of relevant terms over a scientific text that could serve as a criterion for their automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts considered were a total of 98 doctoral theses from the eight areas of knowledge in a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value of 0-6 (not relevant to highly relevant, respectively) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were then associated with the terms' positions in the text, with each full noun phrase found in the text counted as a valid linear position. The results show the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). Notably, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, and all areas of knowledge related to the Social Sciences showed the same characteristic behavior of distribution, but distinct from the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
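    A toy version of the positional analysis in entry 14 can be written in a few lines: relevance values are consolidated into ten equal consecutive parts of the text and a polynomial is fitted to the resulting curve. The occurrence data below are invented for illustration; only the procedure (decile consolidation plus polynomial characterization) mirrors the abstract.

        # Toy sketch of the linear-position analysis (invented data, NumPy only).
        import numpy as np

        # (position of a noun phrase as a fraction of the text, author relevance 0-6)
        occurrences = [(0.05, 6), (0.12, 5), (0.25, 3), (0.33, 4), (0.41, 2),
                       (0.48, 2), (0.55, 1), (0.63, 3), (0.74, 2), (0.86, 4), (0.95, 5)]
        pos = np.array([p for p, _ in occurrences])
        rel = np.array([r for _, r in occurrences], dtype=float)

        # Consolidate into ten equal consecutive parts of the text.
        bins = np.linspace(0.0, 1.0, 11)
        decile_means = np.array([rel[(pos >= lo) & (pos < hi)].mean()
                                 for lo, hi in zip(bins[:-1], bins[1:])])

        # Characterize the distribution with a low-degree polynomial, as in the paper.
        coeffs = np.polyfit(np.arange(10), decile_means, deg=2)
        print("decile means:", decile_means.round(2))
        print("polynomial coefficients:", coeffs.round(3))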
  15. Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.02
    
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
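    The two-stage algorithm in entry 15 (learn term-to-heading associations with a likelihood ratio statistic, then sum the associations of a new document's terms to rank candidate headings) can be sketched as follows. The three training "records", their headings, and the bag-of-words tokenization are invented for illustration; the G² (log-likelihood ratio) statistic is one standard choice for the association measure the abstract names.

        # Sketch of association-based assignment of controlled vocabulary headings.
        import math
        from collections import defaultdict

        def g2(k11, k12, k21, k22):
            """Dunning's log-likelihood ratio for a 2x2 term/heading contingency table."""
            def h(*ks):
                n = sum(ks)
                return sum(k * math.log(k / n) for k in ks if k > 0)
            return 2 * (h(k11, k12, k21, k22) - h(k11 + k12, k21 + k22) - h(k11 + k21, k12 + k22))

        train = [
            ("automatic indexing thesaurus terms", {"Indexing"}),
            ("thesaurus construction controlled vocabulary", {"Thesauri"}),
            ("automatic indexing machine aided", {"Indexing"}),
        ]

        # Stage 1: count term/heading co-occurrences and build the association 'dictionary'.
        pair, term_cnt, head_cnt = defaultdict(int), defaultdict(int), defaultdict(int)
        for text, headings in train:
            for t in set(text.split()):
                term_cnt[t] += 1
                for hd in headings:
                    pair[(t, hd)] += 1
            for hd in headings:
                head_cnt[hd] += 1

        n_docs = len(train)
        assoc = {}
        for (t, hd), k11 in pair.items():
            k12 = term_cnt[t] - k11            # term present, heading absent
            k21 = head_cnt[hd] - k11           # heading present, term absent
            k22 = n_docs - k11 - k12 - k21
            assoc[(t, hd)] = g2(k11, k12, k21, k22)

        # Stage 2: rank candidate headings for a new document by summed association.
        def suggest(text, top=2):
            scores = defaultdict(float)
            for t in set(text.split()):
                for hd in head_cnt:
                    scores[hd] += assoc.get((t, hd), 0.0)
            return sorted(scores.items(), key=lambda x: -x[1])[:top]

        print(suggest("automatic indexing of thesaurus terms"))  # 'Indexing' ranks first

    On real data one would also threshold the association values and normalize for document length, but the ranking step is the same "partial match retrieval" framing used in the abstract.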
  16. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01
    
    Date
    20.10.2000 12:22:23
  17. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.01
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  18. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs) applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as a semantic aggregator, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then Perl scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hyperonyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests with the resulting documents demonstrated the approach's effectiveness: we compared the structural similarity of the documents before and after the preprocessing steps of phase one, and the texts maintained consistency with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
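    Phases (b) and (c) of entry 18 (training a classifier on noun-phrase representations) reduce to a standard supervised text-classification pipeline. In the sketch below the noun phrases are assumed to be already extracted (phase a) and are joined with '|' so each phrase acts as a single feature; the phrases, class labels, and the choice of a linear SVM from scikit-learn are illustrative assumptions, not the authors' corpus or code.

        # Noun-phrase features + SVM classification (toy data, scikit-learn).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        # Each "document" is the list of noun phrases kept after preprocessing.
        train_docs = [
            "automatic indexing|noun phrase|doctoral thesis",
            "information retrieval|noun phrase|document classification",
            "soil sample|chemical analysis|field experiment",
            "plant growth|field experiment|soil sample",
        ]
        train_labels = ["information science", "information science",
                        "natural science", "natural science"]

        clf = make_pipeline(
            TfidfVectorizer(tokenizer=lambda d: d.split("|"), token_pattern=None),
            LinearSVC(),
        )
        clf.fit(train_docs, train_labels)
        print(clf.predict(["noun phrase|automatic indexing"]))  # expected: information science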
  19. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.01
    
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
    Source
    Aslib journal of information management. 71(2019) no.3, S.415-439
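    The rule-based translation idea in entry 19 amounts to a table of mappings from semantic LaTeX macros to CAS function names, applied to an expression string. The sketch below shows the idea for Maple with three hand-written rules; the macro syntax and the rules themselves are simplified illustrations, not the DLMF macro set or the authors' tool.

        # Toy rule-based LaTeX -> Maple rewriting (regex substitution only).
        import re

        rules = [
            (r"\\BesselJ\{(?P<nu>[^}]*)\}\{(?P<z>[^}]*)\}", r"BesselJ(\g<nu>, \g<z>)"),
            (r"\\sin", "sin"),
            (r"\\pi", "Pi"),
        ]

        def latex_to_maple(expr: str) -> str:
            for pattern, repl in rules:
                expr = re.sub(pattern, repl, expr)
            return expr

        print(latex_to_maple(r"\sin(\pi) + \BesselJ{0}{x}"))  # -> sin(Pi) + BesselJ(0, x)

    A real implementation also needs the reverse direction and consistency checks, which is where the paper's 396 mappings and the reported DLMF and Maple errors come from.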
  20. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01
    
    Date
    14. 6.2015 22:12:44
