Search (168 results, page 1 of 9)

  • × theme_ss:"Automatisches Indexieren"
  1. Ward, M.L.: ¬The future of the human indexer (1996) 0.07
    0.06935668 = product of:
      0.11559446 = sum of:
        0.07611368 = weight(_text_:index in 7244) [ClassicSimilarity], result of:
          0.07611368 = score(doc=7244,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.40966535 = fieldWeight in 7244, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
        0.027959513 = weight(_text_:system in 7244) [ClassicSimilarity], result of:
          0.027959513 = score(doc=7244,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
        0.011521274 = product of:
          0.03456382 = sum of:
            0.03456382 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
              0.03456382 = score(doc=7244,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.23214069 = fieldWeight in 7244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7244)
          0.33333334 = coord(1/3)
      0.6 = coord(3/5)
    
    Abstract
    Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and what depth to index; reading skills; abstracting skills; and classification skills, Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database)
    Date
    9. 2.1997 18:44:22
  2. SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.06
    0.063462004 = product of:
      0.10577001 = sum of:
        0.028243875 = weight(_text_:context in 6671) [ClassicSimilarity], result of:
          0.028243875 = score(doc=6671,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.16027321 = fieldWeight in 6671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
        0.03139529 = weight(_text_:index in 6671) [ClassicSimilarity], result of:
          0.03139529 = score(doc=6671,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.16897833 = fieldWeight in 6671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
        0.04613084 = weight(_text_:system in 6671) [ClassicSimilarity], result of:
          0.04613084 = score(doc=6671,freq=16.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.34448233 = fieldWeight in 6671, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
      0.6 = coord(3/5)
    
    Content
    HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL u. N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF u. D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. u. B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. u. R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA u. M. LaMAR: A textual object management system; NIE, J.-Y.:Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG u. Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL u. R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. u. P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY u. D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH u. H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. u. J. NIELSEN: Automating the assignement of submitted manuscripts to reviewers; GOST, M.A. u. M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. u. P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FAX, E.A., Q.F. CHEN u. L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. u. J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO u. P. Zezula: Frame-sliced patitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER, u. J.W. TUKEY: Scatter/ Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. u. P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. u. B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
  3. Hmeidi, I.; Kanaan, G.; Evens, M.: Design and implementation of automatic indexing for information retrieval with Arabic documents (1997) 0.06
    0.05604352 = product of:
      0.093405865 = sum of:
        0.0538205 = weight(_text_:index in 1660) [ClassicSimilarity], result of:
          0.0538205 = score(doc=1660,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.28967714 = fieldWeight in 1660, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=1660)
        0.027959513 = weight(_text_:system in 1660) [ClassicSimilarity], result of:
          0.027959513 = score(doc=1660,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 1660, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=1660)
        0.011625858 = product of:
          0.034877572 = sum of:
            0.034877572 = weight(_text_:29 in 1660) [ClassicSimilarity], result of:
              0.034877572 = score(doc=1660,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.23319192 = fieldWeight in 1660, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1660)
          0.33333334 = coord(1/3)
      0.6 = coord(3/5)
    
    Abstract
    A corpus of 242 abstracts of Arabic documents on computer science and information systems using the Proceedings of the Saudi Arabian National Conferences as a source was put together. Reports on the design and building of an automatic information retrieval system from scratch to handle Arabic data. Both automatic and manual indexing techniques were implemented. Experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Automatic indexing is both cheaper and faster. Results suggests that a wider coverage of the literature can be achieved with less money and produce as good results as with manual indexing. Compares the retrieval results using words as index terms versus stems and roots, and confirms the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing
    Date
    29. 7.1998 17:40:01
  4. Nicoletti, M.: Automatische Indexierung (2001) 0.05
    0.052357085 = product of:
      0.13089271 = sum of:
        0.107641 = weight(_text_:index in 4326) [ClassicSimilarity], result of:
          0.107641 = score(doc=4326,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.5793543 = fieldWeight in 4326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=4326)
        0.023251716 = product of:
          0.069755144 = sum of:
            0.069755144 = weight(_text_:29 in 4326) [ClassicSimilarity], result of:
              0.069755144 = score(doc=4326,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.46638384 = fieldWeight in 4326, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4326)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Content
    Inhalt: 1. Aufgabe - 2. Ermittlung von Mehrwortgruppen - 2.1 Definition - 3. Kennzeichnung der Mehrwortgruppen - 4. Grundformen - 5. Term- und Dokumenthäufigkeit --- Termgewichtung - 6. Steuerungsinstrument Schwellenwert - 7. Invertierter Index. Vgl. unter: http://www.grin.com/de/e-book/104966/automatische-indexierung.
    Date
    29. 9.2017 12:00:04
  5. Hauer, M.: Automatische Indexierung (2000) 0.05
    0.05227342 = product of:
      0.13068354 = sum of:
        0.107641 = weight(_text_:index in 5887) [ClassicSimilarity], result of:
          0.107641 = score(doc=5887,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.5793543 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
        0.023042548 = product of:
          0.06912764 = sum of:
            0.06912764 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
              0.06912764 = score(doc=5887,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.46428138 = fieldWeight in 5887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5887)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Object
    Index-5.0
    Source
    Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt
  6. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.05
    0.051430937 = product of:
      0.12857734 = sum of:
        0.09129799 = weight(_text_:context in 3440) [ClassicSimilarity], result of:
          0.09129799 = score(doc=3440,freq=4.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.51808125 = fieldWeight in 3440, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
        0.03727935 = weight(_text_:system in 3440) [ClassicSimilarity], result of:
          0.03727935 = score(doc=3440,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.27838376 = fieldWeight in 3440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
      0.4 = coord(2/5)
    
    Abstract
    The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title derivative automatic indexing, its development and implementation, and current status.
  7. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.04
    0.038164005 = product of:
      0.09541001 = sum of:
        0.06279058 = weight(_text_:index in 7209) [ClassicSimilarity], result of:
          0.06279058 = score(doc=7209,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.33795667 = fieldWeight in 7209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
        0.03261943 = weight(_text_:system in 7209) [ClassicSimilarity], result of:
          0.03261943 = score(doc=7209,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.2435858 = fieldWeight in 7209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
      0.4 = coord(2/5)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
  8. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 0.04
    0.038164005 = product of:
      0.09541001 = sum of:
        0.06279058 = weight(_text_:index in 2880) [ClassicSimilarity], result of:
          0.06279058 = score(doc=2880,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.33795667 = fieldWeight in 2880, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2880)
        0.03261943 = weight(_text_:system in 2880) [ClassicSimilarity], result of:
          0.03261943 = score(doc=2880,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.2435858 = fieldWeight in 2880, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2880)
      0.4 = coord(2/5)
    
    Abstract
    Automatic indexing is one of the important functions of a modern document retrieval system. Numerous techniques for this function have been proposed in the literature ranging from purely statistical to linguistically complex mechanisms. Most result from examining properties of terms. Examines term distribution within the framework of the Poisson models. Specifically examines the effectiveness of the Two-Poisson and the Three-Poisson model to see if generalisation results in increased effectiveness. The results show that the Two-Poisson model is only moderately effective in identifying index terms. In addition, generalisation to the Three-Poisson does not give any additional power. The only Poisson model which consistently works well is the basic One-Poisson model. Also discusses term distribution information.
  9. Villaespesa, E.; Crider, S.: ¬A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.03
    0.032144334 = product of:
      0.08036084 = sum of:
        0.057061244 = weight(_text_:context in 341) [ClassicSimilarity], result of:
          0.057061244 = score(doc=341,freq=4.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.32380077 = fieldWeight in 341, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0390625 = fieldNorm(doc=341)
        0.023299592 = weight(_text_:system in 341) [ClassicSimilarity], result of:
          0.023299592 = score(doc=341,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.17398985 = fieldWeight in 341, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=341)
      0.4 = coord(2/5)
    
    Abstract
    Purpose Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keywords tags assigned by the museum and those produced by three computer vision systems. Design/methodology/approach This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags. Findings This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art historical one. However, the study also surfaces challenges about the accuracy and lack of context within the computer vision results. Practical implications This finding has important implications on how these machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. In consequence, the museum needs to consider the selection process for choosing which computer vision system to apply to their collection. Furthermore, they also need to think critically about the kind of tags they wish to use, such as colors, materials or objects. Originality/value The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations of aspects to consider before selecting and implementing these technologies.
  10. Salton, G.: Automatic processing of foreign language documents (1985) 0.03
    0.029263873 = product of:
      0.07315968 = sum of:
        0.03588033 = weight(_text_:index in 3650) [ClassicSimilarity], result of:
          0.03588033 = score(doc=3650,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.1931181 = fieldWeight in 3650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
        0.03727935 = weight(_text_:system in 3650) [ClassicSimilarity], result of:
          0.03727935 = score(doc=3650,freq=8.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.27838376 = fieldWeight in 3650, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.4 = coord(2/5)
    
    Abstract
    The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as weIl, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various documentrequest matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevante order. During the 1970s, Salton and his students went an to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is an the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as What is meant by retrieval language compatability? How it is to be achieved, and how evaluated?
  11. Sparck Jones, K.: Index term weighting (1973) 0.03
    0.028704265 = product of:
      0.14352132 = sum of:
        0.14352132 = weight(_text_:index in 5491) [ClassicSimilarity], result of:
          0.14352132 = score(doc=5491,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.7724724 = fieldWeight in 5491, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.125 = fieldNorm(doc=5491)
      0.2 = coord(1/5)
    
  12. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.03
    0.027233064 = product of:
      0.06808266 = sum of:
        0.05272096 = weight(_text_:system in 3581) [ClassicSimilarity], result of:
          0.05272096 = score(doc=3581,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.3936941 = fieldWeight in 3581, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
        0.015361699 = product of:
          0.046085097 = sum of:
            0.046085097 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
              0.046085097 = score(doc=3581,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.30952093 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Abstract
    Lingo ist ein frei verfügbares System (open source) zur automatischen Indexierung der deutschen Sprache. Bei der Entwicklung von lingo standen hohe Konfigurierbarkeit und Flexibilität des Systems für unterschiedliche Einsatzmöglichkeiten im Vordergrund. Der Beitrag zeigt den Nutzen einer linguistisch basierten automatischen Indexierung für das Information Retrieval auf. Die für eine Retrievalverbesserung zur Verfügung stehende linguistische Funktionalität von lingo wird vorgestellt und an Beispielen erläutert: Grundformerkennung, Kompositumerkennung bzw. Kompositumzerlegung, Wortrelationierung, lexikalische und algorithmische Mehrwortgruppenerkennung, OCR-Fehlerkorrektur. Der offene Systemaufbau von lingo wird beschrieben, mögliche Einsatzszenarien und Anwendungsgrenzen werden benannt.
    Date
    24. 3.2006 12:22:02
  13. Salton, G.: Another look at automatic text-retrieval systems (1986) 0.03
    0.026390245 = product of:
      0.065975614 = sum of:
        0.046599183 = weight(_text_:system in 1356) [ClassicSimilarity], result of:
          0.046599183 = score(doc=1356,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.3479797 = fieldWeight in 1356, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.078125 = fieldNorm(doc=1356)
        0.01937643 = product of:
          0.05812929 = sum of:
            0.05812929 = weight(_text_:29 in 1356) [ClassicSimilarity], result of:
              0.05812929 = score(doc=1356,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.38865322 = fieldWeight in 1356, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1356)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Footnote
    Bezugnahme auf: Blair, D.C.: An evaluation of retrieval effectiveness for a full-text document-retrieval system. Comm. ACM 28(1985) S.280-299. - Vgl. auch: Blair, D.C.: Full text retrieval ... Int. Class. 13(1986) S.18-23; Blair, D.C., M.E. Maron: full-text information retrieval ... Inf. Proc. Man. 26(1990) S.437-447.
    Source
    Communications of the Association for Computing Machinery. 29(1986), S.648-656
  14. Leung, C.-H.; Kan, W.-K.: ¬A statistical learning approach to automatic indexing of controlled index terms (1997) 0.03
    0.026366552 = product of:
      0.13183276 = sum of:
        0.13183276 = weight(_text_:index in 6497) [ClassicSimilarity], result of:
          0.13183276 = score(doc=6497,freq=12.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.7095612 = fieldWeight in 6497, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=6497)
      0.2 = coord(1/5)
    
    Abstract
    A statistical learning approach to assigning controlled index terms is presented. In this approach, there are two processes: (1) the learning process and (2) the indexing process. The learning process constructs a relationship between an index term and the words relevant and irrelevant to it, based on the positive training set and negative training set, and those not indexed by it, respectively. The indexing process determines whether an index term is assigned to a certain document, based on the relationship constructed by the learning process, and the text found in the document. Furthermore, a learning feedback technique is introduced. This technique used in the learning process modifies the relationship between an index term and its relevant and irrelevant words to improve the learning performance and, thus, the indexing performance. Experimental results have shown that the statistical learning approach and the learning feedback technique are practical means to automatic indexing of controlled index terms
  15. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.03
    0.026320523 = product of:
      0.06580131 = sum of:
        0.046599183 = weight(_text_:system in 1952) [ClassicSimilarity], result of:
          0.046599183 = score(doc=1952,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.3479797 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.019202124 = product of:
          0.057606373 = sum of:
            0.057606373 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.057606373 = score(doc=1952,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Date
    16. 8.1998 12:51:22
  16. Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.03
    0.025459195 = product of:
      0.063647985 = sum of:
        0.040348392 = weight(_text_:context in 2895) [ClassicSimilarity], result of:
          0.040348392 = score(doc=2895,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.22896172 = fieldWeight in 2895, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
        0.023299592 = weight(_text_:system in 2895) [ClassicSimilarity], result of:
          0.023299592 = score(doc=2895,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.17398985 = fieldWeight in 2895, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
      0.4 = coord(2/5)
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complimentary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
  17. Thönssen, B.: Automatische Indexierung und Schnittstellen zu Thesauri (1988) 0.03
    0.025371227 = product of:
      0.12685613 = sum of:
        0.12685613 = weight(_text_:index in 30) [ClassicSimilarity], result of:
          0.12685613 = score(doc=30,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.6827756 = fieldWeight in 30, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=30)
      0.2 = coord(1/5)
    
    Abstract
    Über eine Schnittstelle zwischen Programmen zur automatischen Indexierung (PRIMUS-IDX) und zur maschinellen Thesaurusverwaltung (INDEX) sollen große Textmengen schnell, kostengünstig und konsistent erschlossen und verbesserte Recherchemöglichkeiten geschaffen werden. Zielvorstellung ist ein Verfahren, das auf PCs ablauffähig ist und speziell deutschsprachige Texte bearbeiten kann
    Object
    INDEX
  18. Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 0.03
    0.025371227 = product of:
      0.12685613 = sum of:
        0.12685613 = weight(_text_:index in 6892) [ClassicSimilarity], result of:
          0.12685613 = score(doc=6892,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.6827756 = fieldWeight in 6892, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=6892)
      0.2 = coord(1/5)
    
    Content
    Need for indexing and abstracting texts; attributes of texts; text representations and their use; selection of natural language index terms; assignment of controlled language index texts; automatic abstracting; applications
  19. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.02
    0.023828931 = product of:
      0.059572328 = sum of:
        0.04613084 = weight(_text_:system in 530) [ClassicSimilarity], result of:
          0.04613084 = score(doc=530,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.34448233 = fieldWeight in 530, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.013441487 = product of:
          0.04032446 = sum of:
            0.04032446 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.04032446 = score(doc=530,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  20. Cohen, J.D.: Highlights: language- and domain-independent automatic indexing terms for abstracting (1995) 0.02
    0.021751298 = product of:
      0.10875649 = sum of:
        0.10875649 = weight(_text_:index in 1793) [ClassicSimilarity], result of:
          0.10875649 = score(doc=1793,freq=6.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.5853582 = fieldWeight in 1793, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1793)
      0.2 = coord(1/5)
    
    Abstract
    Presents a model of drawing index terms from text. The approach uses no stop list, stemmer, or other language and domain specific component, allowing operation in any language or domain with only trivial modification. The method uses n-grams counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, called 'highlights', are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Presents some experimental results, showing operation in English, Spanish, German, Georgian, Russian and Japanese

Years

Languages

Types

  • a 146
  • el 13
  • x 8
  • m 4
  • s 3
  • d 1
  • p 1
  • More… Less…