Search (143 results, page 1 of 8)

Ward, M.L.: The future of the human indexer (1996) 0.10

0.104232416 = product of:
  0.15634862 = sum of:
    0.023478512 = weight(_text_:science in 7244) [ClassicSimilarity], result of:
      0.023478512 = score(doc=7244,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.17461908 = fieldWeight in 7244, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=7244)
    0.13287011 = sum of:
      0.09137568 = weight(_text_:index in 7244) [ClassicSimilarity], result of:
        0.09137568 = score(doc=7244,freq=4.0), product of:
          0.22304957 = queryWeight, product of:
            4.369764 = idf(docFreq=1520, maxDocs=44218)
            0.05104385 = queryNorm
          0.40966535 = fieldWeight in 7244, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.369764 = idf(docFreq=1520, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
      0.04149442 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
        0.04149442 = score(doc=7244,freq=2.0), product of:
          0.17874686 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05104385 = queryNorm
          0.23214069 = fieldWeight in 7244, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
  0.6666667 = coord(2/3)

Abstract: Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and to what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database).
Date: 9. 2.1997 18:44:22
Source: Journal of librarianship and information science. 28(1996) no.4, S.217-225
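
Note: the score breakdown for this record can be checked by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas of Lucene (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are illustrative and not part of the search system; the numeric inputs are copied from the explain output for document 7244.

```python
import math

# Constants copied from the explain output for document 7244.
MAX_DOCS = 44218
QUERY_NORM = 0.05104385
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    # queryWeight = idf * queryNorm
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (freq, docFreq) for each matching term, as listed in the explain tree.
science = term_score(2.0, 8627, FIELD_NORM)
index = term_score(4.0, 1520, FIELD_NORM)
twenty_two = term_score(2.0, 3622, FIELD_NORM)

# Two of the three top-level query clauses matched, hence coord(2/3).
total = (science + index + twenty_two) * (2.0 / 3.0)
print(round(total, 4))
```

Lucene computes these values in 32-bit floats, so the trailing digits of a double-precision recomputation can differ slightly from the listing.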

Hauer, M.: Automatische Indexierung (2000) 0.07

0.07073785 = product of:
  0.21221356 = sum of:
    0.21221356 = sum of:
      0.12922472 = weight(_text_:index in 5887) [ClassicSimilarity], result of:
        0.12922472 = score(doc=5887,freq=2.0), product of:
          0.22304957 = queryWeight, product of:
            4.369764 = idf(docFreq=1520, maxDocs=44218)
            0.05104385 = queryNorm
          0.5793543 = fieldWeight in 5887, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.369764 = idf(docFreq=1520, maxDocs=44218)
            0.09375 = fieldNorm(doc=5887)
      0.08298884 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
        0.08298884 = score(doc=5887,freq=2.0), product of:
          0.17874686 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05104385 = queryNorm
          0.46428138 = fieldWeight in 5887, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.09375 = fieldNorm(doc=5887)
  0.33333334 = coord(1/3)

Object: Index-5.0
Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt

Leung, C.-H.; Kan, W.-K.: A statistical learning approach to automatic indexing of controlled index terms (1997) 0.07
```
0.06840812 = product of:
  0.10261217 = sum of:
    0.023478512 = weight(_text_:science in 6497) [ClassicSimilarity], result of:
      0.023478512 = score(doc=6497,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.17461908 = fieldWeight in 6497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=6497)
    0.07913366 = product of:
      0.15826732 = sum of:
        0.15826732 = weight(_text_:index in 6497) [ClassicSimilarity], result of:
          0.15826732 = score(doc=6497,freq=12.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.7095612 = fieldWeight in 6497, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=6497)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

A statistical learning approach to assigning controlled index terms is presented. In this approach, there are two processes: (1) the learning process and (2) the indexing process. The learning process constructs a relationship between an index term and the words relevant and irrelevant to it, based on the positive training set and negative training set, that is, sample documents indexed by the index term and those not indexed by it, respectively. The indexing process determines whether an index term is assigned to a certain document, based on the relationship constructed by the learning process and the text found in the document. Furthermore, a learning feedback technique is introduced. This technique, used in the learning process, modifies the relationship between an index term and its relevant and irrelevant words to improve the learning performance and, thus, the indexing performance. Experimental results have shown that the statistical learning approach and the learning feedback technique are practical means of automatic indexing of controlled index terms.

Source

Journal of the American Society for Information Science. 48(1997) no.1, S.55-66

Cohen, J.D.: Highlights: language- and domain-independent automatic indexing terms for abstracting (1995) 0.06

0.061782368 = product of:
  0.09267355 = sum of:
    0.027391598 = weight(_text_:science in 1793) [ClassicSimilarity], result of:
      0.027391598 = score(doc=1793,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.20372227 = fieldWeight in 1793, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1793)
    0.06528195 = product of:
      0.1305639 = sum of:
        0.1305639 = weight(_text_:index in 1793) [ClassicSimilarity], result of:
          0.1305639 = score(doc=1793,freq=6.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.5853582 = fieldWeight in 1793, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1793)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Presents a model of drawing index terms from text. The approach uses no stop list, stemmer, or other language- and domain-specific component, allowing operation in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, called 'highlights', are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Presents some experimental results, showing operation in English, Spanish, German, Georgian, Russian and Japanese.
Source: Journal of the American Society for Information Science. 46(1995) no.3, S.162-174

Faraj, N.: Analyse d'une methode d'indexation automatique basée sur une analyse syntaxique de texte (1996) 0.05

0.049586397 = product of:
  0.07437959 = sum of:
    0.031304684 = weight(_text_:science in 685) [ClassicSimilarity], result of:
      0.031304684 = score(doc=685,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.23282544 = fieldWeight in 685, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0625 = fieldNorm(doc=685)
    0.043074906 = product of:
      0.08614981 = sum of:
        0.08614981 = weight(_text_:index in 685) [ClassicSimilarity], result of:
          0.08614981 = score(doc=685,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.3862362 = fieldWeight in 685, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=685)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Evaluates an automatic indexing method based on syntactical text analysis combined with statistical analysis. Tests many combinations for the choice of term categories and weighting methods. The experiment, conducted on a software engineering corpus, shows systematic improvement in the use of syntactic term phrases compared to using only individual words as index terms
Source: Canadian journal of information and library science. 21(1996) no.1, S.1-21

Garfield, E.: The relationship between mechanical indexing, structural linguistics and information retrieval (1992) 0.05

0.049586397 = product of:
  0.07437959 = sum of:
    0.031304684 = weight(_text_:science in 3632) [ClassicSimilarity], result of:
      0.031304684 = score(doc=3632,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.23282544 = fieldWeight in 3632, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0625 = fieldNorm(doc=3632)
    0.043074906 = product of:
      0.08614981 = sum of:
        0.08614981 = weight(_text_:index in 3632) [ClassicSimilarity], result of:
          0.08614981 = score(doc=3632,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.3862362 = fieldWeight in 3632, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=3632)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: It is possible to locate over 60% of indexing terms used in the Current List of Medical Literature by analysing the titles of the articles. Citation indexes contain 'noise' and lack many pertinent citations. Mechanical indexing or analysis of text must begin with some linguistic technique. Discusses Harris' methods of structural linguistics, discourse analysis and transformational analysis. Provides 3 examples with references, abstracts and index entries
Source: Journal of information science. 18(1992) no.5, S.343-354

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.05

0.049139693 = product of:
  0.07370954 = sum of:
    0.039130855 = weight(_text_:science in 2759) [ClassicSimilarity], result of:
      0.039130855 = score(doc=2759,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.2910318 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.034578685 = product of:
      0.06915737 = sum of:
        0.06915737 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.06915737 = score(doc=2759,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 1. 2.2016 18:25:22
Series: Lecture notes in computer science ; 9398

Hmeidi, I.; Kanaan, G.; Evens, M.: Design and implementation of automatic indexing for information retrieval with Arabic documents (1997) 0.04

0.04367321 = product of:
  0.06550981 = sum of:
    0.03320363 = weight(_text_:science in 1660) [ClassicSimilarity], result of:
      0.03320363 = score(doc=1660,freq=4.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.24694869 = fieldWeight in 1660, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=1660)
    0.03230618 = product of:
      0.06461236 = sum of:
        0.06461236 = weight(_text_:index in 1660) [ClassicSimilarity], result of:
          0.06461236 = score(doc=1660,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.28967714 = fieldWeight in 1660, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=1660)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: A corpus of 242 abstracts of Arabic documents on computer science and information systems was put together, using the Proceedings of the Saudi Arabian National Conferences as a source. Reports on the design and building of an automatic information retrieval system from scratch to handle Arabic data. Both automatic and manual indexing techniques were implemented. Experiments using measures of recall and precision have demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Automatic indexing is both cheaper and faster. Results suggest that wider coverage of the literature can be achieved with less money while producing results as good as those of manual indexing. Compares the retrieval results using words as index terms versus stems and roots, and confirms the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing.
Source: Journal of the American Society for Information Science. 48(1997) no.10, S.867-881
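
Note: the recall and precision measures mentioned in this abstract follow the usual set-based definitions. A minimal sketch is given here; the function name and the document ids are hypothetical and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p, r)
```

Comparing two indexing methods in this way means running the same queries against both indexes and comparing the resulting precision and recall values.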

Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 0.04

0.043388095 = product of:
  0.06508214 = sum of:
    0.027391598 = weight(_text_:science in 2880) [ClassicSimilarity], result of:
      0.027391598 = score(doc=2880,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.20372227 = fieldWeight in 2880, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2880)
    0.037690543 = product of:
      0.075381085 = sum of:
        0.075381085 = weight(_text_:index in 2880) [ClassicSimilarity], result of:
          0.075381085 = score(doc=2880,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.33795667 = fieldWeight in 2880, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2880)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Automatic indexing is one of the important functions of a modern document retrieval system. Numerous techniques for this function have been proposed in the literature ranging from purely statistical to linguistically complex mechanisms. Most result from examining properties of terms. Examines term distribution within the framework of the Poisson models. Specifically examines the effectiveness of the Two-Poisson and the Three-Poisson model to see if generalisation results in increased effectiveness. The results show that the Two-Poisson model is only moderately effective in identifying index terms. In addition, generalisation to the Three-Poisson does not give any additional power. The only Poisson model which consistently works well is the basic One-Poisson model. Also discusses term distribution information.
Source: Journal of the American Society for Information Science. 41(1990) no.1, S.61-66
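
Note: the Two-Poisson model named in this abstract treats the within-document frequency of a term as a mixture of two Poisson distributions, one for documents that are about the term and one for the rest. A minimal sketch follows; the parameter values (pi, lam1, lam2) are hypothetical and are not taken from the paper.

```python
import math


def poisson(k, lam):
    # Probability of observing k occurrences under a Poisson with mean lam.
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson(k, pi, lam1, lam2):
    # Mixture: with probability pi the document belongs to the class with
    # mean lam1, otherwise to the class with mean lam2.
    return pi * poisson(k, lam1) + (1.0 - pi) * poisson(k, lam2)


# Hypothetical parameters, for illustration only.
probs = [two_poisson(k, pi=0.3, lam1=4.0, lam2=0.5) for k in range(6)]
print([round(x, 4) for x in probs])
```

Setting pi to 1 reduces the mixture to the One-Poisson model; a Three-Poisson model would add a third weighted component in the same way.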

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.03

0.03439779 = product of:
  0.05159668 = sum of:
    0.027391598 = weight(_text_:science in 5001) [ClassicSimilarity], result of:
      0.027391598 = score(doc=5001,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.20372227 = fieldWeight in 5001, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.02420508 = product of:
      0.04841016 = sum of:
        0.04841016 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.04841016 = score(doc=5001,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.03

0.03439779 = product of:
  0.05159668 = sum of:
    0.027391598 = weight(_text_:science in 5291) [ClassicSimilarity], result of:
      0.027391598 = score(doc=5291,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.20372227 = fieldWeight in 5291, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.02420508 = product of:
      0.04841016 = sum of:
        0.04841016 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.04841016 = score(doc=5291,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 7.2006 17:32:00
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767

Milstead, J.L.: Thesauri in a full-text world (1998) 0.03

0.029972691 = product of:
  0.044959035 = sum of:
    0.027669692 = weight(_text_:science in 2337) [ClassicSimilarity], result of:
      0.027669692 = score(doc=2337,freq=4.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.20579056 = fieldWeight in 2337, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2337)
    0.017289342 = product of:
      0.034578685 = sum of:
        0.034578685 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
          0.034578685 = score(doc=2337,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.19345059 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 9.1997 19:16:05
Imprint: Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Salton, G.: Automatic processing of foreign language documents (1985) 0.03
```
0.029115472 = product of:
  0.043673206 = sum of:
    0.022135753 = weight(_text_:science in 3650) [ClassicSimilarity], result of:
      0.022135753 = score(doc=3650,freq=4.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.16463245 = fieldWeight in 3650, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.03125 = fieldNorm(doc=3650)
    0.021537453 = product of:
      0.043074906 = sum of:
        0.043074906 = weight(_text_:index in 3650) [ClassicSimilarity], result of:
          0.043074906 = score(doc=3650,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.1931181 = fieldWeight in 3650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order.
During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how evaluated?

Footnote

Original in: Journal of the American Society for Information Science 21(1970) no.3, S.187-194.
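
Note: the idea of mapping English expressions and their German equivalents to a common concept number can be illustrated with a small sketch. This is not the SMART implementation; the thesaurus entries, concept numbers, and function names are invented for illustration.

```python
# Hypothetical thesaurus: English and German words that share a meaning
# are assigned the same (invented) concept number.
THESAURUS = {
    "library": 101, "bibliothek": 101,
    "retrieval": 202, "wiedergewinnung": 202,
    "language": 303, "sprache": 303,
}


def to_concepts(text):
    """Map each known word to its concept number; unknown words are dropped."""
    return {THESAURUS[w] for w in text.lower().split() if w in THESAURUS}


def overlap(query, document):
    """Count the concept numbers shared by a query and a document."""
    return len(to_concepts(query) & to_concepts(document))


# An English query matched against a German document title.
shared = overlap("library retrieval", "Wiedergewinnung in der Bibliothek")
print(shared)
```

Because matching happens on concept numbers rather than on words, the same comparison works in either direction, German query against English text or the reverse.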

Sparck Jones, K.: Index term weighting (1973) 0.03

0.028716605 = product of:
  0.08614981 = sum of:
    0.08614981 = product of:
      0.17229962 = sum of:
        0.17229962 = weight(_text_:index in 5491) [ClassicSimilarity], result of:
          0.17229962 = score(doc=5491,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.7724724 = fieldWeight in 5491, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.125 = fieldNorm(doc=5491)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Thönssen, B.: Automatische Indexierung und Schnittstellen zu Thesauri (1988) 0.03

0.025382135 = product of:
  0.0761464 = sum of:
    0.0761464 = product of:
      0.1522928 = sum of:
        0.1522928 = weight(_text_:index in 30) [ClassicSimilarity], result of:
          0.1522928 = score(doc=30,freq=4.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.6827756 = fieldWeight in 30, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=30)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: An interface between programs for automatic indexing (PRIMUS-IDX) and for machine-based thesaurus management (INDEX) is intended to allow large volumes of text to be indexed quickly, cost-effectively, and consistently, and to provide improved search options. The goal is a method that runs on PCs and can handle German-language texts in particular.
Object: INDEX

Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 0.03

0.025382135 = product of:
  0.0761464 = sum of:
    0.0761464 = product of:
      0.1522928 = sum of:
        0.1522928 = weight(_text_:index in 6892) [ClassicSimilarity], result of:
          0.1522928 = score(doc=6892,freq=4.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.6827756 = fieldWeight in 6892, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=6892)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Content: Need for indexing and abstracting texts; attributes of texts; text representations and their use; selection of natural language index terms; assignment of controlled language index texts; automatic abstracting; applications

Schneider, A.: Moderne Retrievalverfahren in klassischen bibliotheksbezogenen Anwendungen : Projekte und Perspektiven (2008) 0.02
```
0.024793198 = product of:
  0.037189797 = sum of:
    0.015652342 = weight(_text_:science in 4031) [ClassicSimilarity], result of:
      0.015652342 = score(doc=4031,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.11641272 = fieldWeight in 4031, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.03125 = fieldNorm(doc=4031)
    0.021537453 = product of:
      0.043074906 = sum of:
        0.043074906 = weight(_text_:index in 4031) [ClassicSimilarity], result of:
          0.043074906 = score(doc=4031,freq=2.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.1931181 = fieldWeight in 4031, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=4031)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
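The score tree above can be checked by hand. The following is a minimal sketch that recomputes the value for doc 4031, assuming Lucene's ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and taking queryNorm, fieldNorm and the coord factors directly from the printed output. The function and variable names are illustrative only and are not part of the search system.

```python
import math

QUERY_NORM = 0.05104385   # queryNorm, copied from the explain output
MAX_DOCS = 44218          # maxDocs, copied from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "science": freq=2.0, docFreq=8627, fieldNorm=0.03125
science = term_score(2.0, 8627, 0.03125)

# Term "index": freq=2.0, docFreq=1520, fieldNorm=0.03125,
# scaled by the inner coord(1/2) shown in the tree
index = term_score(2.0, 1520, 0.03125) * 0.5

# The outer coord(2/3) is applied to the sum of both clauses
total = (science + index) * (2.0 / 3.0)

print(f"science = {science:.9f}")
print(f"index   = {index:.9f}")
print(f"total   = {total:.9f}")
```

The printed total should agree with the 0.024793198 shown in the tree up to single-precision rounding, since Lucene computes these values with 32-bit floats.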
Abstract: This thesis deals with modern retrieval methods in classical library-related applications. As the combination of the two seemingly opposed phrases in the title indicates, the work links aspects of computer science and information science with aspects of library tradition. After a brief description of the starting point, the so-called information flood, in the first chapter, the second chapter gives an introduction to the theory of information retrieval. In particular, it covers the fundamentals of information retrieval and information retrieval systems as well as the various approaches to indexing information. Descriptive and subject cataloguing, indexing, and automatic indexing are treated here. In addition, different information retrieval models and evaluation by means of retrieval tests are presented as part of the theory. The theory is followed in the third chapter by the practice of information retrieval. A distinction is made between use within an organization, use in the information and documentation sector, and use in the library sector. Use within an organization is illustrated by the example of the KURS database for education and training. Use in the library sector refers primarily to the OPAC as a compromise between library indexing and end-user requirements, and to its enrichment (so-called catalogue enrichment) to improve retrieval. The library sector is treated in more detail, with a review of completed projects on information and indexing systems from the 1990s (OSIRIS, MILOS I and II, KASCADE) and a look at current projects.
The two following chapters each present one current project for improving retrieval through catalogue enrichment, automatic indexing, and advanced retrieval methods: the search portal dandelon.com and the 180T project of the Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen. For each, the project goal, project partners, project organization, course of the project, and the technology used are described. The projects differ in that in one case a large union catalogue centre takes over project coordination, while in the other each participating library is itself responsible for implementation. The sixth and final chapter covers conclusions and perspectives. Both projects are assessed and an outlook on developments concerning the library catalogue is given. This publication goes back to a master's thesis in the postgraduate distance-learning programme Master of Arts (Library and Information Science) at Humboldt-Universität zu Berlin.

Object: IC Index

Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.02

```
0.024569847 = product of:
  0.03685477 = sum of:
    0.019565428 = weight(_text_:science in 1794) [ClassicSimilarity], result of:
      0.019565428 = score(doc=1794,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.1455159 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1794)
    0.017289342 = product of:
      0.034578685 = sum of:
        0.034578685 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
          0.034578685 = score(doc=1794,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.19345059 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```

Date: 11. 9.2000 19:53:22
Source: Journal of the American Society for Information Science. 49(1998) no.10, S.888-902

Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.02
```
0.024569847 = product of:
  0.03685477 = sum of:
    0.019565428 = weight(_text_:science in 3780) [ClassicSimilarity], result of:
      0.019565428 = score(doc=3780,freq=2.0), product of:
        0.13445559 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.05104385 = queryNorm
        0.1455159 = fieldWeight in 3780, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3780)
    0.017289342 = product of:
      0.034578685 = sum of:
        0.034578685 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
          0.034578685 = score(doc=3780,freq=2.0), product of:
            0.17874686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05104385 = queryNorm
            0.19345059 = fieldWeight in 3780, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3780)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: We live in the 21st century, and much of what would have been dismissed as science fiction a hundred or even fifty years ago has since become reality. Space probes fly to Mars, carry out experiments there, and send data back to Earth. Robots are used for routine tasks, for example in industry or in medicine. Digitization, artificial intelligence, and automated processes have become an integral part of everyday life. Many processes are based on learning algorithms. The advancing digital transformation is global and encompasses all areas of life and work: the economy, society, and politics. It opens up new possibilities from which libraries also benefit. The sharp rise in digital publications, which make up an important and proportionally ever larger part of cultural heritage, should prompt libraries to actively take up and use these possibilities. The analysability of digital content, for example through text and data mining (TDM), and the development of technical methods by which content can be linked and semantically related offer scope to rethink library indexing practices as well. For this reason, the Deutsche Nationalbibliothek (DNB) has for several years been examining how the processes for indexing media works can be improved and supported by machine. In doing so, it is in regular collegial exchange with other libraries that are likewise actively engaged with this question, and with European national libraries that are interested in the topic and in the DNB's experience. As a national library with extensive holdings of digital publications, the DNB has also built up expertise in digital long-term preservation and is valued within its network of partners as a competent interlocutor.

Date: 19. 8.2017 9:24:22

O'Kane, K.C.: Generating hierarchical document indices from common denominators in large document collections (1996) 0.02
```
0.02176065 = product of:
  0.06528195 = sum of:
    0.06528195 = product of:
      0.1305639 = sum of:
        0.1305639 = weight(_text_:index in 4037) [ClassicSimilarity], result of:
          0.1305639 = score(doc=4037,freq=6.0), product of:
            0.22304957 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05104385 = queryNorm
            0.5853582 = fieldWeight in 4037, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4037)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract: Describes an effective, simple and efficient algorithm for computer generation of hierarchical indices from document-term matrices by calculating common denominator vectors from the document vector set. This procedure produces an intuitive, user-friendly hierarchical index of a document collection, not unlike what would be expected had a manual indexer set out to create an index or outline of the collection. The resulting index, when presented with a graphical user interface, gives the user a natural, easily comprehended view of the document collection and permits general browsing and informal search activities with an access method that requires no keyboard entry or prior knowledge of the vocabulary.
