Search (106 results, page 1 of 6)

Hauer, M.: Automatische Indexierung (2000) 0.11

0.11334069 = product of:
  0.17001103 = sum of:
    0.12868872 = weight(_text_:index in 5887) [ClassicSimilarity], result of:
      0.12868872 = score(doc=5887,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.5793543 = fieldWeight in 5887, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.09375 = fieldNorm(doc=5887)
    0.04132231 = product of:
      0.08264462 = sum of:
        0.08264462 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.08264462 = score(doc=5887,freq=2.0), product of:
            0.17800546 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05083213 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Object: Index-5.0
Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt
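
The score tree for this first hit (doc 5887) can be checked by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and not part of any library. The queryNorm and fieldNorm values are copied from the listing rather than derived.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the explanation tree
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 5887
MAX_DOCS = 44218
QUERY_NORM = 0.05083213
FIELD_NORM = 0.09375

index_score = term_score(2.0, 1520, MAX_DOCS, QUERY_NORM, FIELD_NORM)
t22_score = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "22" clause sits in a nested query where 1 of 2 clauses matched,
# and 2 of the 3 top-level clauses matched.
nested = t22_score * (1 / 2)
total = (index_score + nested) * (2 / 3)

print(f"index: {index_score:.8f}  22: {t22_score:.8f}  total: {total:.8f}")
```

Up to float rounding, the three printed values agree with the 0.12868872, 0.08264462 and 0.11334069 shown in the tree. The same arithmetic accounts for the later hits that match only the term "index": a single matching clause is scaled by coord(1/3), so a high term weight can still yield a low final score.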

Ward, M.L.: ¬The future of the human indexer (1996) 0.11

0.110997304 = product of:
  0.16649595 = sum of:
    0.09099667 = weight(_text_:index in 7244) [ClassicSimilarity], result of:
      0.09099667 = score(doc=7244,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.40966535 = fieldWeight in 7244, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=7244)
    0.075499274 = sum of:
      0.034176964 = weight(_text_:classification in 7244) [ClassicSimilarity], result of:
        0.034176964 = score(doc=7244,freq=2.0), product of:
          0.16188543 = queryWeight, product of:
            3.1847067 = idf(docFreq=4974, maxDocs=44218)
            0.05083213 = queryNorm
          0.21111822 = fieldWeight in 7244, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1847067 = idf(docFreq=4974, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
      0.04132231 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
        0.04132231 = score(doc=7244,freq=2.0), product of:
          0.17800546 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05083213 = queryNorm
          0.23214069 = fieldWeight in 7244, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=7244)
  0.6666667 = coord(2/3)

Abstract: Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and what depth to index; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database).
Date: 9. 2.1997 18:44:22

Pirkola, A.: Morphological typology of languages for IR (2001) 0.07

0.07205677 = product of:
  0.10808515 = sum of:
    0.09099667 = weight(_text_:index in 4476) [ClassicSimilarity], result of:
      0.09099667 = score(doc=4476,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.40966535 = fieldWeight in 4476, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
    0.017088482 = product of:
      0.034176964 = sum of:
        0.034176964 = weight(_text_:classification in 4476) [ClassicSimilarity], result of:
          0.034176964 = score(doc=4476,freq=2.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.21111822 = fieldWeight in 4476, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Bloomfield, M.: Indexing : neglected and poorly understood (2001) 0.07

0.07205677 = product of:
  0.10808515 = sum of:
    0.09099667 = weight(_text_:index in 5439) [ClassicSimilarity], result of:
      0.09099667 = score(doc=5439,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.40966535 = fieldWeight in 5439, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=5439)
    0.017088482 = product of:
      0.034176964 = sum of:
        0.034176964 = weight(_text_:classification in 5439) [ClassicSimilarity], result of:
          0.034176964 = score(doc=5439,freq=2.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.21111822 = fieldWeight in 5439, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=5439)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.
Source: Cataloging and classification quarterly. 33(2001) no.1, p.63-75

Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.07
```
0.07205677 = product of:
  0.10808515 = sum of:
    0.09099667 = weight(_text_:index in 723) [ClassicSimilarity], result of:
      0.09099667 = score(doc=723,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.40966535 = fieldWeight in 723, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=723)
    0.017088482 = product of:
      0.034176964 = sum of:
        0.034176964 = weight(_text_:classification in 723) [ClassicSimilarity], result of:
          0.034176964 = score(doc=723,freq=2.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.21111822 = fieldWeight in 723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=723)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Manual subject indexing in libraries is a time-consuming and costly process and the quality of the assigned subjects is affected by the cataloger's knowledge on the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book independent of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, outperforming humans 10-15 times. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.

Source

Cataloging and classification quarterly. 59(2021) no.8, p.775-793

Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.07
```
0.071409 = product of:
  0.107113495 = sum of:
    0.0928731 = weight(_text_:index in 1842) [ClassicSimilarity], result of:
      0.0928731 = score(doc=1842,freq=6.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.418113 = fieldWeight in 1842, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1842)
    0.014240401 = product of:
      0.028480802 = sum of:
        0.028480802 = weight(_text_:classification in 1842) [ClassicSimilarity], result of:
          0.028480802 = score(doc=1842,freq=2.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.17593184 = fieldWeight in 1842, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.07

0.06884199 = product of:
  0.103262976 = sum of:
    0.07506842 = weight(_text_:index in 7209) [ClassicSimilarity], result of:
      0.07506842 = score(doc=7209,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.33795667 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.028194554 = product of:
      0.05638911 = sum of:
        0.05638911 = weight(_text_:classification in 7209) [ClassicSimilarity], result of:
          0.05638911 = score(doc=7209,freq=4.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.34832728 = fieldWeight in 7209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources

Sparck Jones, K.: Index term weighting (1973) 0.06

0.05719499 = product of:
  0.17158496 = sum of:
    0.17158496 = weight(_text_:index in 5491) [ClassicSimilarity], result of:
      0.17158496 = score(doc=5491,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.7724724 = fieldWeight in 5491, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.125 = fieldNorm(doc=5491)
  0.33333334 = coord(1/3)

Leung, C.-H.; Kan, W.-K.: ¬A statistical learning approach to automatic indexing of controlled index terms (1997) 0.05
```
0.052536957 = product of:
  0.15761086 = sum of:
    0.15761086 = weight(_text_:index in 6497) [ClassicSimilarity], result of:
      0.15761086 = score(doc=6497,freq=12.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.7095612 = fieldWeight in 6497, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=6497)
  0.33333334 = coord(1/3)
```
Abstract

A statistical learning approach to assigning controlled index terms is presented. In this approach, there are two processes: (1) the learning process and (2) the indexing process. The learning process constructs a relationship between an index term and the words relevant and irrelevant to it, based on the positive training set and negative training set, and those not indexed by it, respectively. The indexing process determines whether an index term is assigned to a certain document, based on the relationship constructed by the learning process, and the text found in the document. Furthermore, a learning feedback technique is introduced. This technique used in the learning process modifies the relationship between an index term and its relevant and irrelevant words to improve the learning performance and, thus, the indexing performance. Experimental results have shown that the statistical learning approach and the learning feedback technique are practical means to automatic indexing of controlled index terms.

Golub, K.: Automatic subject indexing of text (2019) 0.05
```
0.052190267 = product of:
  0.078285396 = sum of:
    0.0536203 = weight(_text_:index in 5268) [ClassicSimilarity], result of:
      0.0536203 = score(doc=5268,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.24139762 = fieldWeight in 5268, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5268)
    0.024665099 = product of:
      0.049330197 = sum of:
        0.049330197 = weight(_text_:classification in 5268) [ClassicSimilarity], result of:
          0.049330197 = score(doc=5268,freq=6.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.3047229 = fieldWeight in 5268, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5268)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.

Thönssen, B.: Automatische Indexierung und Schnittstellen zu Thesauri (1988) 0.05

0.05055371 = product of:
  0.15166113 = sum of:
    0.15166113 = weight(_text_:index in 30) [ClassicSimilarity], result of:
      0.15166113 = score(doc=30,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.6827756 = fieldWeight in 30, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.078125 = fieldNorm(doc=30)
  0.33333334 = coord(1/3)

Abstract: An interface between programs for automatic indexing (PRIMUS-IDX) and for machine-based thesaurus management (INDEX) is intended to allow large volumes of text to be indexed quickly, cost-effectively and consistently, and to provide improved search facilities. The goal is a method that runs on PCs and can process German-language texts in particular.
Object: INDEX

Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 0.05

0.05055371 = product of:
  0.15166113 = sum of:
    0.15166113 = weight(_text_:index in 6892) [ClassicSimilarity], result of:
      0.15166113 = score(doc=6892,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.6827756 = fieldWeight in 6892, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.078125 = fieldNorm(doc=6892)
  0.33333334 = coord(1/3)

Content: Need for indexing and abstracting texts; attributes of texts; text representations and their use; selection of natural language index terms; assignment of controlled language index texts; automatic abstracting; applications

Cohen, J.D.: Highlights: language- and domain-independent automatic indexing terms for abstracting (1995) 0.04
```
0.04334078 = product of:
  0.13002233 = sum of:
    0.13002233 = weight(_text_:index in 1793) [ClassicSimilarity], result of:
      0.13002233 = score(doc=1793,freq=6.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.5853582 = fieldWeight in 1793, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1793)
  0.33333334 = coord(1/3)
```
Abstract

Presents a model of drawing index terms from text. The approach uses no stop list, stemmer, or other language and domain specific component, allowing operation in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, called 'highlights', are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Presents some experimental results, showing operation in English, Spanish, German, Georgian, Russian and Japanese.

O'Kane, K.C.: Generating hierarchical document indices from common denominators in large document collections (1996) 0.04
```
0.04334078 = product of:
  0.13002233 = sum of:
    0.13002233 = weight(_text_:index in 4037) [ClassicSimilarity], result of:
      0.13002233 = score(doc=4037,freq=6.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.5853582 = fieldWeight in 4037, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4037)
  0.33333334 = coord(1/3)
```
Abstract

Describes an effective, simple and efficient algorithm for computer generation of hierarchical indices from Document Term matrices by means of calculating common denominator vectors from the document vector set. This procedure produces an intuitive, user friendly hierarchical index of a document collection not unlike that which would be expected had a manual indexer set about to create an index or outline of a collection. The resulting index, when presented with a graphical user interface, provides the user with a natural easily comprehended view of the document collection, permits general browsing and informal search activities with an access method that requires no keyboard entry or prior knowledge of the vocabulary.

Mansour, N.; Haraty, R.A.; Daher, W.; Houri, M.: ¬An auto-indexing method for Arabic text (2008) 0.04
```
0.04289624 = product of:
  0.12868872 = sum of:
    0.12868872 = weight(_text_:index in 2103) [ClassicSimilarity], result of:
      0.12868872 = score(doc=2103,freq=8.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.5793543 = fieldWeight in 2103, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=2103)
  0.33333334 = coord(1/3)
```
Abstract

This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. This method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the container document. The weight is based on how widely the word is spread through the document and not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples. For these examples, we obtained an average recall of 46% and an average precision of 64%.

Nicoletti, M.: Automatische Indexierung (2001) 0.04

0.04289624 = product of:
  0.12868872 = sum of:
    0.12868872 = weight(_text_:index in 4326) [ClassicSimilarity], result of:
      0.12868872 = score(doc=4326,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.5793543 = fieldWeight in 4326, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.09375 = fieldNorm(doc=4326)
  0.33333334 = coord(1/3)

Content: 1. Task - 2. Identification of multi-word groups - 2.1 Definition - 3. Marking of multi-word groups - 4. Base forms - 5. Term and document frequency --- term weighting - 6. Threshold as a control instrument - 7. Inverted index. See: http://www.grin.com/de/e-book/104966/automatische-indexierung.

Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.04
```
0.039882645 = product of:
  0.059823968 = sum of:
    0.03753421 = weight(_text_:index in 3645) [ClassicSimilarity], result of:
      0.03753421 = score(doc=3645,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.16897833 = fieldWeight in 3645, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3645)
    0.022289755 = product of:
      0.04457951 = sum of:
        0.04457951 = weight(_text_:classification in 3645) [ClassicSimilarity], result of:
          0.04457951 = score(doc=3645,freq=10.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.27537692 = fieldWeight in 3645, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The selection that follows was chosen as it represents "a very early paper an the possibilities allowed by computers an documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms, which when substituted, one for another, could be used to improve retrieval performance? One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England.t The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic naturai language processing. Based an the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification. 
The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true economic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to library/information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.
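The two steps named in the abstract can be illustrated with a minimal sketch. It is not the clumping technique of Needham and Sparck Jones, which the abstract does not specify. It assumes a hypothetical toy index (`postings`), normalizes the shared-document count as a Jaccard-style ratio, and groups terms by simple threshold linking; all names and the threshold value are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy index (an assumption): keyword -> ids of documents it indexes.
postings = {
    "indexing": {1, 2, 3, 4},
    "cataloguing": {2, 3, 4},
    "thesaurus": {3, 4, 5},
    "hashing": {6, 7},
    "compression": {6, 7, 8},
}


def similarity(a, b):
    """Step 1: documents indexed by both terms, divided by documents
    indexed by either term (a Jaccard-style normalization, assumed here)."""
    shared = len(postings[a] & postings[b])
    total = len(postings[a] | postings[b])
    return shared / total if total else 0.0


def group_terms(threshold):
    """Step 2: link every pair of terms whose similarity reaches the
    threshold and return the resulting connected groups.
    This is plain single-link grouping, not the clumping method."""
    parent = {term: term for term in postings}

    def find(term):
        # Follow parent links to the group representative.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in combinations(postings, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for term in postings:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())


classes = group_terms(0.5)
print(classes)
```

With this toy data the library-related terms end up in one class and the data-structure terms in another. Such classes are purely statistical: membership follows from shared documents, not from a class-defining characteristic, which matches point 2 of the abstract.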
Oberhauser, O.; Labner, J.: OPAC-Erweiterung durch automatische Indexierung : Empirische Untersuchung mit Daten aus dem Österreichischen Verbundkatalog (2002) 0.04
```
0.03714924 = product of:
  0.111447714 = sum of:
    0.111447714 = weight(_text_:index in 883) [ClassicSimilarity], result of:
      0.111447714 = score(doc=883,freq=6.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.50173557 = fieldWeight in 883, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=883)
  0.33333334 = coord(1/3)
```
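The score explanation above can be recomputed by hand. The sketch below assumes the classic TF-IDF formulas that are consistent with the numbers in the listing (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not part of any library API.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm; fieldWeight = tf * idf * fieldNorm.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 883 (term "index").
score = term_score(
    freq=6.0,
    doc_freq=1520,
    max_docs=44218,
    query_norm=0.05083213,
    field_norm=0.046875,
)

# coord(1/3): only one of the three query clauses matched this document.
final_score = score * (1 / 3)

print(score, final_score)
```

The two printed values should agree with the 0.111447714 and 0.03714924 shown in the listing up to rounding, since the listing was presumably computed in single precision.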
Abstract

Following the MILOS I and MILOS II indexing projects of the 1990s, which examined the suitability of an automatic indexing method for library catalogues, an empirical study was carried out on a representative sample of title records from the Österreichischer Verbundkatalog (Austrian union catalogue). The aim was to test and assess whether this method could be deployed in the online catalogues of the union. In keeping with the way OPACs are actually used, only the effect on the Basic Index ("All fields") enriched with automatically generated terms was examined. To this end, 100 search queries were run first against the original Basic Index and then against the enriched Basic Index in an OPAC under Aleph 500. The tests yielded an increase in relevant hits with only slight losses in precision, a reduction in zero-hit results, and insights into the effect of existing verbal subject indexing.
SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.03
```
0.031668328 = product of:
  0.04750249 = sum of:
    0.03753421 = weight(_text_:index in 6671) [ClassicSimilarity], result of:
      0.03753421 = score(doc=6671,freq=2.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.16897833 = fieldWeight in 6671, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.02734375 = fieldNorm(doc=6671)
    0.009968281 = product of:
      0.019936562 = sum of:
        0.019936562 = weight(_text_:classification in 6671) [ClassicSimilarity], result of:
          0.019936562 = score(doc=6671,freq=2.0), product of:
            0.16188543 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05083213 = queryNorm
            0.12315229 = fieldWeight in 6671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Content

HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL and N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF and D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. and B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. and R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA and M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG and Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL and R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. and P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY and D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH and H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. and J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. and M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. and P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN and L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. and J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO and P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER and J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. and P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. and B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
Ladewig, C.; Henkes, M.: Verfahren zur automatischen inhaltlichen Erschließung von elektronischen Texten : ASPECTIX (2001) 0.03
```
0.030332223 = product of:
  0.09099667 = sum of:
    0.09099667 = weight(_text_:index in 5794) [ClassicSimilarity], result of:
      0.09099667 = score(doc=5794,freq=4.0), product of:
        0.2221244 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.05083213 = queryNorm
        0.40966535 = fieldWeight in 5794, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=5794)
  0.33333334 = coord(1/3)
```
Abstract

AspectiX, a method for the automatic syntactic subject indexing of electronic texts, is based on an index whose elements are linked to a universal aspect classification, which makes syntactic retrieval possible. Using these classification elements, each related in content to the object of the search, the information in electronic texts is queried with known search algorithms and the results are evaluated according to the aspect links. With these aspects it is possible to classify unknown text documents by content automatically, independently of subject field and language, and, when searching a text corpus, not to depend solely on character strings as search engines on the WWW do. The index can be extended further during these processes, both intellectually and automatically, and delivers retrieval results of nearly 100 percent precision with, at the same time, nearly 100 percent recall. According to the authors, the AspectiX method is thus superior to all other search tools by up to 40 percent in precision or recall, as demonstrated by numerous searches in three databases that differ in size and subject matter.
