Search (42 results, page 1 of 3)

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.03

0.028787265 = product of:
  0.05757453 = sum of:
    0.05757453 = sum of:
      0.007654148 = weight(_text_:a in 6752) [ClassicSimilarity], result of:
        0.007654148 = score(doc=6752,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.14413087 = fieldWeight in 6752, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0625 = fieldNorm(doc=6752)
      0.04992038 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
        0.04992038 = score(doc=6752,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.30952093 = fieldWeight in 6752, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=6752)
  0.5 = coord(1/2)

Abstract: AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
Date: 6. 3.1997 16:22:15
Type: a
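
The score breakdown for this first hit (doc 6752) can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); queryNorm and fieldNorm are copied from the explain output rather than derived, and the function names are hypothetical, not part of any Lucene API.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # Per-term score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Values taken from the explain output for doc 6752
QUERY_NORM = 0.046056706
MAX_DOCS = 44218
FIELD_NORM = 0.0625

score_a = classic_term_score(4.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The outer coord(1/2) factor halves the summed term scores
total = 0.5 * (score_a + score_22)
print(f"a: {score_a:.6f}  22: {score_22:.6f}  total: {total:.6f}")
```

Lucene computes these values in single precision, so the last digits can differ slightly from the listing. The rarer term `22` (docFreq=3622) dominates the score because idf enters both queryWeight and fieldWeight.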

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01

0.009360071 = product of:
  0.018720143 = sum of:
    0.018720143 = product of:
      0.037440285 = sum of:
        0.037440285 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.037440285 = score(doc=1746,freq=2.0), product of:
            0.16128273 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046056706 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2015 9:17:30

Hoppe, A.: Die systematischen Grundlagen für ein linguistisch orientiertes maschinelles Dokumentationsverfahren (1969) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 4720) [ClassicSimilarity], result of:
          0.01339476 = score(doc=4720,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 4720, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4720)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: The operation and performance of an artificially intelligent keywording system (1991) 0.00
```
0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 6681) [ClassicSimilarity], result of:
          0.01339476 = score(doc=6681,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 6681, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6681)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system

Type: a

Pritchard-Schoch, T.: Natural language comes of age (1993) 0.00
```
0.00270615 = product of:
  0.0054123 = sum of:
    0.0054123 = product of:
      0.0108246 = sum of:
        0.0108246 = weight(_text_:a in 2570) [ClassicSimilarity], result of:
          0.0108246 = score(doc=2570,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20383182 = fieldWeight in 2570, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2570)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Discusses natural languages and the natural language implementations of Westlaw's full-text legal documents, Westlaw Is Natural. Natural language is not artificial intelligence but a hybrid of linguistics, mathematics and statistics. Provides 3 classes of retrieval models. Explains how Westlaw processes an English query. Assesses WIN. Covers WIN enhancements; the natural language features of Congressional Quarterly's Washington Alert using a document for a query; the personal librarian front end search software and Dowquest from Dow Jones news/retrieval. Considers whether natural language encourages fuzzy thinking and whether Boolean logic will still be needed

Type: a

Salton, G.: Automatic processing of foreign language documents (1985) 0.00
```
0.0024392908 = product of:
  0.0048785815 = sum of:
    0.0048785815 = product of:
      0.009757163 = sum of:
        0.009757163 = weight(_text_:a in 3650) [ClassicSimilarity], result of:
          0.009757163 = score(doc=3650,freq=26.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18373153 = fieldWeight in 3650, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order.
During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as What is meant by retrieval language compatibility? How is it to be achieved, and how evaluated?

Source

Theory of subject analysis: a sourcebook. Ed.: L.M. Chan, et al

Type: a

Lustig, G.: Das Projekt WAI : Wörterbuchentwicklung für automatisches Indexing (1982) 0.00

0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 33) [ClassicSimilarity], result of:
          0.009471525 = score(doc=33,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 33, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=33)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Kuhlen, R.: Morphologische Relationen durch Reduktionsalgorithmen (1974) 0.00

0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 4251) [ClassicSimilarity], result of:
          0.009471525 = score(doc=4251,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 4251, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4251)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.00

0.0023435948 = product of:
  0.0046871896 = sum of:
    0.0046871896 = product of:
      0.009374379 = sum of:
        0.009374379 = weight(_text_:a in 6682) [ClassicSimilarity], result of:
          0.009374379 = score(doc=6682,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17652355 = fieldWeight in 6682, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6682)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Presents a model for predicting the performance of a computerised keyword assigning and indexing system. Statistical procedures were investigated in order to protect against incorrect keywording by the system behaving as an expert system designed to mimic the behaviour of human keyword indexers and representing lessons learned from military exercises and operations
Type: a

Polity, Y.: Vers une ergonomie linguistique (1994) 0.00

0.0023435948 = product of:
  0.0046871896 = sum of:
    0.0046871896 = product of:
      0.009374379 = sum of:
        0.009374379 = weight(_text_:a in 36) [ClassicSimilarity], result of:
          0.009374379 = score(doc=36,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17652355 = fieldWeight in 36, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Analyzed a special type of man-machine interaction, that of searching an information system with natural language. A model for full text processing for information retrieval was proposed that considered the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks
Type: a

SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.00
```
0.0022926913 = product of:
  0.0045853825 = sum of:
    0.0045853825 = product of:
      0.009170765 = sum of:
        0.009170765 = weight(_text_:a in 6671) [ClassicSimilarity], result of:
          0.009170765 = score(doc=6671,freq=30.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17268941 = fieldWeight in 6671, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL u. N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF u. D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. u. B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. u. R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA u. M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG u. Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL u. R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. u. P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY u. D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH u. H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. u. J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. u. M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. u. P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN u. L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. u. J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO u. P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER u. J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. u. P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. u. B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system

Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 5480) [ClassicSimilarity], result of:
          0.009076704 = score(doc=5480,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 5480, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods

Type: a

Pirkola, A.: Morphological typology of languages for IR (2001) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 4476) [ClassicSimilarity], result of:
          0.009076704 = score(doc=4476,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 4476, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Type: a

Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 2910) [ClassicSimilarity], result of:
          0.009076704 = score(doc=2910,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 2910, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2910)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Due to natural language morphology, words can take on various morphological forms. Morphological normalisation - often used in information retrieval and text mining systems - conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach is in between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of acquiring automatically an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.

Type: a

Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 6029) [ClassicSimilarity], result of:
          0.008285859 = score(doc=6029,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 6029, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6029)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications

Type: a

Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 896) [ClassicSimilarity], result of:
          0.008285859 = score(doc=896,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 896, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, where each scenario simulated a certain user type. The four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. The main conclusion of the paper is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations of indexing strategies with query terms based on normalization was not far below the performance of the truncation combination.

Type: a

Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 1842) [ClassicSimilarity], result of:
          0.008285859 = score(doc=1842,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 1842, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.

Type: a

Renouf, A.: Sticking to the text : a corpus linguist's view of language (1993) 0.00

0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 2314) [ClassicSimilarity], result of:
          0.008202582 = score(doc=2314,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 2314, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2314)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Cheng, K.-H.: Automatic identification for topics of electronic documents (1997) 0.00

0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 1811) [ClassicSimilarity], result of:
          0.008202582 = score(doc=1811,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 1811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1811)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: With the rapid rise in numbers of electronic documents on the Internet, how to effectively assign topics to documents becomes an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being
Type: a

Gräbnitz, V.: PASSAT: Programm zur automatischen Selektion von Stichwörtern aus Texten (1987) 0.00

0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 932) [ClassicSimilarity], result of:
          0.008118451 = score(doc=932,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 932, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=932)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a
