Search (76 results, page 1 of 4)

  • theme_ss:"Computerlinguistik" (computational linguistics)
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.11492675 = product of:
      0.28731686 = sum of:
        0.2454422 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.2454422 = score(doc=562,freq=2.0), product of:
            0.43671587 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051511593 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.04187466 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04187466 = score(doc=562,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8.1.2013 10:22:32
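Each score breakdown in this listing is Lucene `ClassicSimilarity` explain output. The sketch below recomputes record 1 (doc 562) from the printed factors. It assumes the formulas that match the printed numbers: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), a query boost of 1, and coord = matched clauses / total clauses. `queryNorm` and `fieldNorm` are copied from the output rather than derived, and the function names are hypothetical.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), matching the idf values printed above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """tf = sqrt(freq), e.g. freq 2 gives 1.4142135."""
    return math.sqrt(freq)

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Weight of one matching query clause: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * boost * query_norm              # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight in doc, product of"
    return query_weight * field_weight

def doc_score(term_weights, total_clauses):
    """Sum the matching clause weights, then scale by coord(matched/total)."""
    coord = len(term_weights) / total_clauses
    return sum(term_weights) * coord

MAX_DOCS = 44218
QUERY_NORM = 0.051511593

# Record 1 (doc 562): terms "3a" and "22" match 2 of the 5 query clauses.
w_3a = term_weight(2, 24, MAX_DOCS, QUERY_NORM, 0.046875)
w_22 = term_weight(2, 3622, MAX_DOCS, QUERY_NORM, 0.046875)
print(w_3a, w_22, doc_score([w_3a, w_22], 5))
```

The results agree with the explain output (0.2454422, 0.04187466 and 0.11492675) to roughly six decimal places; the small residue comes from Lucene computing in 32-bit floats.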
  2. Way, E.C.: Knowledge representation and metaphor (or: meaning) (1994) 0.06
    0.057108857 = product of:
      0.14277214 = sum of:
        0.08693925 = weight(_text_:index in 771) [ClassicSimilarity], result of:
          0.08693925 = score(doc=771,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.3862362 = fieldWeight in 771, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=771)
        0.05583288 = weight(_text_:22 in 771) [ClassicSimilarity], result of:
          0.05583288 = score(doc=771,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.30952093 = fieldWeight in 771, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=771)
      0.4 = coord(2/5)
    
    Content
    Contains the following 9 chapters: The literal and the metaphoric; Views of metaphor; Knowledge representation; Representation schemes and conceptual graphs; The dynamic type hierarchy theory of metaphor; Computational approaches to metaphor; The nature and structure of semantic hierarchies; Language games, open texture and family resemblance; Programming the dynamic type hierarchy; Subject index
    Footnote
    Previously published by Kluwer in 1991 // Reviewed in: Knowledge organization 22(1995) no.1, pp.48-49 (O. Sechser)
  3. Ruge, G.: Sprache und Computer : Wortbedeutung und Termassoziation. Methoden zur automatischen semantischen Klassifikation (1995) 0.06
    0.057108857 = product of:
      0.14277214 = sum of:
        0.08693925 = weight(_text_:index in 1534) [ClassicSimilarity], result of:
          0.08693925 = score(doc=1534,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.3862362 = fieldWeight in 1534, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=1534)
        0.05583288 = weight(_text_:22 in 1534) [ClassicSimilarity], result of:
          0.05583288 = score(doc=1534,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.30952093 = fieldWeight in 1534, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1534)
      0.4 = coord(2/5)
    
    Content
    Contains the following chapters: (1) Motivation; (2) Language philosophical foundations; (3) Structural comparison of extensions; (4) Earlier approaches towards term association; (5) Experiments; (6) Spreading-activation networks or memory models; (7) Perspective. Appendices: Heads and modifiers of 'car'; Glossary; Index. English title: Language and computer. Word semantics and term association. Methods towards an automatic semantic classification
    Footnote
    Reviewed in: Knowledge organization 22(1995) no.3/4, pp.182-184 (M.T. Rolland)
  4. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.05
    0.04908844 = product of:
      0.2454422 = sum of:
        0.2454422 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.2454422 = score(doc=862,freq=2.0), product of:
            0.43671587 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051511593 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.2 = coord(1/5)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Campe, P.: Case, semantic roles, and grammatical relations : a comprehensive bibliography (1994) 0.03
    0.026081776 = product of:
      0.13040888 = sum of:
        0.13040888 = weight(_text_:index in 8663) [ClassicSimilarity], result of:
          0.13040888 = score(doc=8663,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.5793543 = fieldWeight in 8663, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=8663)
      0.2 = coord(1/5)
    
    Abstract
    Contains references to more than 6000 publications with a subject and a language index as well as a guide to the relevant languages and language families
  6. Peters, W.; Vossen, P.; Diez-Orzas, P.; Adriaens, G.: Cross-linguistic alignment of WordNets with an inter-lingual-index (1998) 0.03
    0.026081776 = product of:
      0.13040888 = sum of:
        0.13040888 = weight(_text_:index in 6446) [ClassicSimilarity], result of:
          0.13040888 = score(doc=6446,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.5793543 = fieldWeight in 6446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.09375 = fieldNorm(doc=6446)
      0.2 = coord(1/5)
    
  7. He, Q.: A study of the strength indexes in co-word analysis (2000) 0.03
    0.026081776 = product of:
      0.13040888 = sum of:
        0.13040888 = weight(_text_:index in 111) [ClassicSimilarity], result of:
          0.13040888 = score(doc=111,freq=8.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.5793543 = fieldWeight in 111, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=111)
      0.2 = coord(1/5)
    
    Abstract
    Co-word analysis is a technique for detecting the knowledge structure of scientific literature and mapping the dynamics in a research field. It counts the co-occurrences of term pairs, computes the strength between term pairs, and maps the research field by inserting terms and their linkages into a graphical structure according to the strength values. Previous co-word studies have used two indexes to measure the strength between term pairs in order to identify the major areas in a research field: the inclusion index (I) and the equivalence index (E). This study conducts two co-word analysis experiments, one with each index, and compares their results. The results show that, because of the difference in how they are computed, index I is more likely to identify general subject areas in a research field, while index E is more likely to identify subject areas at more specific levels.
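The abstract names the two strength indexes but does not define them. The sketch below assumes the definitions commonly used in the co-word literature: I = c_ij / min(c_i, c_j) and E = c_ij² / (c_i · c_j), where c_i is the number of documents containing term i and c_ij the number containing both terms. The toy keyword sets and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count documents per term (c_i) and per unordered term pair (c_ij)."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # sorted so each pair key is (a, b) with a < b
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts

def inclusion_index(c_ij, c_i, c_j):
    """I: share of the rarer term's documents that also contain the other term."""
    return c_ij / min(c_i, c_j)

def equivalence_index(c_ij, c_i, c_j):
    """E: symmetric strength, penalizing pairs that involve a frequent term."""
    return c_ij ** 2 / (c_i * c_j)

def strength_table(docs):
    """Map each co-occurring pair to its (I, E) values."""
    term_counts, pair_counts = cooccurrence_counts(docs)
    return {
        (a, b): (inclusion_index(c, term_counts[a], term_counts[b]),
                 equivalence_index(c, term_counts[a], term_counts[b]))
        for (a, b), c in pair_counts.items()
    }

# Hypothetical keyword sets for four documents.
DOCS = [{"retrieval", "indexing"}, {"retrieval", "indexing", "stemming"},
        {"retrieval"}, {"retrieval", "stemming"}]
for pair, (i_val, e_val) in sorted(strength_table(DOCS).items()):
    print(pair, i_val, e_val)
```

In this toy data "indexing" never appears without the frequent term "retrieval", so I reaches 1.0 while E is only 0.5. This is consistent with the abstract's finding that I favours links to general terms and E favours more specific areas.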
  8. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.02
    0.024300262 = product of:
      0.12150131 = sum of:
        0.12150131 = weight(_text_:index in 3672) [ClassicSimilarity], result of:
          0.12150131 = score(doc=3672,freq=10.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.5397815 = fieldWeight in 3672, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3672)
      0.2 = coord(1/5)
    
    Abstract
    Stemming is one of the most important factors affecting the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by applying the retrieval method to individual query words, depending on the importance of the WORD, the STEM, or the ROOT of each query term in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme based on Term Frequency (TF) and Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The experimental results suggest that the proposed method outperforms the Word index method using TFxIDF weighting. It also outperforms the Stem index and Root index methods using Binary weighting, but not the Stem index and Root index methods using TFxIDF weighting.
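The abstract contrasts TFxIDF weighting with Binary weighting. The sketch below shows the difference using one common TF×IDF variant, tf × ln(N / df); the paper's exact formula is not given in the abstract and may differ. The index units could be words, stems, or roots; here they are hypothetical transliterated tokens.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Weight each index term in each document by tf * ln(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights

def binary_weights(docs):
    """Binary weighting: 1.0 if the index term occurs in the document at all."""
    return [{term: 1.0 for term in set(tokens)} for tokens in docs]

# Hypothetical toy collection of three documents.
DOCS = [["ktb", "ktb", "drs"], ["drs", "qra"], ["qra"]]
print(tfidf_weights(DOCS)[0])
print(binary_weights(DOCS)[0])
```

Binary weighting treats every matching term alike. TF×IDF rewards repeated, collection-rare terms, and a term occurring in every document gets weight 0 under this variant.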
  9. Warner, A.J.: Natural language processing (1987) 0.02
    0.022333153 = product of:
      0.11166576 = sum of:
        0.11166576 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
          0.11166576 = score(doc=337,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.61904186 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
      0.2 = coord(1/5)
    
    Source
    Annual review of information science and technology. 22(1987), pp.79-108
  10. Mock, K.J.; Vemuri, V.R.: Information filtering via hill climbing, WordNet, and index patterns (1997) 0.02
    0.02151637 = product of:
      0.10758185 = sum of:
        0.10758185 = weight(_text_:index in 1517) [ClassicSimilarity], result of:
          0.10758185 = score(doc=1517,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.4779429 = fieldWeight in 1517, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1517)
      0.2 = coord(1/5)
    
    Abstract
    The INFOS (Intelligent News Filtering Organizational System) project is designed to reduce the user's search burden by automatically categorising data as relevant or irrelevant based upon user interests. These predictions are learned automatically from features taken from input articles and from collaborative features derived from other users. The filtering is performed by a hybrid technique that combines elements of a keyword-based hill climbing method, knowledge-based conceptual representation via WordNet, and partial parsing via index patterns. The hybrid system integrating all these approaches combines the benefits of each while maintaining robustness and scalability.
  11. McMahon, J.G.; Smith, F.J.: Improving statistical language model performance with automatically generated word hierarchies (1996) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
          0.09770754 = score(doc=3164,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 3164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3164)
      0.2 = coord(1/5)
    
    Source
    Computational linguistics. 22(1996) no.2, pp.217-248
  12. Ruge, G.: A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
          0.09770754 = score(doc=4506,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 4506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4506)
      0.2 = coord(1/5)
    
    Date
    8.10.2000 11:52:22
  13. Somers, H.: Example-based machine translation : Review article (1999) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
          0.09770754 = score(doc=6672,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 6672, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6672)
      0.2 = coord(1/5)
    
    Date
    31.7.1996 9:22:19
  14. New tools for human translators (1997) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
          0.09770754 = score(doc=1179,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 1179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1179)
      0.2 = coord(1/5)
    
    Date
    31.7.1996 9:22:19
  15. Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
          0.09770754 = score(doc=3117,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 3117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3117)
      0.2 = coord(1/5)
    
    Date
    28.2.1999 10:48:22
  16. Der Student aus dem Computer (2023) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
          0.09770754 = score(doc=1079,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 1079, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1079)
      0.2 = coord(1/5)
    
    Date
    27.1.2023 16:22:55
  17. Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.02
    0.018822905 = product of:
      0.09411452 = sum of:
        0.09411452 = weight(_text_:index in 2107) [ClassicSimilarity], result of:
          0.09411452 = score(doc=2107,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 2107, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2107)
      0.2 = coord(1/5)
    
    Abstract
    The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has also been studied, as has restricting the dependencies to those obtained from noun phrases. Our approaches have been tested on the CLEF corpus, yielding consistent improvements over classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that restricting dependencies to those corresponding to noun phrases can achieve important reductions in storage and management costs, albeit at the expense of a slight reduction in performance.
  18. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
    0.018822905 = product of:
      0.09411452 = sum of:
        0.09411452 = weight(_text_:index in 1842) [ClassicSimilarity], result of:
          0.09411452 = score(doc=1842,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 1842, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.2 = coord(1/5)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or possibly even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, feature vectors are often needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. The goal of this paper is to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or the application of different techniques. The discussion of these methods is based on statistical, syntactical and especially morphological properties of (index) terms. The paper concludes by presenting some qualitative and quantitative results comparing statistical and morphological methods.
  19. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.02
    0.018822905 = product of:
      0.09411452 = sum of:
        0.09411452 = weight(_text_:index in 3223) [ClassicSimilarity], result of:
          0.09411452 = score(doc=3223,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 3223, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3223)
      0.2 = coord(1/5)
    
    Abstract
    The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, of automatically generated inflectional forms, and of their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of the historical newspaper collection, the occurrences of a word are typically spread across many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
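The first step described above, ranking index words by string similarity to a query term and keeping the top variants as expansion candidates, can be sketched as follows. The abstract does not name the matching methods compared, so this sketch substitutes plain Levenshtein edit distance as a stand-in. The function names and the Finnish word list (including the OCR-style variant "kawppa") are illustrative assumptions; the default k=30 mirrors the roughly 30 variants per query word reported above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def expand_query_term(term, index_words, k=30):
    """Return the k index words closest to term, ties broken alphabetically."""
    ranked = sorted(index_words, key=lambda w: (levenshtein(term, w), w))
    return ranked[:k]

# Hypothetical index vocabulary: inflected forms plus an OCR-style error.
INDEX_WORDS = ["talo", "kaupan", "kawppa", "kauppaa", "kauppa"]
print(expand_query_term("kauppa", INDEX_WORDS, k=3))
```

A linear scan over the vocabulary is only practical for small word lists. A collection-scale index would need a candidate filter (for example, shared character n-grams) before computing distances.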
  20. Pirkola, A.: Morphological typology of languages for IR (2001) 0.02
    0.018442601 = product of:
      0.092213005 = sum of:
        0.092213005 = weight(_text_:index in 4476) [ClassicSimilarity], result of:
          0.092213005 = score(doc=4476,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.40966535 = fieldWeight in 4476, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.2 = coord(1/5)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
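The abstract does not define the two indexes. In the quantitative typology tradition it draws on (Greenberg's indexes), the index of synthesis is commonly computed as the average number of morphemes per word. The sketch below computes that ratio from words already segmented into morphemes. The segmentation is a hand-made illustrative assumption, and the index of fusion is left out because its exact operationalization is not given here.

```python
def index_of_synthesis(segmented_words):
    """Average number of morphemes per word (total morphemes / words)."""
    if not segmented_words:
        raise ValueError("need at least one word")
    morphemes = sum(len(word) for word in segmented_words)
    return morphemes / len(segmented_words)

# Hypothetical segmentation of Finnish "taloissa on kirja":
# talo-i-ssa (house-PL-INESSIVE), on (is), kirja (book).
SAMPLE = [["talo", "i", "ssa"], ["on"], ["kirja"]]
print(index_of_synthesis(SAMPLE))
```

A purely isolating language would score close to 1, and more synthetic languages score higher. Real measurements depend heavily on how text is segmented into morphemes.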

Languages

  • e 59
  • d 16
  • m 1

Types

  • a 61
  • m 7
  • el 6
  • s 4
  • p 2
  • x 2
  • b 1
  • d 1