Search (4 results, page 1 of 1)

  • author_ss:"Kettunen, K."
  • theme_ss:"Computerlinguistik"
  1. Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.02
    0.020352896 = sum of:
      0.018274104 = product of:
        0.07309642 = sum of:
          0.07309642 = weight(_text_:authors in 2835) [ClassicSimilarity], result of:
            0.07309642 = score(doc=2835,freq=2.0), product of:
              0.2418733 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.053056188 = queryNorm
              0.30220953 = fieldWeight in 2835, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.046875 = fieldNorm(doc=2835)
        0.25 = coord(1/4)
      0.0020787928 = product of:
        0.0041575856 = sum of:
          0.0041575856 = weight(_text_:s in 2835) [ClassicSimilarity], result of:
            0.0041575856 = score(doc=2835,freq=2.0), product of:
              0.057684682 = queryWeight, product of:
                1.0872376 = idf(docFreq=40523, maxDocs=44218)
                0.053056188 = queryNorm
              0.072074346 = fieldWeight in 2835, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.0872376 = idf(docFreq=40523, maxDocs=44218)
                0.046875 = fieldNorm(doc=2835)
        0.5 = coord(1/2)
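
    The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures in this listing. The function and variable names are hypothetical; queryNorm, fieldNorm, and the coord factors are copied from the listing rather than derived.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic formula; it matches idf(docFreq=1258, maxDocs=44218) = 4.558814.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm            # idf * queryNorm
    field_weight = tf(freq) * i * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the listing for doc 2835.
MAX_DOCS = 44218
QUERY_NORM = 0.053056188
FIELD_NORM = 0.046875

# Each clause is scaled by its coord factor (matched clauses / total clauses).
authors_part = term_score(2.0, 1258, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.25  # coord(1/4)
s_part = term_score(2.0, 40523, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5        # coord(1/2)
total = authors_part + s_part

print(f"authors clause: {authors_part:.9f}")
print(f"s clause:       {s_part:.9f}")
print(f"document score: {total:.9f}")
```

    The values agree with the listing only up to small rounding differences, because the listed figures are single-precision floats while Python computes in double precision.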
    
    Abstract
    Purpose - The purpose of this article is to discuss the advantages and disadvantages of various means of managing morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language-dependent techniques for management of morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical IR results for 11 languages and a new classification for keyword management methods are also presented. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of keyword management techniques used in IR.
    Source
    Journal of documentation. 65(2009) no.2, S.267-290
  2. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.00
    0.0023241614 = product of:
      0.0046483227 = sum of:
        0.0046483227 = product of:
          0.009296645 = sum of:
            0.009296645 = weight(_text_:s in 4224) [ClassicSimilarity], result of:
              0.009296645 = score(doc=4224,freq=10.0), product of:
                0.057684682 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.053056188 = queryNorm
                0.16116315 = fieldWeight in 4224, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar techniques have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
    Source
    Information processing and management. 45(2009) no.6, S.703-713
  3. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.00
    8.6616375E-4 = product of:
      0.0017323275 = sum of:
        0.0017323275 = product of:
          0.003464655 = sum of:
            0.003464655 = weight(_text_:s in 4395) [ClassicSimilarity], result of:
              0.003464655 = score(doc=4395,freq=2.0), product of:
                0.057684682 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.053056188 = queryNorm
                0.060061958 = fieldWeight in 4395, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4395)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of documentation. 61(2005) no.4, S.476-496
  4. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.00
    8.6616375E-4 = product of:
      0.0017323275 = sum of:
        0.0017323275 = product of:
          0.003464655 = sum of:
            0.003464655 = weight(_text_:s in 3223) [ClassicSimilarity], result of:
              0.003464655 = score(doc=3223,freq=2.0), product of:
                0.057684682 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.053056188 = queryNorm
                0.060061958 = fieldWeight in 3223, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3223)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.12, S.2928-2946