Search (14 results, page 1 of 1)

  • author_ss:"Järvelin, K."
  1. Järvelin, K.; Kristensen, J.; Niemi, T.; Sormunen, E.; Keskustalo, H.: ¬A deductive data model for query expansion (1996) 0.05
    0.047451444 = product of:
      0.09490289 = sum of:
        0.09490289 = sum of:
          0.052821 = weight(_text_:language in 2230) [ClassicSimilarity], result of:
            0.052821 = score(doc=2230,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.26008 = fieldWeight in 2230, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
          0.04208189 = weight(_text_:22 in 2230) [ClassicSimilarity], result of:
            0.04208189 = score(doc=2230,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.23214069 = fieldWeight in 2230, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
      0.5 = coord(1/2)
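    The score tree above is Lucene's ClassicSimilarity (TF-IDF) explain output. As a rough cross-check, the sketch below recomputes the reported score from the values shown; queryNorm depends on the full query and is simply taken from the output, and the helper names are ours, not Lucene's.

      import math

      MAX_DOCS   = 44218          # maxDocs from the explain tree above
      QUERY_NORM = 0.051766515    # queryNorm comes from the full query; taken as given here

      def idf(doc_freq, max_docs=MAX_DOCS):
          # ClassicSimilarity: idf = ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1

      def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
          # tf = sqrt(termFreq); queryWeight = idf * queryNorm;
          # fieldWeight = tf * idf * fieldNorm; term score = queryWeight * fieldWeight
          tf = math.sqrt(freq)
          term_idf = idf(doc_freq)
          return (term_idf * query_norm) * (tf * term_idf * field_norm)

      # Result 1 (doc 2230)
      language = term_score(freq=2.0, doc_freq=2376, field_norm=0.046875)   # ~0.052821
      term_22  = term_score(freq=2.0, doc_freq=3622, field_norm=0.046875)   # ~0.042082
      # coord(1/2) = 0.5: one of the query's two top-level clauses matched
      print(0.5 * (language + term_22))                                     # ~0.047451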
    
    Abstract
    We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic (expression), and occurrence levels. Concepts and the relationships among them are represented at the conceptual level. The linguistic level represents natural language expressions for the concepts. Each expression has one or more matching models at the occurrence level; each model specifies how the expression is matched in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for environments providing heterogeneous IR systems. Expansion is controlled by adjustable matching reliability.
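    A minimal sketch of how the three abstraction levels described above could be represented; the class and field names, and the expand helper, are illustrative assumptions, not the paper's ExpansionTool.

      from dataclasses import dataclass, field

      @dataclass
      class MatchingModel:          # occurrence level: how one expression is matched in one index type
          index: str                # e.g. "title words", "full text"
          pattern: str
          reliability: float        # the adjustable matching reliability mentioned above

      @dataclass
      class Expression:             # linguistic level: a natural-language expression of a concept
          text: str
          models: list = field(default_factory=list)

      @dataclass
      class Concept:                # conceptual level: concepts plus relationships between them
          name: str
          narrower: list = field(default_factory=list)
          expressions: list = field(default_factory=list)

      def expand(concept, min_reliability):
          """Collect matching patterns for a concept and its narrower concepts,
          keeping only models at least as reliable as the threshold."""
          patterns = [m.pattern
                      for e in concept.expressions
                      for m in e.models
                      if m.reliability >= min_reliability]
          for child in concept.narrower:
              patterns.extend(expand(child, min_reliability))
          return patterns

      ir = Concept("information retrieval", expressions=[
          Expression("information retrieval", [MatchingModel("title words", "information retrieval", 0.9)]),
          Expression("IR", [MatchingModel("full text", "IR", 0.4)])])
      print(expand(ir, 0.5))        # ['information retrieval'] -- the low-reliability match is dropped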
    Source
    Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al
  2. Näppilä, T.; Järvelin, K.; Niemi, T.: ¬A tool for data cube construction from structurally heterogeneous XML documents (2008) 0.04
    0.03954287 = product of:
      0.07908574 = sum of:
        0.07908574 = sum of:
          0.0440175 = weight(_text_:language in 1369) [ClassicSimilarity], result of:
            0.0440175 = score(doc=1369,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.21673335 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
          0.03506824 = weight(_text_:22 in 1369) [ClassicSimilarity], result of:
            0.03506824 = score(doc=1369,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
      0.5 = coord(1/2)
    
    Abstract
    Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (Extensible Markup Language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail and are thus effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We show, however, that for certain, not uncommon, types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
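    For illustration only, the toy example below (an invented two-article document, standard ElementTree calls) shows the kind of pairing an LCA-based strategy can produce: query keys taken from unrelated articles still meet at the document root, which is the sort of undesirable result the SPC primitive is meant to avoid.

      import xml.etree.ElementTree as ET

      doc = ET.fromstring("""
      <articles>
        <article><author>Niemi</author><year>2008</year></article>
        <article><author>Järvelin</author><year>1996</year></article>
      </articles>""")

      parent = {child: p for p in doc.iter() for child in p}

      def ancestors(node):
          chain = [node]
          while node in parent:
              node = parent[node]
              chain.append(node)
          return chain

      def lca(a, b):
          a_anc = ancestors(a)
          for node in ancestors(b):
              if node in a_anc:
                  return node
          return None

      author = doc.find(".//author")        # author of the first article
      year   = doc.findall(".//year")[1]    # year of the *second* article
      print(lca(author, year).tag)          # 'articles': keys from unrelated articles still pair at the root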
    Date
    9. 2.2008 17:22:42
  3. Niemi, T.; Junkkari, M.; Järvelin, K.; Viita, S.: Advanced query language for manipulating complex entities (2004) 0.03
    0.03081225 = product of:
      0.0616245 = sum of:
        0.0616245 = product of:
          0.123249 = sum of:
            0.123249 = weight(_text_:language in 4218) [ClassicSimilarity], result of:
              0.123249 = score(doc=4218,freq=2.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.60685337 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  4. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.03
    0.03081225 = product of:
      0.0616245 = sum of:
        0.0616245 = product of:
          0.123249 = sum of:
            0.123249 = weight(_text_:language in 3908) [ClassicSimilarity], result of:
              0.123249 = score(doc=3908,freq=2.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.60685337 = fieldWeight in 3908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3908)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  5. Lehtokangas, R.; Keskustalo, H.; Järvelin, K.: Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments (2008) 0.03
    0.029527843 = product of:
      0.059055686 = sum of:
        0.059055686 = product of:
          0.11811137 = sum of:
            0.11811137 = weight(_text_:language in 1349) [ClassicSimilarity], result of:
              0.11811137 = score(doc=1349,freq=10.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.5815567 = fieldWeight in 1349, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1349)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this article, the authors present evaluation results for transitive dictionary-based cross-language information retrieval (CLIR) using graded relevance assessments in a best-match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. Source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish) via an intermediate (or pivot) language. The effectiveness of the transitively translated queries was compared to that of the directly translated and monolingual Finnish queries. Pseudo-relevance feedback (PRF) was also used to expand the original transitive target queries. Cross-language retrieval performance was evaluated at three relevance thresholds: stringent, regular, and liberal. The transitive translations performed well, achieving on average 85-93% of the direct translation performance and 66-72% of the monolingual performance. Moreover, PRF successfully raised the performance of the transitive translation routes, both in absolute terms and relative to the monolingual and direct translation runs with PRF applied.
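    A hedged sketch of the transitive (pivot-language) translation step described above; the toy dictionaries and the English-German-Finnish route are invented for illustration.

      en_to_de = {"language": ["Sprache"], "retrieval": ["Abruf", "Wiedergewinnung"]}
      de_to_fi = {"Sprache": ["kieli"], "Abruf": ["haku"], "Wiedergewinnung": ["haku", "tiedonhaku"]}

      def transitive_translate(terms, first_dict, second_dict):
          """Translate each source term through the pivot language, collecting every
          target-language equivalent reachable via any pivot translation."""
          result = {}
          for term in terms:
              targets = {t for pivot in first_dict.get(term, [])
                           for t in second_dict.get(pivot, [])}
              result[term] = sorted(targets)
          return result

      print(transitive_translate(["language", "retrieval"], en_to_de, de_to_fi))
      # {'language': ['kieli'], 'retrieval': ['haku', 'tiedonhaku']}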
  6. Niemi, T.; Hirvonen, L.; Järvelin, K.: Multidimensional data model and query language for informetrics (2003) 0.03
    0.0264105 = product of:
      0.052821 = sum of:
        0.052821 = product of:
          0.105642 = sum of:
            0.105642 = weight(_text_:language in 1753) [ClassicSimilarity], result of:
              0.105642 = score(doc=1753,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.52016 = fieldWeight in 1753, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1753)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Multidimensional data analysis, or On-line Analytical Processing (OLAP), offers a single subject-oriented source for analyzing summary data based on various dimensions. We demonstrate that the OLAP approach gives a promising starting point for advanced analysis and comparison among summary data in informetrics applications. At the moment there is no single precise, commonly accepted logical/conceptual model for multidimensional analysis, because the requirements of applications vary considerably. We develop a conceptual/logical multidimensional model for supporting the complex and unpredictable needs of informetrics. Summary data are considered with respect to some dimensions, and by changing dimensions the user may construct other views on the same summary data. We develop a multidimensional query language whose basic idea is to support the definition of views in a way that is natural and intuitive for lay users in the informetrics area. We show that this view-oriented query language has great expressive power and that its degree of declarativity is greater than in contemporary operation-oriented or SQL (Structured Query Language)-like OLAP query languages.
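    As a rough illustration of the multidimensional idea, the sketch below aggregates invented publication facts over user-chosen dimensions; it is a crude stand-in for the paper's view-oriented query language, not its actual syntax.

      from collections import defaultdict

      # hypothetical informetrics facts: (author, year, venue, publication count)
      facts = [("Järvelin", 1996, "SIGIR", 1), ("Järvelin", 2008, "IP&M", 1),
               ("Niemi", 2008, "IP&M", 1), ("Pirkola", 2003, "IP&M", 1)]

      def view(facts, dims):
          """Aggregate publication counts over the chosen dimensions of the cube."""
          index = {"author": 0, "year": 1, "venue": 2}
          cube = defaultdict(int)
          for row in facts:
              key = tuple(row[index[d]] for d in dims)
              cube[key] += row[3]
          return dict(cube)

      print(view(facts, ["author"]))          # publications per author
      print(view(facts, ["year", "venue"]))   # publications per year and venue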
  7. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.03
    0.0264105 = product of:
      0.052821 = sum of:
        0.052821 = product of:
          0.105642 = sum of:
            0.105642 = weight(_text_:language in 1052) [ClassicSimilarity], result of:
              0.105642 = score(doc=1052,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.52016 = fieldWeight in 1052, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1052)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin and are thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into the target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as the target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even the first step alone showed promising results.
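    A small sketch of the two-step idea, with an invented transformation rule and target lexicon: rules first push the source word towards target-language spelling, then fuzzy matching (here difflib) picks the closest index term.

      import re
      from difflib import get_close_matches

      rules = [(r"c", "k")]                                # hypothetical source-to-target spelling rule

      def apply_rules(word):
          for pattern, repl in rules:
              word = re.sub(pattern, repl, word)
          return word

      target_index = ["bakteeri", "kemia", "fysiikka"]     # invented Finnish index terms

      source = "bacteria"
      intermediate = apply_rules(source)                   # step 1: "bakteria"
      print(get_close_matches(intermediate, target_index, n=1, cutoff=0.6))
      # ['bakteeri'] -- step 2: fuzzy matching finds the cross-lingual spelling variant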
  8. Järvelin, K.; Niemi, T.: Deductive information retrieval based on classifications (1993) 0.03
    0.0264105 = product of:
      0.052821 = sum of:
        0.052821 = product of:
          0.105642 = sum of:
            0.105642 = weight(_text_:language in 2229) [ClassicSimilarity], result of:
              0.105642 = score(doc=2229,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.52016 = fieldWeight in 2229, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2229)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Modern fact databases contain abundant data classified through several classifications. Typically, users must consult these classifications in separate manuals or files, which makes their effective use difficult. Contemporary database systems provide little support for the deductive use of classifications. In this study we show how deductive data management techniques can be applied to the utilization of data value classifications. Computation of transitive class relationships is of primary importance here. We define a representation of classifications which supports transitive computation and present an operation-oriented deductive query language tailored for classification-based deductive information retrieval. The operations of this language are on the same abstraction level as relational algebra operations and can be integrated with them to form a powerful and flexible query language for deductive information retrieval. We define this integration and demonstrate the usefulness of the language through several sample queries.
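    The transitive class-relationship computation mentioned above can be illustrated with a small sketch; the classification and the function names are invented for the example.

      broader = {                       # hypothetical classification: class -> direct broader classes
          "laser printers": ["printers"],
          "printers": ["output devices"],
          "output devices": ["peripherals"],
      }

      def transitive_broader(cls, rel=broader):
          """All classes reachable from cls via the broader-class relation (transitive closure)."""
          seen, stack = set(), list(rel.get(cls, []))
          while stack:
              c = stack.pop()
              if c not in seen:
                  seen.add(c)
                  stack.extend(rel.get(c, []))
          return seen

      # a query on "peripherals" can now deductively reach facts classified under "laser printers"
      print(transitive_broader("laser printers"))   # {'printers', 'output devices', 'peripherals'}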
  9. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.02
    0.024606533 = product of:
      0.049213067 = sum of:
        0.049213067 = product of:
          0.09842613 = sum of:
            0.09842613 = weight(_text_:language in 4395) [ClassicSimilarity], result of:
              0.09842613 = score(doc=4395,freq=10.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.48463053 = fieldWeight in 4395, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4395)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - The effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and not statistically significant in most of the tested settings. It is also shown that a hitherto rather neglected method of morphological processing for Finnish, stemming, performs reasonably well, although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - The usefulness of morphological lemmatization and stem generation for IR purposes can be assessed along many dimensions. At the average precision-recall level the two behave very similarly in a probabilistic IR system; thus, for highly inflectional languages the choice of method needs to be judged along other dimensions too. Originality/value - The results are achieved using Finnish as an example of a highly inflectional language. They are of interest for anyone interested in processing the morphological variation of a highly inflected language for IR purposes.
  10. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.02
    0.022872169 = product of:
      0.045744337 = sum of:
        0.045744337 = product of:
          0.091488674 = sum of:
            0.091488674 = weight(_text_:language in 1074) [ClassicSimilarity], result of:
              0.091488674 = score(doc=1074,freq=6.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.45047188 = fieldWeight in 1074, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1074)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary and were run against a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring, using a proximity operator for the translation equivalents of query language compound components, were tested. The method was not useful in syn-based queries; instead, it resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages, which allows n-gram-based translation of names not included in a dictionary. In the third test, a query structuring method in which the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
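    A sketch of the syn-based structuring described above: all target-language equivalents of one source key are grouped with InQuery's #syn operator and the groups are then combined (here with #sum); the translations are invented and the query string is only illustrative.

      translations = {
          "nuclear": ["ydin", "atomi"],
          "power":   ["voima", "valta", "teho"],
      }

      def structured_query(translations):
          # one #syn group per source key, groups combined into a single structured query
          groups = ["#syn({})".format(" ".join(equivs)) for equivs in translations.values()]
          return "#sum({})".format(" ".join(groups))

      print(structured_query(translations))
      # #sum(#syn(ydin atomi) #syn(voima valta teho))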
  11. Talvensaari, T.; Laurikkala, J.; Järvelin, K.; Juhola, M.: ¬A study on automatic creation of a comparable document collection in cross-language information retrieval (2006) 0.02
    0.015562538 = product of:
      0.031125076 = sum of:
        0.031125076 = product of:
          0.062250152 = sum of:
            0.062250152 = weight(_text_:language in 5601) [ClassicSimilarity], result of:
              0.062250152 = score(doc=5601,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.30650726 = fieldWeight in 5601, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5601)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - To present a method for creating a comparable document collection from two document collections in different languages. Design/methodology/approach - The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary-based query translation program. The resulting lists of words were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date-restricted alignment schemes, which were also combined. Findings - The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed on a five-degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent showed at least lexical similarity. Research limitations/implications - The number of alignment pairs was small due to the short common time period of the two collections and their geographical (and thus topical) remoteness. In the future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for cross-language information retrieval (CLIR). Practical implications - Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource. Originality/value - The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.
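    A sketch of the date-restricted alignment scheme under stated assumptions: each source document is paired with its best-scoring target document published within a fixed window of days; the similarity scores stand in for the nearest-neighbor retrieval results, and all data are invented.

      from datetime import date, timedelta

      def align(source_docs, target_docs, scores, max_days=30):
          """source_docs/target_docs: {id: publication_date}; scores: {(src, tgt): similarity}."""
          pairs = {}
          for src, src_date in source_docs.items():
              candidates = [(scores.get((src, tgt), 0.0), tgt)
                            for tgt, tgt_date in target_docs.items()
                            if abs(src_date - tgt_date) <= timedelta(days=max_days)]
              if candidates:
                  best_score, best_tgt = max(candidates)
                  if best_score > 0:
                      pairs[src] = best_tgt
          return pairs

      src = {"fi-1": date(1994, 6, 1)}
      tgt = {"en-1": date(1994, 6, 10), "en-2": date(1995, 1, 5)}
      print(align(src, tgt, {("fi-1", "en-1"): 0.8, ("fi-1", "en-2"): 0.9}))
      # {'fi-1': 'en-1'} -- the higher-scoring en-2 falls outside the date window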
  12. Talvensaari, T.; Juhola, M.; Laurikkala, J.; Järvelin, K.: Corpus-based cross-language information retrieval in retrieval of highly relevant documents (2007) 0.02
    0.015562538 = product of:
      0.031125076 = sum of:
        0.031125076 = product of:
          0.062250152 = sum of:
            0.062250152 = weight(_text_:language in 139) [ClassicSimilarity], result of:
              0.062250152 = score(doc=139,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.30650726 = fieldWeight in 139, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=139)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline at the highest relevance level. The performance of the different query translation methods was further analyzed by identifying reasons for poor rankings of highly relevant documents.
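    Generalized precision and recall, as mentioned above, weight retrieved documents by their graded relevance instead of counting them as simply relevant or not; a minimal sketch, assuming the graded assessments are scaled to the interval [0, 1].

      def generalized_precision(retrieved, relevance):
          # sum of graded relevance of retrieved documents, divided by how many were retrieved
          return sum(relevance.get(d, 0.0) for d in retrieved) / len(retrieved)

      def generalized_recall(retrieved, relevance):
          # sum of graded relevance of retrieved documents, divided by the total relevance available
          return sum(relevance.get(d, 0.0) for d in retrieved) / sum(relevance.values())

      relevance = {"d1": 1.0, "d2": 0.5, "d3": 0.25}      # graded assessments scaled to [0, 1]
      retrieved = ["d1", "d3", "d9"]
      print(generalized_precision(retrieved, relevance))  # (1.0 + 0.25 + 0) / 3 = 0.4167
      print(generalized_recall(retrieved, relevance))     # 1.25 / 1.75 = 0.7143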
  13. Saastamoinen, M.; Järvelin, K.: Search task features in work tasks of varying types and complexity (2017) 0.01
    0.010520472 = product of:
      0.021040944 = sum of:
        0.021040944 = product of:
          0.04208189 = sum of:
            0.04208189 = weight(_text_:22 in 3589) [ClassicSimilarity], result of:
              0.04208189 = score(doc=3589,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.23214069 = fieldWeight in 3589, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3589)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Information searching in practice is seldom an end in itself. At work, work task (WT) performance forms the context that information searching should serve. Therefore, information retrieval (IR) systems development and evaluation should take the WT context into account. The present paper analyzes how WT features (task complexity and task type) affect information searching in authentic work: the types of information needs, search processes, and search media. We collected data on 22 information professionals in authentic work situations in three organization types: city administration, universities, and companies. The data comprise 286 WTs and 420 search tasks (STs). The data include transaction logs, video recordings, daily questionnaires, interviews, and observation. The data were analyzed quantitatively. Even though the participants used a range of search media, most STs were simple throughout the data, and up to 42% of WTs did not include searching at all. WT's effects on STs are not straightforward: different WT types react differently to WT complexity. Due to the simplicity of authentic searching, the WT/ST types used in interactive IR experiments should be reconsidered.
  14. Vakkari, P.; Järvelin, K.; Chang, Y.-W.: ¬The association of disciplinary background with the evolution of topics and methods in Library and Information Science research 1995-2015 (2023) 0.01
    0.00876706 = product of:
      0.01753412 = sum of:
        0.01753412 = product of:
          0.03506824 = sum of:
            0.03506824 = weight(_text_:22 in 998) [ClassicSimilarity], result of:
              0.03506824 = score(doc=998,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.19345059 = fieldWeight in 998, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=998)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 6.2023 18:15:06