Search (35 results, page 2 of 2)

  • author_ss:"Järvelin, K."
  1. Saarikoski, J.; Laurikkala, J.; Järvelin, K.; Juhola, M.: ¬A study of the use of self-organising maps in information retrieval (2009) 0.00
    0.0029745363 = product of:
      0.011898145 = sum of:
        0.011898145 = weight(_text_:information in 2836) [ClassicSimilarity], result of:
          0.011898145 = score(doc=2836,freq=8.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.19395474 = fieldWeight in 2836, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The aim of this paper is to explore the possibility of retrieving information with Kohonen self-organising maps, which are known to be effective at grouping objects according to their similarity or dissimilarity. Design/methodology/approach - After conventional preprocessing, such as transformation into vector space, documents from a German document collection were used to train a neural network of the Kohonen self-organising map type. Such an unsupervised network forms a document map from which relevant objects can be found according to queries. Findings - Self-organising maps ordered documents into groups from which it was possible to find relevant targets. Research limitations/implications - The number of documents used was moderate due to the limited number of documents associated with test topics. The training of self-organising maps entails rather long running times, which is their practical limitation. In future, the aim will be to build larger networks by compressing document matrices, and to develop document searching in them. Practical implications - With self-organising maps the distribution of documents can be visualised and relevant documents found in document collections of limited size. Originality/value - The paper reports on an approach that can be used especially to group documents and also for information search. So far self-organising maps have rarely been studied for information retrieval. Instead, they have been applied to document grouping tasks.
  2. Ferro, N.; Silvello, G.; Keskustalo, H.; Pirkola, A.; Järvelin, K.: ¬The twist measure for IR evaluation : taking user's effort into account (2016) 0.00
    0.0025760243 = product of:
      0.010304097 = sum of:
        0.010304097 = weight(_text_:information in 2771) [ClassicSimilarity], result of:
          0.010304097 = score(doc=2771,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16796975 = fieldWeight in 2771, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2771)
      0.25 = coord(1/4)
    
    Abstract
    We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents, which handles both binary and graded relevance. t stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may take for granted the utility they gain from finding relevant documents, which is the focus of traditional measures. On the contrary, they may feel uneasy when the system returns nonrelevant documents because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required of users to retrieve the desired information. We provide a formal definition of t, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performance, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.3, S.620-648
  3. Järvelin, K.; Vakkari, P.: LIS research across 50 years: content analysis of journal articles (2022) 0.00
    0.0025760243 = product of:
      0.010304097 = sum of:
        0.010304097 = weight(_text_:information in 949) [ClassicSimilarity], result of:
          0.010304097 = score(doc=949,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16796975 = fieldWeight in 949, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=949)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - This paper analyses the research in Library and Information Science (LIS) and reports on (1) the status of LIS research in 2015 and (2) the evolution of LIS research longitudinally from 1965 to 2015. Design/methodology/approach - The study employs a quantitative intellectual content analysis of articles published in 30+ scholarly LIS journals, following the design by Tuomaala et al. (2014). In the content analysis, we classify articles along eight dimensions covering topical content and methodology. Findings - The topical findings indicate that the earlier strong LIS emphasis on L&I services has declined notably, while scientific and professional communication has become the most popular topic. Information storage and retrieval has given up its earlier strong position towards the end of the years analyzed. Individuals are increasingly the units of observation. End-user's and developer's viewpoints have strengthened at the cost of intermediaries' viewpoint. LIS research is methodologically increasingly scattered since survey, scientometric methods, experiment, case studies and qualitative studies have all gained in popularity. Consequently, LIS may have become more versatile in the analysis of its research objects during the years analyzed. Originality/value - Among quantitative intellectual content analyses of LIS research, the study is unique in its scope: length of analysis period (50 years), width (8 dimensions covering topical content and methodology) and depth (the annual batch of 30+ scholarly journals).
  4. Pharo, N.; Järvelin, K.: "Irrational" searchers and IR-rational researchers (2006) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 4922) [ClassicSimilarity], result of:
          0.010095911 = score(doc=4922,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 4922, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4922)
      0.25 = coord(1/4)
    
    Abstract
    In this article the authors look at the prescriptions advocated by Web search textbooks in the light of a selection of empirical data of real Web information search processes. They use the strategy of disjointed incrementalism, which is a theoretical foundation from decision making, to focus on how people face complex problems, and claim that such problem solving can be compared to the tasks searchers perform when interacting with the Web. The findings suggest that textbooks on Web searching should take into account that searchers only tend to take a certain number of sources into consideration, that the searchers adjust their goals and objectives during searching, and that searchers reconsider the usefulness of sources at different stages of their work tasks as well as their search tasks.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.2, S.222-232
  5. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 1052) [ClassicSimilarity], result of:
          0.010095911 = score(doc=1052,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 1052, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1052)
      0.25 = coord(1/4)
    
    Abstract
    Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin, being thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into a target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as a target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even using the first step as such showed promising results.
    Source
    Information processing and management. 41(2005) no.4, S.859-872
  6. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 1074) [ClassicSimilarity], result of:
          0.010095911 = score(doc=1074,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 1074, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1074)
      0.25 = coord(1/4)
    
    Abstract
    We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries but resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
    Source
    Information processing and management. 39(2003) no.3, S.391-402
  7. Ahlgren, P.; Järvelin, K.: Measuring impact of twelve information scientists using the DCI index (2010) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 3593) [ClassicSimilarity], result of:
          0.010095911 = score(doc=3593,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 3593, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3593)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1424-1439
  8. Järvelin, K.; Persson, O.: ¬The DCI-index : discounted cumulated impact-based research evaluation (2008) 0.00
    0.002379629 = product of:
      0.009518516 = sum of:
        0.009518516 = weight(_text_:information in 2332) [ClassicSimilarity], result of:
          0.009518516 = score(doc=2332,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.1551638 = fieldWeight in 2332, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2332)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.14, S.2350-2352
  9. Talvensaari, T.; Laurikkala, J.; Järvelin, K.; Juhola, M.: ¬A study on automatic creation of a comparable document collection in cross-language information retrieval (2006) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 5601) [ClassicSimilarity], result of:
          0.008413259 = score(doc=5601,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 5601, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5601)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To present a method for creating a comparable document collection from two document collections in different languages. Design/methodology/approach - The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary-based query translation program. The resulting lists of words were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date-restricted alignment schemes, which were also combined. Findings - The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed with a five-degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent included at least lexical similarity. Research limitations/implications - The number of alignment pairs was small due to the short common time period of the two collections, and their geographical (and thus, topical) remoteness. In future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for the purposes of cross-language information retrieval (CLIR). Practical implications - Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource. Originality/value - The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.
  10. Näppilä, T.; Järvelin, K.; Niemi, T.: ¬A tool for data cube construction from structurally heterogeneous XML documents (2008) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 1369) [ClassicSimilarity], result of:
          0.008413259 = score(doc=1369,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 1369, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1369)
      0.25 = coord(1/4)
    
    Abstract
    Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (extensible markup language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know in much detail the structure of the documents to be processed and are, thus, effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We shall, however, show that this approach leads in the context of certain - not uncommon - types of XML documents to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, S.435-449
  11. Järvelin, K.; Persson, O.: ¬The DCI index : discounted cumulated impact-based research evaluation (2008) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 2694) [ClassicSimilarity], result of:
          0.008413259 = score(doc=2694,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 2694, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2694)
      0.25 = coord(1/4)
    
    Content
    Erratum in: Järvelin, K., O. Persson: The DCI-index: discounted cumulated impact-based research evaluation. Erratum re. In: Journal of the American Society for Information Science and Technology. 59(2008) no.14, S.2350-2352.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1433-1440
  12. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.00
    0.0020821756 = product of:
      0.008328702 = sum of:
        0.008328702 = weight(_text_:information in 5907) [ClassicSimilarity], result of:
          0.008328702 = score(doc=5907,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13576832 = fieldWeight in 5907, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5907)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.7, S.575-583
  13. Niemi, T.; Hirvonen, L.; Järvelin, K.: Multidimensional data model and query language for informetrics (2003) 0.00
    0.0017847219 = product of:
      0.0071388874 = sum of:
        0.0071388874 = weight(_text_:information in 1753) [ClassicSimilarity], result of:
          0.0071388874 = score(doc=1753,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.116372846 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1753)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.10, S.939-951
  14. Järvelin, K.; Kristensen, J.; Niemi, T.; Sormunen, E.; Keskustalo, H.: ¬A deductive data model for query expansion (1996) 0.00
    0.0017847219 = product of:
      0.0071388874 = sum of:
        0.0071388874 = weight(_text_:information in 2230) [ClassicSimilarity], result of:
          0.0071388874 = score(doc=2230,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.116372846 = fieldWeight in 2230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2230)
      0.25 = coord(1/4)
    
    Source
    Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al.
  15. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.00
    0.0014872681 = product of:
      0.0059490725 = sum of:
        0.0059490725 = weight(_text_:information in 5225) [ClassicSimilarity], result of:
          0.0059490725 = score(doc=5225,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.09697737 = fieldWeight in 5225, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5225)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.13, S.1120-xxxx
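
Note on the score breakdowns: each result above lists a ClassicSimilarity explanation for the term "information". The arithmetic in those trees can be reproduced with a short sketch, given here in Python purely for illustration. The function names (classic_tf, classic_idf, explain_score) are hypothetical and not part of any library. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed because they match the listed values. queryNorm, fieldNorm and coord are taken as given from the listing rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency as it appears in the breakdowns.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float, coord: float) -> float:
    # score = coord * queryWeight * fieldWeight, where
    #   queryWeight = idf * queryNorm
    #   fieldWeight = tf * idf * fieldNorm
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return coord * query_weight * field_weight


# Values copied from result 1 (doc=2836): freq=8.0, fieldNorm=0.0390625, coord=1/4.
score = explain_score(freq=8.0, doc_freq=20772, max_docs=44218,
                      query_norm=0.034944877, field_norm=0.0390625, coord=0.25)
print(f"{score:.7f}")  # close to the 0.0029745363 listed for result 1
```

Because idf, queryNorm and coord are identical for every hit, the ranking on this page depends only on tf and fieldNorm, that is, on how often "information" occurs in a record and how long the indexed field is.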