Search (36 results, page 1 of 2)

  • author_ss:"Järvelin, K."
  1. Saastamoinen, M.; Järvelin, K.: Search task features in work tasks of varying types and complexity (2017) 0.02
    0.017670318 = product of:
      0.053010955 = sum of:
        0.053010955 = sum of:
          0.017396197 = weight(_text_:of in 3589) [ClassicSimilarity], result of:
            0.017396197 = score(doc=3589,freq=12.0), product of:
              0.06850986 = queryWeight, product of:
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.043811057 = queryNorm
              0.25392252 = fieldWeight in 3589, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.046875 = fieldNorm(doc=3589)
          0.03561476 = weight(_text_:22 in 3589) [ClassicSimilarity], result of:
            0.03561476 = score(doc=3589,freq=2.0), product of:
              0.15341885 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.043811057 = queryNorm
              0.23214069 = fieldWeight in 3589, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3589)
      0.33333334 = coord(1/3)
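    The explain() tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring, so its arithmetic can be checked by hand. A minimal sketch in Python - variable names are ours, all constants copied from the tree:
    ```python
    from math import sqrt

    QUERY_NORM = 0.043811057  # queryNorm from the tree above

    def term_score(freq: float, idf: float, field_norm: float) -> float:
        """queryWeight * fieldWeight for one matching term."""
        query_weight = idf * QUERY_NORM                # idf(t) * queryNorm
        field_weight = sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
        return query_weight * field_weight

    score_of = term_score(freq=12.0, idf=1.5637573, field_norm=0.046875)
    score_22 = term_score(freq=2.0, idf=3.5018296, field_norm=0.046875)

    coord = 1 / 3  # one of three query clauses matched
    print(coord * (score_of + score_22))  # ~0.017670318, the score listed above
    ```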
    
    Abstract
    Information searching in practice is seldom an end in itself. At work, work task (WT) performance forms the context that information searching should serve. Therefore, information retrieval (IR) systems development and evaluation should take the WT context into account. The present paper analyzes how WT features - task complexity and task type - affect information searching in authentic work: the types of information needs, search processes, and search media. We collected data on 22 information professionals in authentic work situations in three organization types: city administration, universities, and companies. The data comprise 286 WTs and 420 search tasks (STs) and include transaction logs, video recordings, daily questionnaires, interviews, and observation. The data were analyzed quantitatively. Even though the participants used a range of search media, most STs were simple throughout the data, and up to 42% of WTs did not include searching at all. The effects of WTs on STs are not straightforward: different WT types react differently to WT complexity. Given the simplicity of authentic searching, the WT/ST types used in interactive IR experiments should be reconsidered.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.5, S.1111-1123
  2. Vakkari, P.; Järvelin, K.; Chang, Y.-W.: ¬The association of disciplinary background with the evolution of topics and methods in Library and Information Science research 1995-2015 (2023) 0.02
    0.017274415 = product of:
      0.051823243 = sum of:
        0.051823243 = sum of:
          0.022144277 = weight(_text_:of in 998) [ClassicSimilarity], result of:
            0.022144277 = score(doc=998,freq=28.0), product of:
              0.06850986 = queryWeight, product of:
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.043811057 = queryNorm
              0.32322758 = fieldWeight in 998, product of:
                5.2915025 = tf(freq=28.0), with freq of:
                  28.0 = termFreq=28.0
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.0390625 = fieldNorm(doc=998)
          0.029678967 = weight(_text_:22 in 998) [ClassicSimilarity], result of:
            0.029678967 = score(doc=998,freq=2.0), product of:
              0.15341885 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.043811057 = queryNorm
              0.19345059 = fieldWeight in 998, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=998)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper reports a longitudinal analysis of the topical and methodological development of Library and Information Science (LIS). Its focus is on the effects of researchers' disciplines on these developments. The study extends an earlier cross-sectional study (Vakkari et al., Journal of the Association for Information Science and Technology, 2022a, 73, 1706-1722) with a coordinated dataset representing a content analysis of articles published in 31 scholarly LIS journals in 1995, 2005, and 2015. It is novel in its coverage of authors' disciplines and of topical and methodological aspects in a coordinated dataset spanning two decades, thus allowing trend analysis. The findings include a shrinking trend in the share of LIS from 67% to 36%, while Computer Science and Business and Economics increase their shares from 9% and 6% to 21% and 16%, respectively. The earlier cross-sectional study identified, for the year 2015, three topical clusters of LIS research focusing on topical subfields, methodologies, and contributing disciplines. Correspondence analysis confirms their existence already in 1995 and traces their development through the decades. The contributing disciplines infuse their concepts, research questions, and approaches into LIS and may also subsume vital parts of LIS into their own structures of knowledge production.
    Date
    22. 6.2023 18:15:06
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.7, S.811-827
  3. Järvelin, K.; Kristensen, J.; Niemi, T.; Sormunen, E.; Keskustalo, H.: ¬A deductive data model for query expansion (1996) 0.02
    0.015219486 = product of:
      0.045658458 = sum of:
        0.045658458 = sum of:
          0.010043699 = weight(_text_:of in 2230) [ClassicSimilarity], result of:
            0.010043699 = score(doc=2230,freq=4.0), product of:
              0.06850986 = queryWeight, product of:
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.043811057 = queryNorm
              0.14660224 = fieldWeight in 2230, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
          0.03561476 = weight(_text_:22 in 2230) [ClassicSimilarity], result of:
            0.03561476 = score(doc=2230,freq=2.0), product of:
              0.15341885 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.043811057 = queryNorm
              0.23214069 = fieldWeight in 2230, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, expression, and occurrence levels. Concepts and the relationships among them are represented at the conceptual level. The expression level represents natural-language expressions for concepts. Each expression has one or more matching models at the occurrence level. Each model specifies the matching of the expression in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for environments providing heterogeneous IR systems. Expansion is controlled by adjustable matching reliability.
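    The three abstraction levels lend themselves to a simple object model. A minimal sketch with our own naming; the paper's ExpansionTool itself is not reproduced:
    ```python
    from dataclasses import dataclass, field

    @dataclass
    class MatchingModel:            # occurrence level
        index_type: str             # e.g. "title words", "stemmed full text"
        reliability: float          # adjustable matching reliability

    @dataclass
    class Expression:               # expression level
        text: str
        models: list[MatchingModel] = field(default_factory=list)

    @dataclass
    class Concept:                  # conceptual level
        name: str
        expressions: list[Expression] = field(default_factory=list)
        narrower: list["Concept"] = field(default_factory=list)

    def expand(concept: Concept, min_reliability: float) -> list[str]:
        """Collect expressions whose matching models meet the reliability
        threshold, following narrower-concept relationships recursively."""
        terms = [e.text for e in concept.expressions
                 if any(m.reliability >= min_reliability for m in e.models)]
        for sub in concept.narrower:
            terms += expand(sub, min_reliability)
        return terms
    ```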
    Source
    Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al
  4. Näppilä, T.; Järvelin, K.; Niemi, T.: ¬A tool for data cube construction from structurally heterogeneous XML documents (2008) 0.01
    0.014304235 = product of:
      0.042912703 = sum of:
        0.042912703 = sum of:
          0.013233736 = weight(_text_:of in 1369) [ClassicSimilarity], result of:
            0.013233736 = score(doc=1369,freq=10.0), product of:
              0.06850986 = queryWeight, product of:
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.043811057 = queryNorm
              0.19316542 = fieldWeight in 1369, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.5637573 = idf(docFreq=25162, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
          0.029678967 = weight(_text_:22 in 1369) [ClassicSimilarity], result of:
            0.029678967 = score(doc=1369,freq=2.0), product of:
              0.15341885 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.043811057 = queryNorm
              0.19345059 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
      0.33333334 = coord(1/3)
    
    Abstract
    Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (extensible markup language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in much detail and are thus effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We show, however, that for certain - not uncommon - types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
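    To see why plain LCA evaluation can misfire, consider a minimal sketch (our own toy document; the article's SPC strategy is not reproduced): keyword hits that belong to different records share only the document root as their lowest common ancestor, silently pairing unrelated values.
    ```python
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <articles>
      <article><author>Smith</author><year>2001</year></article>
      <article><author>Jones</author><year>2008</year></article>
    </articles>""")

    def path_to(root, target):
        """Return the chain of elements from root down to target, or None."""
        if root is target:
            return [root]
        for child in root:
            sub = path_to(child, target)
            if sub:
                return [root] + sub
        return None

    def lca(root, a, b):
        common = None
        for x, y in zip(path_to(root, a), path_to(root, b)):
            if x is y:
                common = x
        return common

    smith = doc.findall(".//author")[0]     # hit in the first record
    year_2008 = doc.findall(".//year")[1]   # hit in the second record
    print(lca(doc, smith, year_2008).tag)   # 'articles' - wrongly pairs
    # Smith with 2008, the kind of undesirable result discussed above.
    ```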
    Date
    9. 2.2008 17:22:42
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, S.435-449
  5. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.00
    0.004142815 = product of:
      0.012428444 = sum of:
        0.012428444 = product of:
          0.024856888 = sum of:
            0.024856888 = weight(_text_:of in 5907) [ClassicSimilarity], result of:
              0.024856888 = score(doc=5907,freq=18.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.36282203 = fieldWeight in 5907, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5907)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods for automatically characterizing the resolution power of keys are studied, and the effects that search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it is often possible to identify the best key of a query, while discriminating among the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery on a subcollection of the TREC collection containing some 515,000 documents.
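    The abstract does not spell out the automatic characterization methods, so the sketch below uses a plausible proxy of our own choosing - inverse document frequency as a rough measure of a key's resolution power - rather than the paper's actual method:
    ```python
    import math

    N_DOCS = 515_000  # size of the TREC subcollection mentioned above

    def idf(df: int) -> float:
        """Inverse document frequency as a proxy for resolution power."""
        return math.log(N_DOCS / df)

    # Hypothetical document frequencies for three query keys.
    keys = {"information": 180_000, "retrieval": 22_000, "trec": 900}
    for key in sorted(keys, key=lambda k: -idf(keys[k])):
        print(f"{key:12s} idf={idf(keys[key]):.2f}")
    # The highest-idf key would be the candidate 'best key' to single
    # out when formulating a structured query.
    ```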
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.7, S.575-583
  6. Ahlgren, P.; Järvelin, K.: Measuring impact of twelve information scientists using the DCI index (2010) 0.00
    0.0041003237 = product of:
      0.01230097 = sum of:
        0.01230097 = product of:
          0.02460194 = sum of:
            0.02460194 = weight(_text_:of in 3593) [ClassicSimilarity], result of:
              0.02460194 = score(doc=3593,freq=24.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.3591007 = fieldWeight in 3593, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3593)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The Discounted Cumulated Impact (DCI) index has recently been proposed for research evaluation. In the present work, an earlier dataset by Cronin and Meho (2007) is reanalyzed with the aim of exemplifying the salient features of the DCI index. We apply the index to, and compare our results with, the outcomes of the Cronin-Meho (2007) study. Both authors and their top publications are used as units of analysis. The results suggest that, by adjusting the parameters of evaluation to the needs of research evaluation, the DCI index delivers data on an author's (or publication's) lifetime impact or current impact at the time of evaluation, on an author's (or publication's) capability of attracting citations from highly cited later publications as an indication of impact, and on the relative impact across a set of authors (or publications) over their lifetime or currently.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1424-1439
  7. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.00
    0.003945538 = product of:
      0.0118366135 = sum of:
        0.0118366135 = product of:
          0.023673227 = sum of:
            0.023673227 = weight(_text_:of in 4487) [ClassicSimilarity], result of:
              0.023673227 = score(doc=4487,freq=32.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.34554482 = fieldWeight in 4487, product of:
                  5.656854 = tf(freq=32.0), with freq of:
                    32.0 = termFreq=32.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4487)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in the textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best-match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit substantially more from concept-based QE in ranking than marginally relevant documents.
    Source
    Journal of documentation. 57(2001) no.3, S.358-376
  8. Järvelin, K.; Vakkari, P.: ¬The evolution of library and information science 1965-1985 : a content analysis of journal titles (1993) 0.00
    0.0039058835 = product of:
      0.01171765 = sum of:
        0.01171765 = product of:
          0.0234353 = sum of:
            0.0234353 = weight(_text_:of in 4649) [ClassicSimilarity], result of:
              0.0234353 = score(doc=4649,freq=4.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.34207192 = fieldWeight in 4649, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4649)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  9. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.00
    0.0038202507 = product of:
      0.011460752 = sum of:
        0.011460752 = product of:
          0.022921504 = sum of:
            0.022921504 = weight(_text_:of in 5225) [ClassicSimilarity], result of:
              0.022921504 = score(doc=5225,freq=30.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.33457235 = fieldWeight in 5225, product of:
                  5.477226 = tf(freq=30.0), with freq of:
                    30.0 = termFreq=30.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5225)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Kekäläinen and Järvelin use what they term generalized, non-binary recall and precision measures, where recall is the sum of the relevance scores of the retrieved documents divided by the sum of the relevance scores of all documents in the database, and precision is the sum of the relevance scores of the retrieved documents divided by the number of documents retrieved; the relevance scores are real numbers between zero and one. Using the InQuery system and a text database of 53,893 newspaper articles, with 30 queries selected from those for which four relevance categories were available to provide recall measures, search results were evaluated by four judges. Searches were done by average key term weight, by Boolean expression, and by average term weight with the terms grouped by a synonym operator, in each case with and without expansion of the original terms. Use of higher standards of relevance appears to increase the superiority of the best method. Some methods do a better job of retrieving the highly relevant documents but do not increase retrieval of marginal ones. There is evidence that generalized precision provides more equitable results, while binary precision grants undeserved merit to some methods. Generally, graded relevance measures seem to provide additional insight into IR evaluation.
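    The generalized measures are defined directly in the abstract above; a minimal sketch:
    ```python
    def generalized_recall(retrieved_scores, all_scores):
        """Sum of relevance scores retrieved / sum over the whole database."""
        return sum(retrieved_scores) / sum(all_scores)

    def generalized_precision(retrieved_scores):
        """Sum of relevance scores retrieved / number of documents retrieved."""
        return sum(retrieved_scores) / len(retrieved_scores)

    # Toy example: graded scores in [0, 1] for four retrieved documents,
    # drawn from a database whose relevance scores sum to 10.0.
    retrieved = [1.0, 0.67, 0.0, 0.33]
    print(generalized_precision(retrieved))                       # 0.5
    print(generalized_recall(retrieved, [1.0, 0.67, 0.33] * 5))   # 0.2
    ```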
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.13, S.1120-xxxx
  10. Järvelin, K.; Niemi, T.: Deductive information retrieval based on classifications (1993) 0.00
    0.003743066 = product of:
      0.0112291975 = sum of:
        0.0112291975 = product of:
          0.022458395 = sum of:
            0.022458395 = weight(_text_:of in 2229) [ClassicSimilarity], result of:
              0.022458395 = score(doc=2229,freq=20.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.32781258 = fieldWeight in 2229, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2229)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Modern fact databases contain abundant data classified through several classifications. Typically, users must consult these classifications in separate manuals or files, which makes their effective use difficult. Contemporary database systems provide little support for the deductive use of classifications. In this study we show how deductive data management techniques can be applied to the utilization of data value classifications. Computation of transitive class relationships is of primary importance here. We define a representation of classifications which supports transitive computation and present an operation-oriented deductive query language tailored for classification-based deductive information retrieval. The operations of this language are on the same abstraction level as relational algebra operations and can be integrated with them to form a powerful and flexible query language for deductive information retrieval. We define the integration of these operations and demonstrate the usefulness of the language in terms of several sample queries.
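    Computation of transitive class relationships is named above as the central operation; a minimal sketch with a toy classification (our own naming, not the paper's query language):
    ```python
    def transitive_subclasses(cls, narrower):
        """All classes reachable from cls via the narrower-than relation."""
        found, stack = set(), [cls]
        while stack:
            for sub in narrower.get(stack.pop(), ()):
                if sub not in found:
                    found.add(sub)
                    stack.append(sub)
        return found

    # A query on "vehicles" should also retrieve records classified
    # under any transitive subclass.
    narrower = {"vehicles": ["cars", "bicycles"], "cars": ["electric-cars"]}
    query_classes = {"vehicles"} | transitive_subclasses("vehicles", narrower)
    records = [("r1", "electric-cars"), ("r2", "boats")]
    print([r for r, c in records if c in query_classes])  # ['r1']
    ```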
    Source
    Journal of the American Society for Information Science. 44(1993) no.10, S.557-578
  11. Kumpulainen, S.; Järvelin, K.: Barriers to task-based information access in molecular medicine (2012) 0.00
    0.0036907129 = product of:
      0.011072138 = sum of:
        0.011072138 = product of:
          0.022144277 = sum of:
            0.022144277 = weight(_text_:of in 4965) [ClassicSimilarity], result of:
              0.022144277 = score(doc=4965,freq=28.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.32322758 = fieldWeight in 4965, product of:
                  5.2915025 = tf(freq=28.0), with freq of:
                    28.0 = termFreq=28.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4965)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We analyze barriers to task-based information access in molecular medicine, focusing on research tasks, which provide task performance sessions of varying complexity. Molecular medicine is a relevant domain because it offers thousands of digital resources as the information environment. Data were collected through shadowing of real work tasks. Thirty work task sessions were analyzed and the barriers in them identified. The barriers were classified by their character (conceptual, syntactic, and technological) and by their context of appearance (work task, system integration, or system). The work task sessions were also grouped into three complexity classes, and the frequency of barriers of varying types across task complexity levels was analyzed. Our findings indicate that although most of the barriers are on the system level, a considerable number of barriers appear in the integration and work task contexts. These barriers might be overcome through attention to the integrated use of multiple systems, at least for the most frequent uses. This can be done by means of standardization and harmonization of the data and by taking the requirements of the work tasks into account in system design and development, because information access is seldom an end in itself but rather serves to reach the goals of work tasks.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.1, S.86-97
  12. Tuomaala, O.; Järvelin, K.; Vakkari, P.: Evolution of library and information science, 1965-2005 : content analysis of journal articles (2014) 0.00
    0.0036907129 = product of:
      0.011072138 = sum of:
        0.011072138 = product of:
          0.022144277 = sum of:
            0.022144277 = weight(_text_:of in 1309) [ClassicSimilarity], result of:
              0.022144277 = score(doc=1309,freq=28.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.32322758 = fieldWeight in 1309, product of:
                  5.2915025 = tf(freq=28.0), with freq of:
                    28.0 = termFreq=28.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1309)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This article first analyzes library and information science (LIS) research articles published in core LIS journals in 2005. It also examines the development of LIS from 1965 to 2005 in light of comparable data sets for 1965, 1985, and 2005. In both cases, the authors report (a) how the research articles are distributed by topic and (b) what approaches, research strategies, and methods were applied in the articles. In 2005, the largest research areas in LIS by this measure were information storage and retrieval, scientific communication, library and information-service activities, and information seeking. The same research areas constituted the quantitative core of LIS in the previous years since 1965. Information retrieval has been the most popular area of research over the years. The proportion of research on library and information-service activities decreased after 1985, but the popularity of information seeking and of scientific communication grew during the period studied. The viewpoint of research has shifted from library and information organizations to end users and development of systems for the latter. The proportion of empirical research strategies was high and rose over time, with the survey method being the single most important method. However, attention to evaluation and experiments increased considerably after 1985. Conceptual research strategies and system analysis, description, and design were quite popular, but declining. The most significant changes from 1965 to 2005 are the decreasing interest in library and information-service activities and the growth of research into information seeking and scientific communication.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.7, S.1446-1462
  13. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.00
    0.00355646 = product of:
      0.0106693795 = sum of:
        0.0106693795 = product of:
          0.021338759 = sum of:
            0.021338759 = weight(_text_:of in 4395) [ClassicSimilarity], result of:
              0.021338759 = score(doc=4395,freq=26.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.31146988 = fieldWeight in 4395, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4395)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - The effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - The results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and not statistically significant in most of the tested settings. It is also shown that stemming, a hitherto rather neglected method of morphological processing for Finnish, performs reasonably well, although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - The usefulness of morphological lemmatization and stem generation for IR purposes can be assessed along many dimensions. On the average P-R level they behave very similarly in a probabilistic IR system, so the choice of method for highly inflectional languages needs to be based on other dimensions too. Originality/value - The results are achieved using Finnish as an example of a highly inflectional language and are of interest for anyone concerned with processing the morphological variation of a highly inflected language for IR purposes.
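    For illustration only: NLTK ships a Snowball (Porter-family) stemmer for Finnish, the kind of light method compared above; lemmatization and stem production require heavier Finnish-specific tools and are not shown.
    ```python
    # pip install nltk
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("finnish")

    # Inflected forms of 'kirjasto' (library): in, out-of, of (plural), ...
    for word in ["kirjasto", "kirjastossa", "kirjastoista", "kirjastojen"]:
        print(word, "->", stemmer.stem(word))
    # A best-match engine then matches queries and documents on the
    # conflated stems rather than on full inflected surface forms.
    ```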
    Source
    Journal of documentation. 61(2005) no.4, S.476-496
  14. Järvelin, K.; Vakkari, P.: LIS research across 50 years : content analysis of journal articles (2022) 0.00
    0.00355646 = product of:
      0.0106693795 = sum of:
        0.0106693795 = product of:
          0.021338759 = sum of:
            0.021338759 = weight(_text_:of in 949) [ClassicSimilarity], result of:
              0.021338759 = score(doc=949,freq=26.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.31146988 = fieldWeight in 949, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=949)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - This paper analyses the research in Library and Information Science (LIS) and reports on (1) the status of LIS research in 2015 and (2) the evolution of LIS research longitudinally from 1965 to 2015. Design/methodology/approach - The study employs a quantitative intellectual content analysis of articles published in 30+ scholarly LIS journals, following the design by Tuomaala et al. (2014). In the content analysis, we classify articles along eight dimensions covering topical content and methodology. Findings - The topical findings indicate that the earlier strong LIS emphasis on L&I services has declined notably, while scientific and professional communication has become the most popular topic. Information storage and retrieval has given up its earlier strong position towards the end of the years analyzed. Individuals are increasingly the units of observation. End-users' and developers' viewpoints have strengthened at the cost of the intermediaries' viewpoint. LIS research is methodologically increasingly scattered, since surveys, scientometric methods, experiments, case studies, and qualitative studies have all gained in popularity. Consequently, LIS may have become more versatile in the analysis of its research objects during the years analyzed. Originality/value - Among quantitative intellectual content analyses of LIS research, the study is unique in its scope: length of analysis period (50 years), width (8 dimensions covering topical content and methodology), and depth (the annual batch of 30+ scholarly journals).
    Source
    Journal of documentation. 78(2022) no.7, S.65-88
  15. Halttunen, K.; Järvelin, K.: Assessing learning outcomes in two information retrieval learning environments (2005) 0.00
    0.0035509837 = product of:
      0.010652951 = sum of:
        0.010652951 = product of:
          0.021305902 = sum of:
            0.021305902 = weight(_text_:of in 996) [ClassicSimilarity], result of:
              0.021305902 = score(doc=996,freq=18.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.3109903 = fieldWeight in 996, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=996)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In order to design information retrieval (IR) learning environments and instruction, it is important to explore the learning outcomes of different pedagogical solutions. Learning outcomes have seldom been evaluated in IR instruction. The particular focus of this study is the assessment of learning outcomes in an experimental, but naturalistic, learning environment compared to more traditional instruction. The 57 participants of an introductory course on IR were selected for this study, and the analysis illustrates their learning outcomes regarding both conceptual change and the development of IR skill. Concept mapping of student essays was used to analyze conceptual change, and log files of search exercises provided data for performance assessment. Students in the experimental learning environment changed their conceptions more regarding linguistic aspects of IR and placed more emphasis on the planning and management of the search process. Performance assessment indicates that anchored instruction and scaffolding with an instructional tool, the IR Game, with performance feedback enable students to construct queries with fewer semantic knowledge errors, also in operational IR systems.
  16. Järvelin, K.; Persson, O.: ¬The DCI index : discounted cumulated impact-based research evaluation (2008) 0.00
    0.0034169364 = product of:
      0.010250809 = sum of:
        0.010250809 = product of:
          0.020501617 = sum of:
            0.020501617 = weight(_text_:of in 2694) [ClassicSimilarity], result of:
              0.020501617 = score(doc=2694,freq=24.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.2992506 = fieldWeight in 2694, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2694)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Research evaluation is increasingly popular and important among research funding bodies and science policy makers. Various indicators have been proposed to evaluate the standing of individual scientists, institutions, journals, or countries. A simple and popular one among the indicators is the h-index, the Hirsch index (Hirsch 2005), which is an indicator of the lifetime achievement of a scholar. Several other indicators have been proposed to complement or balance the h-index. However, these indicators have no conception of aging. The AR-index (Jin et al. 2007) incorporates aging but divides the received citation counts by the raw age of the publication. Consequently, the decay of a publication is very steep and insensitive to disciplinary differences. In addition, we believe that a publication becomes outdated only when it is no longer cited, not because of its age. Finally, all indicators treat citations as equally material, when one might reasonably think that a citation from a heavily cited publication should weigh more than a citation from a non-cited or little-cited publication. We propose a new indicator, the Discounted Cumulated Impact (DCI) index, which devalues old citations in a smooth way. It rewards an author for receiving new citations even if the publication is old. Further, it allows weighting of the citations by the citation weight of the citing publication. DCI can be used to calculate research performance on the basis of the h-core of a scholar or any other publication data.
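    The abstract does not give the DCI formula itself; the sketch below is one way to realize the description - a smooth, citation-age-based devaluation - with a parameterization that is our assumption, not the paper's:
    ```python
    from math import log

    def dci(citations_per_year: dict[int, float], now: int, b: float = 2.0) -> float:
        """Each year's citations are devalued smoothly by how long ago they
        were received - not by the age of the cited publication - so fresh
        citations to an old paper still count (nearly) in full."""
        total = 0.0
        for year, count in citations_per_year.items():
            age = now - year
            total += count / (1.0 + log(1 + age, b))  # smooth log discount
        return total

    # An old paper that keeps attracting citations keeps accruing impact:
    print(dci({1995: 10, 2005: 4, 2008: 6}, now=2008))
    ```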
    Content
    Erratum in: Järvelin, K.; Persson, O.: The DCI-index: discounted cumulated impact-based research evaluation. In: Journal of the American Society for Information Science and Technology. 59(2008) no.14, S.2350-2352.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1433-1440
  17. Järvelin, K.; Ingwersen, P.: User-oriented and cognitive models of information retrieval (2009) 0.00
    0.003382594 = product of:
      0.010147782 = sum of:
        0.010147782 = product of:
          0.020295564 = sum of:
            0.020295564 = weight(_text_:of in 3901) [ClassicSimilarity], result of:
              0.020295564 = score(doc=3901,freq=12.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.29624295 = fieldWeight in 3901, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3901)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The domain of user-oriented and cognitive information retrieval (IR) is first discussed, followed by a discussion of the dimensions and types of models one may build for the domain. The focus of the present entry is on the models of user-oriented and cognitive IR, not on their empirical applications. Several models with different emphases on user-oriented and cognitive IR are presented, ranging from overall approaches and relevance models to procedural models, cognitive models, and task-based models. The present entry does not discuss empirical findings based on the models.
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  18. Pharo, N.; Järvelin, K.: "Irrational" searchers and IR-rational researchers (2006) 0.00
    0.0033478998 = product of:
      0.010043699 = sum of:
        0.010043699 = product of:
          0.020087399 = sum of:
            0.020087399 = weight(_text_:of in 4922) [ClassicSimilarity], result of:
              0.020087399 = score(doc=4922,freq=16.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.2932045 = fieldWeight in 4922, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4922)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In this article the authors look at the prescriptions advocated by Web search textbooks in the light of a selection of empirical data on real Web information search processes. They use the strategy of disjointed incrementalism, a theoretical foundation from decision making, to focus on how people face complex problems, and claim that such problem solving can be compared to the tasks searchers perform when interacting with the Web. The findings suggest that textbooks on Web searching should take into account that searchers only tend to take a certain number of sources into consideration, that searchers adjust their goals and objectives during searching, and that searchers reconsider the usefulness of sources at different stages of their work tasks as well as their search tasks.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.2, S.222-232
  19. Saarikoski, J.; Laurikkala, J.; Järvelin, K.; Juhola, M.: ¬A study of the use of self-organising maps in information retrieval (2009) 0.00
    0.003271467 = product of:
      0.009814401 = sum of:
        0.009814401 = product of:
          0.019628802 = sum of:
            0.019628802 = weight(_text_:of in 2836) [ClassicSimilarity], result of:
              0.019628802 = score(doc=2836,freq=22.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.28651062 = fieldWeight in 2836, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The aim of this paper is to explore the possibility of retrieving information with Kohonen self-organising maps, which are known to be effective in grouping objects according to their similarity or dissimilarity. Design/methodology/approach - After conventional preprocessing, such as transformation into vector space, documents from a German document collection were used to train a neural network of the Kohonen self-organising map type. Such an unsupervised network forms a document map from which relevant objects can be found according to queries. Findings - Self-organising maps ordered documents into groups from which it was possible to find relevant targets. Research limitations/implications - The number of documents used was moderate, due to the limited number of documents associated with the test topics. The training of self-organising maps entails rather long running times, which is their practical limitation. In future, the aim will be to build larger networks by compressing document matrices, and to develop document searching in them. Practical implications - With self-organising maps the distribution of documents can be visualised and relevant documents found in document collections of limited size. Originality/value - The paper reports on an approach that can be used especially to group documents, and also for information search. So far self-organising maps have rarely been studied for information retrieval; instead, they have been applied to document grouping tasks.
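    A minimal self-organising map for document grouping in plain NumPy - a toy setup with random vectors, not the study's German collection or its network sizes:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    docs = rng.random((200, 50))       # 200 documents as 50-dim term vectors
    w, h = 6, 6                        # map grid
    weights = rng.random((w, h, 50))
    coords = np.stack(np.meshgrid(np.arange(w), np.arange(h),
                                  indexing="ij"), axis=-1)

    def bmu(x):
        """Best-matching unit: grid node whose weights are closest to x."""
        d = np.linalg.norm(weights - x, axis=-1)
        return np.unravel_index(d.argmin(), d.shape)

    for t in range(2000):              # online training, shrinking radius
        x = docs[rng.integers(len(docs))]
        lr = 0.5 * (1 - t / 2000)
        sigma = 3.0 * (1 - t / 2000) + 0.5
        g = np.array(bmu(x))
        dist2 = ((coords - g) ** 2).sum(axis=-1)
        hood = np.exp(-dist2 / (2 * sigma ** 2))[..., None]  # neighbourhood
        weights += lr * hood * (x - weights)

    # Retrieval: map a query vector to its node, return documents there.
    node = bmu(docs[0])
    print(node, [i for i, d in enumerate(docs) if bmu(d) == node][:10])
    ```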
    Source
    Journal of documentation. 65(2009) no.2, S.304-322
  20. Ferro, N.; Silvello, G.; Keskustalo, H.; Pirkola, A.; Järvelin, K.: ¬The twist measure for IR evaluation : taking user's effort into account (2016) 0.00
    0.003271467 = product of:
      0.009814401 = sum of:
        0.009814401 = product of:
          0.019628802 = sum of:
            0.019628802 = weight(_text_:of in 2771) [ClassicSimilarity], result of:
              0.019628802 = score(doc=2771,freq=22.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.28651062 = fieldWeight in 2771, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2771)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents, which handles both binary and graded relevance. t stems from the observation that searching is currently taken for granted: it is natural for users to assume that search engines are available and work well. As a consequence, users may take the utility they have in finding relevant documents, which is the focus of traditional measures, for granted. On the contrary, they may feel uneasy when the system returns nonrelevant documents, because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required of users to retrieve the desired information. We provide a formal definition of t and a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performance, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
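    The abstract gives no formula for t itself; as an illustration of the effort/gain idea only (our construction, not the paper's measure), gain and effort can be accumulated down a ranked list and plotted against each other:
    ```python
    def effort_gain_curve(graded_relevance):
        """Walk a ranked list: relevant documents add gain, nonrelevant
        documents add avoidable user effort. Returns (effort, gain) points."""
        effort = gain = 0.0
        curve = []
        for rel in graded_relevance:   # graded relevance in [0, 1], by rank
            if rel > 0:
                gain += rel
            else:
                effort += 1            # wasted work on a nonrelevant document
            curve.append((effort, gain))
        return curve

    # Two rankings with equal total gain but different user effort:
    print(effort_gain_curve([1.0, 0.5, 0.0, 0.5]))
    print(effort_gain_curve([0.0, 0.0, 1.0, 0.5, 0.5]))
    ```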
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.3, S.620-648