Search (34 results, page 1 of 2)

  • author_ss:"Järvelin, K."
  1. Ingwersen, P.; Järvelin, K.: ¬The turn : integration of information seeking and retrieval in context (2005) 0.01
    0.013260903 = product of:
      0.039782707 = sum of:
        0.011376208 = weight(_text_:in in 1323) [ClassicSimilarity], result of:
          0.011376208 = score(doc=1323,freq=52.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19158077 = fieldWeight in 1323, product of:
              7.2111025 = tf(freq=52.0), with freq of:
                52.0 = termFreq=52.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1323)
        0.028406499 = weight(_text_:und in 1323) [ClassicSimilarity], result of:
          0.028406499 = score(doc=1323,freq=46.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.29359633 = fieldWeight in 1323, product of:
              6.78233 = tf(freq=46.0), with freq of:
                46.0 = termFreq=46.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1323)
      0.33333334 = coord(2/6)
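The explain tree above follows the TF-IDF arithmetic of Lucene's ClassicSimilarity. As a minimal sketch, the Python below recomputes the score of document 1323 from the numbers printed in the tree. The function names are illustrative and not part of Lucene, and queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math

QUERY_NORM = 0.043654136  # copied from the explain output, not derived here
MAX_DOCS = 44218          # maxDocs as printed in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * term_idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def doc_score(term_scores, matched, total):
    # Sum of the matching term scores, scaled by coord(matched/total)
    return sum(term_scores) * (matched / total)


score_in = term_score(freq=52.0, doc_freq=30841, field_norm=0.01953125)
score_und = term_score(freq=46.0, doc_freq=13101, field_norm=0.01953125)
total = doc_score([score_in, score_und], matched=2, total=6)
print(f"in={score_in:.6f} und={score_und:.6f} total={total:.6f}")
```

The result should agree with the 0.013260903 shown above up to floating-point rounding, since Lucene computes in single precision. The coord(2/6) factor reflects that two of the six query clauses matched this document.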
    
    Abstract
    The Turn analyzes the research of information seeking and retrieval (IS&R) and proposes a new direction of integrating research in these two areas: the fields should turn off their separate and narrow paths and construct a new avenue of research. An essential direction for this avenue is context as given in the subtitle Integration of Information Seeking and Retrieval in Context. Other essential themes in the book include: IS&R research models, frameworks and theories; search and work tasks and situations in context; interaction between humans and machines; information acquisition, relevance and information use; research design and methodology based on a structured set of explicit variables - all set into the holistic cognitive approach. The present monograph invites the reader into a construction project - there is much research to do for a contextual understanding of IS&R. The Turn represents a wide-ranging perspective of IS&R by providing a novel unique research framework, covering both individual and social aspects of information behavior, including the generation, searching, retrieval and use of information. Regarding traditional laboratory information retrieval research, the monograph proposes the extension of research toward actors, search and work tasks, IR interaction and utility of information. Regarding traditional information seeking research, it proposes the extension toward information access technology and work task contexts. The Turn is the first synthesis of research in the broad area of IS&R ranging from systems oriented laboratory IR research to social science oriented information seeking studies.
    TOC: Introduction.- The Cognitive Framework for Information.- The Development of Information Seeking Research.- Systems-Oriented Information Retrieval.- Cognitive and User-Oriented Information Retrieval.- The Integrated IS&R Research Framework.- Implications of the Cognitive Framework for IS&R.- Towards a Research Program.- Conclusion.- Definitions.- References.- Index.
    Footnote
    Review in: Mitt. VÖB 59(2006) no. 2, pp. 81-83 (O. Oberhauser): "With this volume, two outstanding representatives of European information science, Professors Peter Ingwersen (Copenhagen) and Kalervo Järvelin (Tampere), have presented a work that may one day be called their magnum opus. This would not surprise me, for the authors undertake the ambitious attempt to unite, under a holistic cognitive approach, two research traditions in information science that have so far met only to a rather limited extent: the research area "Information Seeking and Retrieval" (IS&R), anchored primarily in the social sciences, and "Information Retrieval" (IR), located mainly in computer science. In doing so, they also aim to extend the cognitive approach, which has been dominant for a number of years but has also been criticized as too individualistic, so that technological, behavioral, and cooperative aspects are taken into account in a coherent way. This is done in nine chapters as follows:
    - First, the two "camps", the IR tradition oriented toward systems and laboratory experiments and the IS&R faction oriented toward user questions, are contrasted, and some central concepts are clarified.
    - The second chapter gives a detailed account of the cognitive direction in information science, particularly with regard to the concept of information.
    - This is followed by an overview of previous research on "Information Seeking" (IS): an extremely useful introduction to the research questions and models, the research methodology, and the questions still open in this area, e.g. the insufficient consideration of user-system interaction that results from the one-sided focus on the user.
    - In an analogous way, the fourth chapter presents systems-oriented IR research in a concise overview that covers both the "laboratory model" and approaches such as natural language processing and expert systems. Aspects such as relevance, query modification, and performance measurement are addressed, as is the methodology, from the first laboratory experiments to TREC and beyond.
    - Chapter five contains a corresponding overview of the cognitive and user-oriented IR tradition. It shows which other IR studies (besides the purely laboratory-oriented ones) can be carried out, with the discussion ranging from early models (e.g. Taylor) through Belkin's ASK concept to Ingwersen's model of polyrepresentation, and from Bates's berrypicking approach to Vakkari's "task-based" IR model. Web IR, OKAPI, and discussions of the concept of relevance are also addressed here.
    - In the following chapter, the authors propose an integrated IS&R research model that takes into account the manifold relationships among information seekers, system developers, interfaces, and other aspects involved. Their approach unites traditional laboratory research with various user-oriented traditions from IS&R, in particular with the empirical approaches to IS and to interactive IR, in a holistic cognitive model.
    - Chapter seven examines the implications of this model for IS&R. What is particularly striking is how complex the requests of information seekers are compared with the relative simplicity of the algorithms for finding relevant documents. Mapping the widely varying cognitive states of the requesters within system development is certainly no trivial task. How the problem of incorporating the central aspect of meaning can be solved remains an open question.
    - The eighth chapter attempts to translate the points discussed previously into an IS&R research program (processes - behavior - system functionality - performance), and also makes some critical remarks on research practice to date.
    - The concluding ninth chapter briefly summarizes the book and can therefore also be read as an introduction to the topic.
    These are followed by a very useful glossary of all important terms used in the book, a bibliography, and a subject index. Ingwersen and Järvelin have presented a very demanding yet readable book. The overview chapters and discussions offered are not an introduction to information science, but they cover a large part of the subfields that are current in the discipline today and touched by ongoing research activities and publications. One could also put it this way, perhaps with slight exaggeration: what is addressed here is essentially modern information science. The attempt to unite the two research traditions will surely secure this work a place in the history of the discipline. The title of the book seems not entirely fortunate. "The Turn" is meant to signify a turn, namely toward an integrated view of IS and IR. This probably emerges more clearly from the subtitle, but the authors presumably found it too dry. A pity, because "The Turn" already exists, for example, in our union catalog, albeit with the addition "from the Cold War to a new era; the United States and the Soviet Union 1983-1990". The publisher, which has otherwise delivered a solid (if not exactly inexpensive) product, would have done better to prevent such vague duplication. Notwithstanding this, I recommend this important book for acquisition without reservation; it should not be missing from any larger library."
    Theme
    Semantic context in indexing and retrieval
  2. Saastamoinen, M.; Järvelin, K.: Search task features in work tasks of varying types and complexity (2017) 0.01
    0.01096284 = product of:
      0.03288852 = sum of:
        0.015144923 = weight(_text_:in in 3589) [ClassicSimilarity], result of:
          0.015144923 = score(doc=3589,freq=16.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.25504774 = fieldWeight in 3589, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3589)
        0.017743597 = product of:
          0.035487194 = sum of:
            0.035487194 = weight(_text_:22 in 3589) [ClassicSimilarity], result of:
              0.035487194 = score(doc=3589,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.23214069 = fieldWeight in 3589, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3589)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Information searching in practice is seldom an end in itself. In work, work task (WT) performance forms the context, which information searching should serve. Therefore, information retrieval (IR) systems development/evaluation should take the WT context into account. The present paper analyzes how WT features (task complexity and task types) affect information searching in authentic work: the types of information needs, search processes, and search media. We collected data on 22 information professionals in authentic work situations in three organization types: city administration, universities, and companies. The data comprise 286 WTs and 420 search tasks (STs). The data include transaction logs, video recordings, daily questionnaires, interviews, and observation. The data were analyzed quantitatively. Even if the participants used a range of search media, most STs were simple throughout the data, and up to 42% of WTs did not include searching. WT's effects on STs are not straightforward: different WT types react differently to WT complexity. Due to the simplicity of authentic searching, the WT/ST types in interactive IR experiments should be reconsidered.
  3. Järvelin, K.; Kristensen, J.; Niemi, T.; Sormunen, E.; Keskustalo, H.: ¬A deductive data model for query expansion (1996) 0.01
    0.009484224 = product of:
      0.028452672 = sum of:
        0.010709076 = weight(_text_:in in 2230) [ClassicSimilarity], result of:
          0.010709076 = score(doc=2230,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 2230, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2230)
        0.017743597 = product of:
          0.035487194 = sum of:
            0.035487194 = weight(_text_:22 in 2230) [ClassicSimilarity], result of:
              0.035487194 = score(doc=2230,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.23214069 = fieldWeight in 2230, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2230)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and occurrence levels. Concepts and relationships among them are represented at the conceptual level. The expression level represents natural language expressions for concepts. Each expression has one or more matching models at the occurrence level. Each model specifies the matching of the expression in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for environments providing heterogeneous IR systems. Expansion is controlled by adjustable matching reliability.
    Source
    Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al
    Theme
    Semantic context in indexing and retrieval
  4. Vakkari, P.; Järvelin, K.; Chang, Y.-W.: ¬The association of disciplinary background with the evolution of topics and methods in Library and Information Science research 1995-2015 (2023) 0.01
    0.0091357 = product of:
      0.027407099 = sum of:
        0.012620768 = weight(_text_:in in 998) [ClassicSimilarity], result of:
          0.012620768 = score(doc=998,freq=16.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.21253976 = fieldWeight in 998, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=998)
        0.014786332 = product of:
          0.029572664 = sum of:
            0.029572664 = weight(_text_:22 in 998) [ClassicSimilarity], result of:
              0.029572664 = score(doc=998,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.19345059 = fieldWeight in 998, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=998)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    The paper reports a longitudinal analysis of the topical and methodological development of Library and Information Science (LIS). Its focus is on the effects of researchers' disciplines on these developments. The study extends an earlier cross-sectional study (Vakkari et al., Journal of the Association for Information Science and Technology, 2022a, 73, 1706-1722) by a coordinated dataset representing a content analysis of articles published in 31 scholarly LIS journals in 1995, 2005, and 2015. It is novel in its coverage of authors' disciplines, topical and methodological aspects in a coordinated dataset spanning two decades thus allowing trend analysis. The findings include a shrinking trend in the share of LIS from 67 to 36% while Computer Science, and Business and Economics increase their share from 9 and 6% to 21 and 16%, respectively. The earlier cross-sectional study (Vakkari et al., Journal of the Association for Information Science and Technology, 2022a, 73, 1706-1722) for the year 2015 identified three topical clusters of LIS research, focusing on topical subfields, methodologies, and contributing disciplines. Correspondence analysis confirms their existence already in 1995 and traces their development through the decades. The contributing disciplines infuse their concepts, research questions, and approaches to LIS and may also subsume vital parts of LIS in their own structures of knowledge production.
    Date
    22. 6.2023 18:15:06
  5. Näppilä, T.; Järvelin, K.; Niemi, T.: ¬A tool for data cube construction from structurally heterogeneous XML documents (2008) 0.01
    0.008572079 = product of:
      0.025716238 = sum of:
        0.010929906 = weight(_text_:in in 1369) [ClassicSimilarity], result of:
          0.010929906 = score(doc=1369,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 1369, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1369)
        0.014786332 = product of:
          0.029572664 = sum of:
            0.029572664 = weight(_text_:22 in 1369) [ClassicSimilarity], result of:
              0.029572664 = score(doc=1369,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.19345059 = fieldWeight in 1369, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1369)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (extensible markup language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know in much detail the structure of the documents to be processed and are, thus, effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We shall, however, show that this approach leads in the context of certain - not uncommon - types of XML documents to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
    Date
    9. 2.2008 17:22:42
  6. Kristensen, J.; Järvelin, K.: ¬The effectiveness of a searching thesaurus in free-text searching in a full-text database (1990) 0.00
    0.003365538 = product of:
      0.020193228 = sum of:
        0.020193228 = weight(_text_:in in 2043) [ClassicSimilarity], result of:
          0.020193228 = score(doc=2043,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.34006363 = fieldWeight in 2043, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.125 = fieldNorm(doc=2043)
      0.16666667 = coord(1/6)
    
  7. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.00
    0.0028220895 = product of:
      0.016932536 = sum of:
        0.016932536 = weight(_text_:in in 1074) [ClassicSimilarity], result of:
          0.016932536 = score(doc=1074,freq=20.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.28515202 = fieldWeight in 1074, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1074)
      0.16666667 = coord(1/6)
    
    Abstract
    We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries but resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
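The syn-based structuring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy dictionary, the function names, and the outer `#sum` wrapper are assumptions made for the example. The only detail taken from the abstract is that translation equivalents of one source key are grouped under InQuery's syn-operator.

```python
# Toy English-to-Finnish dictionary; the entries are illustrative only.
TOY_DICTIONARY = {
    "heart": ["sydän"],
    "disease": ["tauti", "sairaus"],
}


def structured_query(source_keys, dictionary):
    """Group the translation equivalents of each source key under #syn."""
    groups = []
    for key in source_keys:
        # Keys missing from the dictionary pass through untranslated.
        equivalents = dictionary.get(key, [key])
        if len(equivalents) == 1:
            groups.append(equivalents[0])
        else:
            groups.append("#syn(" + " ".join(equivalents) + ")")
    return "#sum(" + " ".join(groups) + ")"


def unstructured_query(source_keys, dictionary):
    """Flat baseline: every translation equivalent is an independent key."""
    words = [w for key in source_keys for w in dictionary.get(key, [key])]
    return "#sum(" + " ".join(words) + ")"


print(structured_query(["heart", "disease"], TOY_DICTIONARY))
print(unstructured_query(["heart", "disease"], TOY_DICTIONARY))
```

In the structured form, the alternatives for one source key are treated as a single key, so a source word with many dictionary equivalents does not outweigh a word with only one.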
  8. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.00
    0.0023611297 = product of:
      0.014166778 = sum of:
        0.014166778 = weight(_text_:in in 1052) [ClassicSimilarity], result of:
          0.014166778 = score(doc=1052,freq=14.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.23857531 = fieldWeight in 1052, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1052)
      0.16666667 = coord(1/6)
    
    Abstract
    Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin, being thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into a target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as a target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even using the first step as such showed promising results.
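The two-step idea in this abstract (rewrite the source word with transformation rules, then fuzzy-match the intermediate form against target-language words) can be sketched as below. Everything concrete here is an assumption for illustration: the single hand-written rule, the small target word list, and the use of a digram Dice coefficient as the fuzzy matcher. The abstract states that the actual rules are generated automatically from translation dictionaries.

```python
def digrams(word):
    """Set of character digrams, with padding to mark word boundaries."""
    padded = f"_{word}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a, b):
    """Dice coefficient over digram sets, in the range 0.0 to 1.0."""
    x, y = digrams(a), digrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def apply_rules(word, rules):
    """Step 1: apply each (source, target) substring rule in order."""
    for src, dst in rules:
        word = word.replace(src, dst)
    return word


def fuzzy_translate(word, rules, target_lexicon, top_k=1):
    """Step 2: rank target words by similarity to the intermediate form."""
    intermediate = apply_rules(word, rules)
    ranked = sorted(target_lexicon,
                    key=lambda t: dice(intermediate, t),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical rule and lexicon: Finnish "elektroni" becomes "electroni".
RULES = [("k", "c")]
LEXICON = ["election", "electrode", "electron"]
print(fuzzy_translate("elektroni", RULES, LEXICON))
```

Passing an empty rule list to `fuzzy_translate` gives the fuzzy-matching-only baseline that the abstract compares against.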
  9. Lehtokangas, R.; Keskustalo, H.; Järvelin, K.: Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments (2008) 0.00
    0.0023611297 = product of:
      0.014166778 = sum of:
        0.014166778 = weight(_text_:in in 1349) [ClassicSimilarity], result of:
          0.014166778 = score(doc=1349,freq=14.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.23857531 = fieldWeight in 1349, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1349)
      0.16666667 = coord(1/6)
    
    Abstract
    In this article, the authors present evaluation results for transitive dictionary-based cross-language information retrieval (CLIR) using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. Source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish) via an intermediate (or pivot) language. Effectiveness of the transitively translated queries was compared to that of the directly translated and monolingual Finnish queries. Pseudo-relevance feedback (PRF) was also used to expand the original transitive target queries. Cross-language information retrieval performance was evaluated on three relevance thresholds: stringent, regular, and liberal. The transitive translations performed well achieving, on the average, 85-93% of the direct translation performance, and 66-72% of monolingual performance. Moreover, PRF was successful in raising the performance of transitive translation routes in absolute terms as well as in relation to monolingual and direct translation performance applying PRF.
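Transitive translation as described in this abstract chains two dictionary lookups through a pivot language. The sketch below is a minimal illustration under stated assumptions: the toy dictionaries, the choice of English as the pivot, and the function names are all hypothetical, and the abstract does not say which pivot language was used.

```python
def translate(words, dictionary):
    """Replace each word by its dictionary equivalents, keeping order."""
    result = []
    for word in words:
        # Words missing from the dictionary pass through unchanged.
        for equivalent in dictionary.get(word, [word]):
            if equivalent not in result:
                result.append(equivalent)
    return result


def transitive_translate(words, source_to_pivot, pivot_to_target):
    """Translate source -> pivot -> target by chaining two lookups."""
    pivot_words = translate(words, source_to_pivot)
    return translate(pivot_words, pivot_to_target)


# Toy entries for illustration only (Swedish -> English -> Finnish).
SV_EN = {"hus": ["house", "building"]}
EN_FI = {"house": ["talo"], "building": ["rakennus"]}
print(transitive_translate(["hus"], SV_EN, EN_FI))
```

Each hop can add spurious equivalents, so the target query tends to grow with every step. That growth is one reason to compare the transitive route against direct translation, as the abstract reports.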
  10. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.00
    0.0023517415 = product of:
      0.014110449 = sum of:
        0.014110449 = weight(_text_:in in 4395) [ClassicSimilarity], result of:
          0.014110449 = score(doc=4395,freq=20.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.2376267 = fieldWeight in 4395, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4395)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - Effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and they are not statistically significant in most of the tested settings. It is also shown that stemming, a hitherto rather neglected method of morphological processing for Finnish, performs reasonably well although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - Usefulness of morphological lemmatization and stem generation for IR purposes can be estimated with many factors. On the average P-R level they seem to behave very close to each other in a probabilistic IR system. Thus, the choice of the used method with highly inflectional languages needs to be estimated along other dimensions too. Originality/value - Results are achieved using Finnish as an example of a highly inflectional language. The results are of interest for anyone who is interested in processing of morphological variation of a highly inflected language for IR purposes.
  11. Lehtokangas, R.; Järvelin, K.: Consistency of textual expression in newspaper articles : an argument for semantically based query expansion (2001) 0.00
    0.0022310577 = product of:
      0.0133863455 = sum of:
        0.0133863455 = weight(_text_:in in 4485) [ClassicSimilarity], result of:
          0.0133863455 = score(doc=4485,freq=18.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.22543246 = fieldWeight in 4485, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4485)
      0.16666667 = coord(1/6)
    
    Abstract
    This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared in regard to their central concepts and words representing the concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was sixty). Inconsistency in words and concepts was found between news articles from different newspapers. The mean value of consistency calculated on the basis of words was 65 per cent; this however depended on the article length. For short news wires consistency was 83 per cent while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found in regard to length but not in regard to the categories of topic. We argue that the expression inconsistency is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
    Theme
    Semantic context in indexing and retrieval
  12. Tuomaala, O.; Järvelin, K.; Vakkari, P.: Evolution of library and information science, 1965-2005 : content analysis of journal articles (2014) 0.00
    0.0022310577 = product of:
      0.0133863455 = sum of:
        0.0133863455 = weight(_text_:in in 1309) [ClassicSimilarity], result of:
          0.0133863455 = score(doc=1309,freq=18.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.22543246 = fieldWeight in 1309, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1309)
      0.16666667 = coord(1/6)
    
    Abstract
    This article first analyzes library and information science (LIS) research articles published in core LIS journals in 2005. It also examines the development of LIS from 1965 to 2005 in light of comparable data sets for 1965, 1985, and 2005. In both cases, the authors report (a) how the research articles are distributed by topic and (b) what approaches, research strategies, and methods were applied in the articles. In 2005, the largest research areas in LIS by this measure were information storage and retrieval, scientific communication, library and information-service activities, and information seeking. The same research areas constituted the quantitative core of LIS in the previous years since 1965. Information retrieval has been the most popular area of research over the years. The proportion of research on library and information-service activities decreased after 1985, but the popularity of information seeking and of scientific communication grew during the period studied. The viewpoint of research has shifted from library and information organizations to end users and development of systems for the latter. The proportion of empirical research strategies was high and rose over time, with the survey method being the single most important method. However, attention to evaluation and experiments increased considerably after 1985. Conceptual research strategies and system analysis, description, and design were quite popular, but declining. The most significant changes from 1965 to 2005 are the decreasing interest in library and information-service activities and the growth of research into information seeking and scientific communication.
  13. Halttunen, K.; Järvelin, K.: Assessing learning outcomes in two information retrieval learning environments (2005) 0.00
    0.0021859813 = product of:
      0.013115887 = sum of:
        0.013115887 = weight(_text_:in in 996) [ClassicSimilarity], result of:
          0.013115887 = score(doc=996,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.22087781 = fieldWeight in 996, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=996)
      0.16666667 = coord(1/6)
    
    Abstract
    In order to design information retrieval (IR) learning environments and instruction, it is important to explore learning outcomes of different pedagogical solutions. Learning outcomes have seldom been evaluated in IR instruction. The particular focus of this study is the assessment of learning outcomes in an experimental, but naturalistic, learning environment compared to more traditional instruction. The 57 participants of an introductory course on IR were selected for this study, and the analysis illustrates their learning outcomes regarding both conceptual change and development of IR skill. Concept mapping of student essays was used to analyze conceptual change and log-files of search exercises provided data for performance assessment. Students in the experimental learning environment changed their conceptions more regarding linguistic aspects of IR and placed more emphasis on planning and management of the search process. Performance assessment indicates that anchored instruction and scaffolding with an instructional tool, the IR Game, with performance feedback enables students to construct queries with fewer semantic knowledge errors also in operational IR systems.
  14. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.00
    0.0021034614 = product of:
      0.012620768 = sum of:
        0.012620768 = weight(_text_:in in 4487) [ClassicSimilarity], result of:
          0.012620768 = score(doc=4487,freq=16.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.21253976 = fieldWeight in 4487, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4487)
      0.16666667 = coord(1/6)
    
    Abstract
    The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit essentially more from the concept-based QE in ranking than marginally relevant documents.
  15. Järvelin, K.; Vakkari, P.: LIS research across 50 years: content analysis of journal articles : offering an information-centric conception of memes (2022) 0.00
    0.0019676082 = product of:
      0.011805649 = sum of:
        0.011805649 = weight(_text_:in in 949) [ClassicSimilarity], result of:
          0.011805649 = score(doc=949,freq=14.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19881277 = fieldWeight in 949, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=949)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose This paper analyses the research in Library and Information Science (LIS) and reports on (1) the status of LIS research in 2015 and (2) on the evolution of LIS research longitudinally from 1965 to 2015. Design/methodology/approach The study employs a quantitative intellectual content analysis of articles published in 30+ scholarly LIS journals, following the design by Tuomaala et al. (2014). In the content analysis, we classify articles along eight dimensions covering topical content and methodology. Findings The topical findings indicate that the earlier strong LIS emphasis on L&I services has declined notably, while scientific and professional communication has become the most popular topic. Information storage and retrieval has given up its earlier strong position towards the end of the years analyzed. Individuals are increasingly the units of observation. End-user's and developer's viewpoints have strengthened at the cost of intermediaries' viewpoint. LIS research is methodologically increasingly scattered since survey, scientometric methods, experiment, case studies and qualitative studies have all gained in popularity. Consequently, LIS may have become more versatile in the analysis of its research objects during the years analyzed. Originality/value Among quantitative intellectual content analyses of LIS research, the study is unique in its scope: length of analysis period (50 years), width (8 dimensions covering topical content and methodology) and depth (the annual batch of 30+ scholarly journals).
  16. Vakkari, P.; Chang, Y.-W.; Järvelin, K.: Disciplinary contributions to research topics and methodology in Library and Information Science : leading to fragmentation? (2022) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 767) [ClassicSimilarity], result of:
          0.010929906 = score(doc=767,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 767, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=767)
      0.16666667 = coord(1/6)
    
    Abstract
    The study analyses contributions to Library and Information Science (LIS) by researchers representing various disciplines. How are such contributions associated with the choice of research topics and methodology? The study employs a quantitative content analysis of articles published in 31 scholarly LIS journals in 2015. Each article is seen as a contribution to LIS by the authors' disciplines, which are inferred from their affiliations. The unit of analysis is the article-discipline pair. Of the contribution instances, the share of LIS is one third. Computer Science contributes one fifth and Business and Economics one sixth. The latter disciplines dominate the contributions in information retrieval, information seeking, and scientific communication indicating strong influences in LIS. Correspondence analysis reveals three clusters of research, one focusing on traditional LIS with contributions from LIS and Humanities and survey-type research; another on information retrieval with contributions from Computer Science and experimental research; and the third on scientific communication with contributions from Natural Sciences and Medicine and citation analytic research. The strong differentiation of scholarly contributions in LIS hints at the fragmentation of LIS as a discipline.
  17. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.00
    0.0018033426 = product of:
      0.010820055 = sum of:
        0.010820055 = weight(_text_:in in 5907) [ClassicSimilarity], result of:
          0.010820055 = score(doc=5907,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1822149 = fieldWeight in 5907, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5907)
      0.16666667 = coord(1/6)
    
    Abstract
    Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods of characterizing the resolution power of keys automatically are studied, and the effects search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it often is possible to identify the best key of a query while the discrimination between the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery in a subcollection of the TREC collection, which contained some 515,000 documents.
  18. Niemi, T.; Hirvonen, L.; Järvelin, K.: Multidimensional data model and query language for informetrics (2003) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 1753) [ClassicSimilarity], result of:
          0.010709076 = score(doc=1753,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 1753, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1753)
      0.16666667 = coord(1/6)
    
    Abstract
    Multidimensional data analysis or On-line analytical processing (OLAP) offers a single subject-oriented source for analyzing summary data based on various dimensions. We demonstrate that the OLAP approach gives a promising starting point for advanced analysis and comparison among summary data in informetrics applications. At the moment there is no single precise, commonly accepted logical/conceptual model for multidimensional analysis. This is because the requirements of applications vary considerably. We develop a conceptual/logical multidimensional model for supporting the complex and unpredictable needs of informetrics. Summary data are considered with respect to some dimensions. By changing dimensions the user may construct other views on the same summary data. We develop a multidimensional query language whose basic idea is to support the definition of views in a way that is natural and intuitive for lay users in the informetrics area. We show that this view-oriented query language has a great expressive power and its degree of declarativity is greater than in contemporary operation-oriented or SQL (Structured Query Language)-like OLAP query languages.
  19. Järvelin, K.: ¬An analysis of two approaches in information retrieval : from frameworks to study designs (2007) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 326) [ClassicSimilarity], result of:
          0.010709076 = score(doc=326,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 326, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=326)
      0.16666667 = coord(1/6)
    
    Abstract
    There is a well-known gap between systems-oriented information retrieval (IR) and user-oriented IR, which cognitive IR seeks to bridge. It is therefore interesting to analyze approaches at the level of frameworks, models, and study designs. This article is an exercise in such an analysis, focusing on two significant approaches to IR: the lab IR approach and P. Ingwersen's (1996) cognitive IR approach. The article focuses on their research frameworks, models, hypotheses, laws and theories, study designs, and possible contributions. The two approaches are quite different, which becomes apparent in the use of independent, controlled, and dependent variables in the study designs of each approach. Thus, each approach is capable of contributing very differently to understanding and developing information access. The article also discusses integrating the approaches at the study-design level.
  20. Hansen, P.; Järvelin, K.: Collaborative Information Retrieval in an information-intensive domain (2005) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 1040) [ClassicSimilarity], result of:
          0.010709076 = score(doc=1040,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 1040, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1040)
      0.16666667 = coord(1/6)
    
    Abstract
    In this article we investigate the expressions of collaborative activities within information seeking and retrieval processes (IS&R). Generally, information seeking and retrieval is regarded as an individual and isolated process in IR research. We assume that an IS&R situation is not merely an individual effort, but inherently involves various collaborative activities. We present empirical results from a real-life and information-intensive setting within the patent domain, showing that the patent task performance process involves highly collaborative aspects throughout the stages of the information seeking and retrieval process. Furthermore, we show that these activities may be categorised and related to different stages in an information seeking and retrieval process. Therefore, the assumption that information retrieval performance is purely individual needs to be reconsidered. Finally, we also propose a refined IR framework involving collaborative aspects.