Search (10 results, page 1 of 1)

  • author_ss:"Sormunen, E."
  1. Järvelin, K.; Kristensen, J.; Niemi, T.; Sormunen, E.; Keskustalo, H.: A deductive data model for query expansion (1996) 0.02
    0.022779368 = product of:
      0.045558736 = sum of:
        0.045558736 = sum of:
          0.008118451 = weight(_text_:a in 2230) [ClassicSimilarity], result of:
            0.008118451 = score(doc=2230,freq=8.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.15287387 = fieldWeight in 2230, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
          0.037440285 = weight(_text_:22 in 2230) [ClassicSimilarity], result of:
            0.037440285 = score(doc=2230,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 2230, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2230)
      0.5 = coord(1/2)
    
    Abstract
    We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and occurrence levels. Concepts and relationships among them are represented at the conceptual level. The expression level represents natural language expressions for concepts. Each expression has one or more matching models at the occurrence level. Each model specifies the matching of the expression in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for environments providing heterogeneous IR systems. Expansion is controlled by adjustable matching reliability.
    Source
    Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al.
    Type
    a
  2. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.00
    0.0025370158 = product of:
      0.0050740317 = sum of:
        0.0050740317 = product of:
          0.010148063 = sum of:
            0.010148063 = weight(_text_:a in 4487) [ClassicSimilarity], result of:
              0.010148063 = score(doc=4487,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.19109234 = fieldWeight in 4487, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4487)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit essentially more from the concept-based QE in ranking than marginally relevant documents.
    Type
    a
  3. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.00
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = product of:
          0.009567685 = sum of:
            0.009567685 = weight(_text_:a in 3223) [ClassicSimilarity], result of:
              0.009567685 = score(doc=3223,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18016359 = fieldWeight in 3223, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3223)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of historical newspaper collection the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
    Type
    a
  4. Sormunen, E.: Free-text searching in full-text databases : probing system limits (1993) 0.00
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = product of:
          0.009471525 = sum of:
            0.009471525 = weight(_text_:a in 7120) [ClassicSimilarity], result of:
              0.009471525 = score(doc=7120,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17835285 = fieldWeight in 7120, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=7120)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  5. Halttunen, K.; Sormunen, E.: Learning information retrieval through an educational game : is gaming sufficient for learning? (2000) 0.00
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = product of:
          0.009471525 = sum of:
            0.009471525 = weight(_text_:a in 3865) [ClassicSimilarity], result of:
              0.009471525 = score(doc=3865,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17835285 = fieldWeight in 3865, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3865)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  6. Vakkari, P.; Sormunen, E.: The influence of relevance levels on the effectiveness of interactive information retrieval (2004) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2884) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2884,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2884, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2884)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we focus on the effect of graded relevance on the results of interactive information retrieval (IR) experiments based on assigned search tasks in a test collection. A group of 26 subjects searched for four Text REtrieval Conference (TREC) topics using automatic and interactive query expansion based on relevance feedback. The TREC- and user-suggested pools of relevant documents were reassessed on a four-level relevance scale. The results show that the users could identify nearly all highly relevant documents and about half of the marginal ones. Users also selected a fair number of irrelevant documents for query expansion. The findings suggest that the effectiveness of query expansion is closely related to the searchers' success in retrieving and identifying highly relevant documents for feedback. The implications of the results on interpreting the findings of past experiments with liberal relevance thresholds are also discussed.
    Type
    a
  7. Alkula, R.; Sormunen, E.: Problems and guidelines for database descriptions (1989) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 2397) [ClassicSimilarity], result of:
              0.008202582 = score(doc=2397,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 2397, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2397)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    An essential part of information retrieval knowledge is the knowledge of data base contents and structures. Currently, the variety of data bases is so wide that it is difficult to know the contents and structure of a particular data base and how they differ from those of other data bases. Because of the lack of commonly acknowledged guidelines for data base descriptions, each on-line service designs and produces printed manuals, on-line help texts and other user documentation in its own manner. For the presentation of exact information and knowledge on a data base, common, structured principles for data base descriptions are needed. Requirements and some solutions for such description principles are presented.
    Type
    a
  8. Heinström, J.; Sormunen, E.; Savolainen, R.; Ek, S.: Developing an empirical measure of everyday information mastering (2020) 0.00
    0.0014647468 = product of:
      0.0029294936 = sum of:
        0.0029294936 = product of:
          0.005858987 = sum of:
            0.005858987 = weight(_text_:a in 5914) [ClassicSimilarity], result of:
              0.005858987 = score(doc=5914,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.11032722 = fieldWeight in 5914, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5914)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The aim of the study was to develop an empirical measure for everyday information mastering (EIM). EIM describes the ways that individuals, based on their beliefs, attitudes, and expectations, orient themselves to information as a resource of everyday action. The key features of EIM were identified by conceptual analysis focusing on three EIM frameworks. Four modes of EIM (Proactive, Social, Reactive, and Passive) and their 12 constituents were identified. A survey of 39 items was developed in two pilot studies to operationalize the identified modes as measurable EIM constituents. The respondents in the main study were upper secondary school students (n = 412). Exploratory factor analysis (EFA) was applied to validate subscales for each EIM constituent. Seven subscales emerged: Inquiring and Scanning in the Proactive mode, Social media-centered, and Experiential in the Social mode, and Information poor, Overwhelmed, and Blunting in the Passive mode. Two constituents, Serendipitous and Intuitive, were not supported in the EFA. The findings highlight that the core constituents of an individual's everyday information mastering can be operationalized as psychometric scales. The instrument contributes to the systematic empirical study of EIM constituents and their relationships. The study further sheds light on key modes of EIM.
    Type
    a
  9. Vakkari, P.; Jones, S.; MacFarlane, A.; Sormunen, E.: Query exhaustivity, relevance feedback and search success in automatic and interactive query expansion (2004) 0.00
    0.0014351527 = product of:
      0.0028703054 = sum of:
        0.0028703054 = product of:
          0.005740611 = sum of:
            0.005740611 = weight(_text_:a in 4435) [ClassicSimilarity], result of:
              0.005740611 = score(doc=4435,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.10809815 = fieldWeight in 4435, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4435)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  10. Sormunen, E.; Tanni, M.; Alamettälä, T.; Heinström, J.: Students' group work strategies in source-based writing assignments (2014) 0.00
    0.0011959607 = product of:
      0.0023919214 = sum of:
        0.0023919214 = product of:
          0.0047838427 = sum of:
            0.0047838427 = weight(_text_:a in 1289) [ClassicSimilarity], result of:
              0.0047838427 = score(doc=1289,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.090081796 = fieldWeight in 1289, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1289)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Source-based writing assignments conducted by groups of students are a common learning task used in information literacy instruction. The fundamental assumption in group assignments is that students' collaboration substantially enhances their learning. The present study focused on the group work strategies adopted by upper secondary school students in source-based writing assignments. Seventeen groups authored Wikipedia or Wikipedia-style articles and were interviewed during and after the assignment. Group work strategies were analyzed in 6 activities: planning, searching, assessing sources, reading, writing, and editing. The students used 2 cooperative strategies: delegation and division of work, and 2 collaborative strategies: pair and group collaboration. Division of work into independently conducted parts was the most popular group work strategy. Also group collaboration, where students worked together to complete an activity, was commonly applied. Division of work was justified by efficiency in completing the project and by ease of control in the fair division of contributions. The motivation behind collaboration was related to quality issues and shared responsibility. We suggest that the present designs of learning tasks lead students to avoid collaboration, increasing the risk of low learning outcomes in information literacy instruction.
    Type
    a
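
Note on the score trees: the breakdowns in this listing are labelled ClassicSimilarity, a TF-IDF scoring scheme. The following is a minimal sketch, not the search engine's own code, that recomputes the score of result 1 from the leaf values shown. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); both formulas are inferred from the numbers in the listing rather than stated in it. The function names are hypothetical, and queryNorm, fieldNorm and coord are copied from the listing, not derived.

```python
import math

# Constants copied from the listing (not derived here).
QUERY_NORM = 0.046056706
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; formula assumed from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM                   # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc=2230): two matching terms, then coord(1/2) on the sum.
score_a = term_score(freq=8.0, doc_freq=37942, field_norm=0.046875)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.046875)
result_1 = (score_a + score_22) * 0.5

# Result 2 (doc=4487): one matching term, with coord(1/2) applied twice.
result_2 = term_score(freq=18.0, doc_freq=37942, field_norm=0.0390625) * 0.5 * 0.5

print(f"term 'a'  in doc 2230: {score_a:.9f}")
print(f"term '22' in doc 2230: {score_22:.9f}")
print(f"result 1 total:        {result_1:.9f}")
print(f"result 2 total:        {result_2:.9f}")
```

The printed totals can be compared with the top-level values in the listing (0.022779368 for result 1 and 0.0025370158 for result 2). Small differences in the last digits are expected, because the listed values look like single-precision floats while Python uses double precision.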