Search (8 results, page 1 of 1)

  • author_ss:"Pirkola, A."
  1. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.00
    0.0030093212 = product of:
      0.0060186423 = sum of:
        0.0060186423 = product of:
          0.012037285 = sum of:
            0.012037285 = weight(_text_:a in 3908) [ClassicSimilarity], result of:
              0.012037285 = score(doc=3908,freq=4.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.25222903 = fieldWeight in 3908, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3908)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
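
The tree above is a Lucene ClassicSimilarity (TF-IDF) explanation for the query term _text_:a in this record. As a rough sketch, not part of the original record, the Python below recomputes the numbers for doc 3908 from the values shown; the coord(1/2) factors are carried over as constants, and the formulas are the standard ClassicSimilarity ones.

import math

# Recompute the ClassicSimilarity explanation for result 1 (doc 3908).
# Lucene ClassicSimilarity uses:
#   tf(freq)    = sqrt(freq)
#   idf(df, N)  = 1 + ln(N / (df + 1))
#   queryWeight = idf * queryNorm
#   fieldWeight = tf * idf * fieldNorm
freq, doc_freq, max_docs = 4.0, 37942, 44218
query_norm, field_norm = 0.041389145, 0.109375

tf = math.sqrt(freq)                              # 2.0
idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # ~1.153047
query_weight = idf * query_norm                   # ~0.04772363
field_weight = tf * idf * field_norm              # ~0.25222903
term_score = query_weight * field_weight          # ~0.012037285
final = term_score * 0.5 * 0.5                    # two coord(1/2) factors -> ~0.0030093212

print(f"idf={idf:.6f} term_score={term_score:.9f} final={final:.10f}")
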
  2. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 0.00
    0.0028838771 = product of:
      0.0057677543 = sum of:
        0.0057677543 = product of:
          0.011535509 = sum of:
            0.011535509 = weight(_text_:a in 4665) [ClassicSimilarity], result of:
              0.011535509 = score(doc=4665,freq=20.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.24171482 = fieldWeight in 4665, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4665)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We devised a method to extract keyphrases from Web pages in order to construct a keyphrase list for a specific topic. The keyphrases are identified, and out-of-topic phrases removed, based on their frequencies in text corpora with varying densities of text discussing the topic. The list is intended as a search aid for Web information retrieval, so that the user can browse the list, identify different aspects of the topic, and select keyphrases from it (e.g. find synonymous phrases) for a query. A keyphrase list containing a large set of keyphrases related to climate change was constructed using the proposed method. We argue that there is a need for such keyphrase suggestion tools, because the major Web search engines do not provide users with terminological search aids that help them identify different aspects of a topic and find synonyms.
    Type
    a
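
The abstract above describes removing out-of-topic phrases by comparing their frequencies across corpora of different topical density. The sketch below illustrates that filtering idea only; the frequency-ratio criterion, function names, and toy counts are assumptions for illustration, not the authors' actual algorithm.

from collections import Counter
from typing import Iterable

def relative_freq(counts: Counter, phrase: str) -> float:
    total = sum(counts.values()) or 1
    return counts[phrase] / total

def filter_keyphrases(candidates: Iterable[str],
                      topic_dense_counts: Counter,
                      general_counts: Counter,
                      ratio_threshold: float = 2.0) -> list[str]:
    """Keep candidate phrases that are clearly more frequent in the topic-dense
    corpus than in a general corpus; treat the rest as out-of-topic."""
    kept = []
    for phrase in candidates:
        dense = relative_freq(topic_dense_counts, phrase)
        general = relative_freq(general_counts, phrase)
        if general == 0 or dense / general >= ratio_threshold:
            kept.append(phrase)
    return kept

# Toy counts (made up) for a climate change keyphrase list
dense = Counter({"climate change": 40, "greenhouse gas": 25, "web page": 30})
general = Counter({"climate change": 2, "greenhouse gas": 1, "web page": 50})
print(filter_keyphrases(dense.keys(), dense, general))
# -> ['climate change', 'greenhouse gas']
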
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.00
    0.0026061484 = product of:
      0.0052122967 = sum of:
        0.0052122967 = product of:
          0.010424593 = sum of:
            0.010424593 = weight(_text_:a in 5907) [ClassicSimilarity], result of:
              0.010424593 = score(doc=5907,freq=12.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.21843673 = fieldWeight in 5907, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5907)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods for automatically characterizing the resolution power of keys are studied, and the effects that search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it is often possible to identify the best key of a query, while discriminating between the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery on a subcollection of the TREC collection containing some 515,000 documents.
    Type
    a
  4. Ferro, N.; Silvello, G.; Keskustalo, H.; Pirkola, A.; Järvelin, K.: The twist measure for IR evaluation : taking user's effort into account (2016) 0.00
    0.0024032309 = product of:
      0.0048064617 = sum of:
        0.0048064617 = product of:
          0.0096129235 = sum of:
            0.0096129235 = weight(_text_:a in 2771) [ClassicSimilarity], result of:
              0.0096129235 = score(doc=2771,freq=20.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.20142901 = fieldWeight in 2771, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2771)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents which handles both binary and graded relevance. t stems from the observation that searching is currently taken for granted: it is natural for users to assume that search engines are available and work well. As a consequence, users may take for granted the utility they gain from finding relevant documents, which is the focus of traditional measures. On the contrary, they may feel uneasy when the system returns nonrelevant documents, because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required of the users to retrieve the desired information. We provide a formal definition of t and a demonstration of its properties, and we introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performance, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
    Type
    a
  5. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.00
    0.0022338415 = product of:
      0.004467683 = sum of:
        0.004467683 = product of:
          0.008935366 = sum of:
            0.008935366 = weight(_text_:a in 1052) [ClassicSimilarity], result of:
              0.008935366 = score(doc=1052,freq=12.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.18723148 = fieldWeight in 1052, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1052)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin, being thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into a target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as a target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even using the first step as such showed promising results.
    Type
    a
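
As a rough illustration of the two-step idea in the abstract above, the sketch below rewrites a source word with a few hypothetical transformation rules and then fuzzy-matches the intermediate form against a target-language lexicon using a Dice coefficient over character digrams. The rules, the similarity measure, and the toy lexicon are assumptions, not the rules or matching method reported in the paper.

# Hypothetical source-to-target rewrite rules; the paper derives such rules
# automatically from translation dictionaries.
RULES = [("f", "ph"), ("ks", "x"), ("kk", "c")]

def apply_rules(word: str) -> str:
    """Step 1: rewrite the source word into a more target-like intermediate form."""
    for src, tgt in RULES:
        word = word.replace(src, tgt)
    return word

def digrams(word: str) -> set[str]:
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(a: str, b: str) -> float:
    """Dice coefficient over character digrams, one common fuzzy-matching score."""
    da, db = digrams(a), digrams(b)
    if not da or not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))

def fuzzy_translate(source_word: str, target_lexicon: list[str]) -> str:
    """Step 2: match the intermediate form against the target-language lexicon."""
    intermediate = apply_rules(source_word.lower())
    return max(target_lexicon, key=lambda t: dice(intermediate, t.lower()))

# Toy usage: a Finnish-style spelling variant matched to an English index term
print(fuzzy_translate("morfologia", ["morphology", "geology", "biology"]))
# -> morphology
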
  6. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.00
    0.0022338415 = product of:
      0.004467683 = sum of:
        0.004467683 = product of:
          0.008935366 = sum of:
            0.008935366 = weight(_text_:a in 1074) [ClassicSimilarity], result of:
              0.008935366 = score(doc=1074,freq=12.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.18723148 = fieldWeight in 1074, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1074)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary and run against a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring, using a proximity operator for the translation equivalents of query-language compound components, were tested. The method was not useful in syn-based queries and resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages, which allows n-gram-based translation of names not included in a dictionary. In the third test, a query structuring method in which the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
    Type
    a
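
A minimal sketch of the syn-based structuring described above: the translation equivalents of each source key are wrapped in a synonym operator so that alternative translations share a single clause instead of flooding the query. The #syn/#sum spelling follows InQuery-style structured query syntax, and the Finnish equivalents in the toy example are illustrative assumptions.

def structured_query(translations: dict[str, list[str]]) -> str:
    """Wrap each source key's translation equivalents in a synonym clause."""
    clauses = []
    for source_key, equivalents in translations.items():
        if len(equivalents) == 1:
            clauses.append(equivalents[0])
        else:
            clauses.append("#syn(" + " ".join(equivalents) + ")")
    return "#sum(" + " ".join(clauses) + ")"

# Toy example: English keys with hypothetical Finnish dictionary equivalents
translations = {
    "pollution": ["saaste", "saastuminen"],
    "tax": ["vero", "verotus", "maksu"],
}
print(structured_query(translations))
# -> #sum(#syn(saaste saastuminen) #syn(vero verotus maksu))
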
  7. Pirkola, A.; Järvelin, K.: The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 0.00
    0.002149515 = product of:
      0.00429903 = sum of:
        0.00429903 = product of:
          0.00859806 = sum of:
            0.00859806 = weight(_text_:a in 4088) [ClassicSimilarity], result of:
              0.00859806 = score(doc=4088,freq=16.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.18016359 = fieldWeight in 4088, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4088)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    So far, methods for ellipsis and anaphor resolution have been developed, and the effects of anaphor resolution have been analyzed, in the context of statistical information retrieval of scientific abstracts; no significant improvements have been observed. Analyzes the effects of ellipsis and anaphor resolution on proximity searching in a full text database. Anaphora and ellipses are classified on the basis of the type of their correlates/antecedents rather than, as is traditional, on the basis of their own linguistic type. The classification differentiates proper names and common nouns as basic words, compound words, and phrases. The study was carried out in a newspaper article database containing 55,000 full text articles. A set of 154 keyword pairs in different categories was created. Human resolution of keyword ellipsis and anaphora was performed to identify sentences and paragraphs which would match proximity searches after resolution. Findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore the recall effect of resolving proper name phrases only was analyzed for keyword pairs containing at least one proper name phrase. Findings indicate a recall increase of 38.2% in sentence searches and 28.8% in paragraph searches when proper name ellipses were resolved, and of 17.6% in sentence searches and 19.8% in paragraph searches when proper name anaphora were resolved. A simple and computationally justifiable resolution method might therefore be developed for proper name phrases only, to support keyword-based full text information retrieval. Discusses elements of such a method.
    Type
    a
  8. Pirkola, A.: Morphological typology of languages for IR (2001) 0.00
    0.0020392092 = product of:
      0.0040784185 = sum of:
        0.0040784185 = product of:
          0.008156837 = sum of:
            0.008156837 = weight(_text_:a in 4476) [ClassicSimilarity], result of:
              0.008156837 = score(doc=4476,freq=10.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.1709182 = fieldWeight in 4476, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4476)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
    Type
    a
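
As a rough illustration of one of the two typological variables mentioned above, the sketch below computes the index of synthesis as a morphemes-per-word ratio over a pre-segmented sample; the toy segmentations are assumptions, and the index of fusion (how strongly morphemes fuse together) is not modelled here.

def index_of_synthesis(segmented_words: list[list[str]]) -> float:
    """Average number of morphemes per word in a text sample."""
    words = len(segmented_words)
    morphemes = sum(len(w) for w in segmented_words)
    return morphemes / words if words else 0.0

# Toy samples: an isolating-style sample vs. a more synthetic, Finnish-like one
# (segmentations are hypothetical and for illustration only)
english_like = [["the"], ["house"], ["in"], ["the"], ["city"]]
finnish_like = [["talo", "i", "ssa"], ["kaupunki", "in"]]

print(index_of_synthesis(english_like))   # 1.0 -> analytic end of the scale
print(index_of_synthesis(finnish_like))   # 2.5 -> more synthetic
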