Document (#33374)

Author
Can, F.
Kocberber, S.
Balcik, E.
Kaynak, C.
Ocalan, H.C.
Title
Information retrieval on Turkish texts
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp.407-421
Year
2008
Abstract
In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We also investigate the effects of a range of search conditions on retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing.
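A simple word truncation stemmer conflates word forms by keeping only a fixed-length prefix of each word, which suits an agglutinative language such as Turkish, where suffixes accumulate after the stem. The Python sketch below illustrates that idea together with optional stopword removal at indexing time. It is a minimal sketch under stated assumptions, not the authors' implementation: the prefix length of 5, the tiny stopword list, and the helper names `turkish_lower`, `truncate`, and `index_terms` are hypothetical choices for illustration, and the corpus-statistics truncation and lemmatizer-based stemmer mentioned in the abstract are not shown.

```python
import re

# Turkish-aware lowercasing: str.lower() maps "I" to "i" (wrong for Turkish,
# where it should be dotless "ı") and "İ" to "i" plus a combining dot,
# so both capitals are mapped explicitly before calling lower().
_TR_UPPER_I = str.maketrans({"I": "ı", "İ": "i"})

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"ve", "bir", "bu", "da", "de", "ile", "için"}


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for the letters I and İ."""
    return text.translate(_TR_UPPER_I).lower()


def truncate(word: str, n: int = 5) -> str:
    """Fixed-prefix truncation: keep the first n characters.

    Words shorter than n characters are returned unchanged by slicing.
    The default n=5 is an assumed value, not taken from the record above.
    """
    return word[:n]


def index_terms(text: str, n: int = 5, use_stopwords: bool = True) -> list[str]:
    """Tokenize, optionally drop stopwords, then truncate each token."""
    # \w is Unicode-aware for str patterns, so ç, ğ, ı, ö, ş, ü are kept.
    tokens = re.findall(r"\w+", turkish_lower(text))
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return [truncate(t, n) for t in tokens]
```

For example, `index_terms("Kitaplar ve kitaplık")` drops the stopword "ve" and conflates "kitaplar" (books) and "kitaplık" (bookcase) to the same index term, returning `["kitap", "kitap"]`. Passing `use_stopwords=False` keeps "ve" in the output, which is the kind of indexing option whose effect the abstract reports studying.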

Similar documents (content)

  1. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.51
  2. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.38
  3. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.20
  4. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.16
  5. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.14