Document (#33375)

Author
Can, F.
Kocberber, S.
Balcik, E.
Kaynak, C.
Ocalan, H.C.
Title
Information retrieval on Turkish texts
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp. 407-421
Year
2008
Abstract
In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on the retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing.
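
The "simple word truncation approach" named in the abstract can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, and the prefix length of 5 are assumptions made for illustration only; the abstract does not state which length the authors used.

```python
# Hypothetical sketch of fixed-length prefix truncation used as a stemmer.
# The prefix length (5) is an assumed value, not one taken from the abstract.

# Python's default lower() does not follow Turkish casing for dotted and
# dotless i, so those two letters are mapped explicitly first.
_TURKISH_CASE = str.maketrans({"I": "ı", "İ": "i"})


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first prefix_len characters of the lowercased word."""
    lowered = word.translate(_TURKISH_CASE).lower()
    return lowered[:prefix_len]


def index_terms(text: str, prefix_len: int = 5) -> list[str]:
    """Split on whitespace and truncate every token (no stopword removal)."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


# "kitaplar" and "kitaplardan" share a truncated form; "kitabı" does not,
# because the final consonant of the stem changes before the suffix.
terms = index_terms("kitaplar kitaplardan kitabı")
```

Truncation of this kind needs no dictionary or morphological analysis, which is why it serves as the simplest of the stemming options compared in the abstract.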

Similar documents (content)

  1. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.51
    0.5116014 = sum of:
      0.5116014 = product of:
        1.8271477 = sum of:
          0.05402759 = weight(abstract_txt:matching in 5866) [ClassicSimilarity], result of:
            0.05402759 = score(doc=5866,freq=1.0), product of:
              0.095085956 = queryWeight, product of:
                1.0210122 = boost
                6.060772 = idf(docFreq=270, maxDocs=42740)
                0.015365883 = queryNorm
              0.56819737 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.060772 = idf(docFreq=270, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          0.009997325 = weight(abstract_txt:that in 5866) [ClassicSimilarity], result of:
            0.009997325 = score(doc=5866,freq=1.0), product of:
              0.044531666 = queryWeight, product of:
                1.2102298 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015365883 = queryNorm
              0.22449924 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          0.14153305 = weight(abstract_txt:stemming in 5866) [ClassicSimilarity], result of:
            0.14153305 = score(doc=5866,freq=2.0), product of:
              0.1434172 = queryWeight, product of:
                1.2539301 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015365883 = queryNorm
              0.9868624 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          0.038073804 = weight(abstract_txt:document in 5866) [ClassicSimilarity], result of:
            0.038073804 = score(doc=5866,freq=1.0), product of:
              0.094871 = queryWeight, product of:
                1.4422963 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.015365883 = queryNorm
              0.40132183 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          0.22265154 = weight(abstract_txt:stopword in 5866) [ClassicSimilarity], result of:
            0.22265154 = score(doc=5866,freq=1.0), product of:
              0.24441233 = queryWeight, product of:
                1.6369457 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.015365883 = queryNorm
              0.9109669 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          0.050470404 = weight(abstract_txt:retrieval in 5866) [ClassicSimilarity], result of:
            0.050470404 = score(doc=5866,freq=1.0), product of:
              0.15537716 = queryWeight, product of:
                2.9184437 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015365883 = queryNorm
              0.3248251 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
          1.3103939 = weight(abstract_txt:turkish in 5866) [ClassicSimilarity], result of:
            1.3103939 = score(doc=5866,freq=6.0), product of:
              0.63235927 = queryWeight, product of:
                4.5605297 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.015365883 = queryNorm
              2.07223 = fieldWeight in 5866, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.09375 = fieldNorm(doc=5866)
        0.28 = coord(7/25)
    
  2. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.38
    0.3846334 = sum of:
      0.3846334 = product of:
        1.2019794 = sum of:
          0.014903133 = weight(abstract_txt:that in 443) [ClassicSimilarity], result of:
            0.014903133 = score(doc=443,freq=5.0), product of:
              0.044531666 = queryWeight, product of:
                1.2102298 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015365883 = queryNorm
              0.33466372 = fieldWeight in 443, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.09435536 = weight(abstract_txt:stemming in 443) [ClassicSimilarity], result of:
            0.09435536 = score(doc=443,freq=2.0), product of:
              0.1434172 = queryWeight, product of:
                1.2539301 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015365883 = queryNorm
              0.65790826 = fieldWeight in 443, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.024564447 = weight(abstract_txt:approach in 443) [ClassicSimilarity], result of:
            0.024564447 = score(doc=443,freq=2.0), product of:
              0.0736724 = queryWeight, product of:
                1.2709842 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.015365883 = queryNorm
              0.33342808 = fieldWeight in 443, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.052157648 = weight(abstract_txt:investigate in 443) [ClassicSimilarity], result of:
            0.052157648 = score(doc=443,freq=1.0), product of:
              0.15334001 = queryWeight, product of:
                1.8336459 = boost
                5.4423003 = idf(docFreq=502, maxDocs=42740)
                0.015365883 = queryNorm
              0.34014377 = fieldWeight in 443, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4423003 = idf(docFreq=502, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.074251056 = weight(abstract_txt:word in 443) [ClassicSimilarity], result of:
            0.074251056 = score(doc=443,freq=2.0), product of:
              0.15401697 = queryWeight, product of:
                1.837689 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.015365883 = queryNorm
              0.48209658 = fieldWeight in 443, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.19481277 = weight(abstract_txt:truncation in 443) [ClassicSimilarity], result of:
            0.19481277 = score(doc=443,freq=1.0), product of:
              0.36913773 = queryWeight, product of:
                2.844998 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.015365883 = queryNorm
              0.5277509 = fieldWeight in 443, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.033646937 = weight(abstract_txt:retrieval in 443) [ClassicSimilarity], result of:
            0.033646937 = score(doc=443,freq=1.0), product of:
              0.15537716 = queryWeight, product of:
                2.9184437 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015365883 = queryNorm
              0.21655008 = fieldWeight in 443, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
          0.71328807 = weight(abstract_txt:turkish in 443) [ClassicSimilarity], result of:
            0.71328807 = score(doc=443,freq=4.0), product of:
              0.63235927 = queryWeight, product of:
                4.5605297 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.015365883 = queryNorm
              1.1279792 = fieldWeight in 443, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.0625 = fieldNorm(doc=443)
        0.32 = coord(8/25)
    
  3. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.20
    0.20029005 = sum of:
      0.20029005 = product of:
        0.7153216 = sum of:
          0.011543917 = weight(abstract_txt:that in 2897) [ClassicSimilarity], result of:
            0.011543917 = score(doc=2897,freq=3.0), product of:
              0.044531666 = queryWeight, product of:
                1.2102298 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015365883 = queryNorm
              0.2592294 = fieldWeight in 2897, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.035896324 = weight(abstract_txt:document in 2897) [ClassicSimilarity], result of:
            0.035896324 = score(doc=2897,freq=2.0), product of:
              0.094871 = queryWeight, product of:
                1.4422963 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.015365883 = queryNorm
              0.37836984 = fieldWeight in 2897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.05608451 = weight(abstract_txt:performance in 2897) [ClassicSimilarity], result of:
            0.05608451 = score(doc=2897,freq=3.0), product of:
              0.11159165 = queryWeight, product of:
                1.5642407 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.015365883 = queryNorm
              0.50258696 = fieldWeight in 2897, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.076730475 = weight(abstract_txt:query in 2897) [ClassicSimilarity], result of:
            0.076730475 = score(doc=2897,freq=5.0), product of:
              0.115993075 = queryWeight, product of:
                1.5947909 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.015365883 = queryNorm
              0.6615091 = fieldWeight in 2897, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.38962555 = weight(abstract_txt:truncation in 2897) [ClassicSimilarity], result of:
            0.38962555 = score(doc=2897,freq=4.0), product of:
              0.36913773 = queryWeight, product of:
                2.844998 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.015365883 = queryNorm
              1.0555018 = fieldWeight in 2897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.087162636 = weight(abstract_txt:effects in 2897) [ClassicSimilarity], result of:
            0.087162636 = score(doc=2897,freq=1.0), product of:
              0.24718805 = queryWeight, product of:
                2.8513274 = boost
                5.641867 = idf(docFreq=411, maxDocs=42740)
                0.015365883 = queryNorm
              0.3526167 = fieldWeight in 2897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.641867 = idf(docFreq=411, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
          0.058278203 = weight(abstract_txt:retrieval in 2897) [ClassicSimilarity], result of:
            0.058278203 = score(doc=2897,freq=3.0), product of:
              0.15537716 = queryWeight, product of:
                2.9184437 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015365883 = queryNorm
              0.37507573 = fieldWeight in 2897, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=2897)
        0.28 = coord(7/25)
    
  4. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.16
    0.16157942 = sum of:
      0.16157942 = product of:
        0.57706934 = sum of:
          0.045482464 = weight(abstract_txt:length in 1042) [ClassicSimilarity], result of:
            0.045482464 = score(doc=1042,freq=1.0), product of:
              0.11108689 = queryWeight, product of:
                1.1035808 = boost
                6.550903 = idf(docFreq=165, maxDocs=42740)
                0.015365883 = queryNorm
              0.40943143 = fieldWeight in 1042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.550903 = idf(docFreq=165, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.014903133 = weight(abstract_txt:that in 1042) [ClassicSimilarity], result of:
            0.014903133 = score(doc=1042,freq=5.0), product of:
              0.044531666 = queryWeight, product of:
                1.2102298 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015365883 = queryNorm
              0.33466372 = fieldWeight in 1042, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.025382534 = weight(abstract_txt:document in 1042) [ClassicSimilarity], result of:
            0.025382534 = score(doc=1042,freq=1.0), product of:
              0.094871 = queryWeight, product of:
                1.4422963 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.015365883 = queryNorm
              0.26754788 = fieldWeight in 1042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.03238041 = weight(abstract_txt:performance in 1042) [ClassicSimilarity], result of:
            0.03238041 = score(doc=1042,freq=1.0), product of:
              0.11159165 = queryWeight, product of:
                1.5642407 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.015365883 = queryNorm
              0.29016873 = fieldWeight in 1042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.06862982 = weight(abstract_txt:query in 1042) [ClassicSimilarity], result of:
            0.06862982 = score(doc=1042,freq=4.0), product of:
              0.115993075 = queryWeight, product of:
                1.5947909 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.015365883 = queryNorm
              0.5916717 = fieldWeight in 1042, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.033646937 = weight(abstract_txt:retrieval in 1042) [ClassicSimilarity], result of:
            0.033646937 = score(doc=1042,freq=1.0), product of:
              0.15537716 = queryWeight, product of:
                2.9184437 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015365883 = queryNorm
              0.21655008 = fieldWeight in 1042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
          0.35664403 = weight(abstract_txt:turkish in 1042) [ClassicSimilarity], result of:
            0.35664403 = score(doc=1042,freq=1.0), product of:
              0.63235927 = queryWeight, product of:
                4.5605297 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.015365883 = queryNorm
              0.5639896 = fieldWeight in 1042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.0625 = fieldNorm(doc=1042)
        0.28 = coord(7/25)
    
  5. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.14
    0.14090489 = sum of:
      0.14090489 = product of:
        0.50323176 = sum of:
          0.016662208 = weight(abstract_txt:that in 4038) [ClassicSimilarity], result of:
            0.016662208 = score(doc=4038,freq=4.0), product of:
              0.044531666 = queryWeight, product of:
                1.2102298 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015365883 = queryNorm
              0.37416542 = fieldWeight in 4038, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.14445156 = weight(abstract_txt:stemming in 4038) [ClassicSimilarity], result of:
            0.14445156 = score(doc=4038,freq=3.0), product of:
              0.1434172 = queryWeight, product of:
                1.2539301 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015365883 = queryNorm
              1.0072123 = fieldWeight in 4038, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.037606478 = weight(abstract_txt:approach in 4038) [ClassicSimilarity], result of:
            0.037606478 = score(doc=4038,freq=3.0), product of:
              0.0736724 = queryWeight, product of:
                1.2709842 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.015365883 = queryNorm
              0.5104554 = fieldWeight in 4038, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.15634805 = weight(abstract_txt:stemmer in 4038) [ClassicSimilarity], result of:
            0.15634805 = score(doc=4038,freq=1.0), product of:
              0.21804951 = queryWeight, product of:
                1.5461452 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015365883 = queryNorm
              0.71703005 = fieldWeight in 4038, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.04047551 = weight(abstract_txt:performance in 4038) [ClassicSimilarity], result of:
            0.04047551 = score(doc=4038,freq=1.0), product of:
              0.11159165 = queryWeight, product of:
                1.5642407 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.015365883 = queryNorm
              0.36271092 = fieldWeight in 4038, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.06562928 = weight(abstract_txt:word in 4038) [ClassicSimilarity], result of:
            0.06562928 = score(doc=4038,freq=1.0), product of:
              0.15401697 = queryWeight, product of:
                1.837689 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.015365883 = queryNorm
              0.4261172 = fieldWeight in 4038, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
          0.042058673 = weight(abstract_txt:retrieval in 4038) [ClassicSimilarity], result of:
            0.042058673 = score(doc=4038,freq=1.0), product of:
              0.15537716 = queryWeight, product of:
                2.9184437 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015365883 = queryNorm
              0.2706876 = fieldWeight in 4038, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=4038)
        0.28 = coord(7/25)
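
The arithmetic in the score breakdowns above can be retraced with a short sketch. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures shown. The boost, queryNorm, and fieldNorm values are copied from the first entry (doc 5866) rather than derived, and the function names are illustrative. Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math

MAX_DOCS = 42740          # maxDocs from the listing
QUERY_NORM = 0.015365883  # queryNorm from the listing
QUERY_TERMS = 25          # denominator of coord(7/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float,
               field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores: list[float],
                   query_terms: int = QUERY_TERMS) -> float:
    """Sum of term scores scaled by coord = matched terms / query terms."""
    coord = len(term_scores) / query_terms
    return sum(term_scores) * coord


# (term, freq, docFreq, boost) copied from the first entry, doc 5866
DOC_5866 = [
    ("matching", 1.0, 270, 1.0210122),
    ("that", 1.0, 10595, 1.2102298),
    ("stemming", 2.0, 67, 1.2539301),
    ("document", 1.0, 1606, 1.4422963),
    ("stopword", 1.0, 6, 1.6369457),
    ("retrieval", 1.0, 3633, 2.9184437),
    ("turkish", 6.0, 13, 4.5605297),
]

scores = [term_score(freq, doc_freq, boost, field_norm=0.09375)
          for _, freq, doc_freq, boost in DOC_5866]
score = document_score(scores)  # should land close to the listed 0.5116014
```

The breakdown shows why "turkish" dominates every ranking: it combines the highest boost with a very high idf (only 13 of 42,740 documents contain it), and idf enters both queryWeight and fieldWeight, so rare terms are rewarded twice.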