Document (#14867)

Author
Ekmekcioglu, F.C.
Lynch, M.F.
Willett, P.
Title
Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases
Source
New review of document and text management. 1995, no.1, pp.131-146
Year
1995
Abstract
Considers language processing techniques necessary for the implementation of a document retrieval system for Turkish text databases. Introduces the main characteristics of the Turkish language. Discusses the development of a stopword list and the evaluation of a stemming algorithm that takes account of the language's morphological structure. A two-level description of Turkish morphology developed at Bilkent University, Ankara, is incorporated into the morphological parser PC-KIMMO to carry out stemming in Turkish databases. Describes the evaluation of string similarity measures (n-gram matching techniques) for Turkish. Reports experiments on six different Turkish text corpora.
Theme
Computational linguistics
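
The abstract names n-gram matching as a string similarity measure for conflating Turkish word variants. The following is a minimal sketch of that general technique, not the authors' implementation: it assumes digram (n = 2) matching over unique n-grams without word-boundary padding, scored with the Dice coefficient, and a hypothetical similarity threshold of 0.6. The function names `ngrams`, `dice` and `conflate` and the sample vocabulary are illustrative only.

```python
"""Minimal sketch of n-gram (digram) matching for term conflation.

Assumptions (not taken from the paper): n = 2, unique n-grams only,
no word-boundary padding, Dice coefficient, threshold 0.6.
"""


def ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of overlapping character n-grams in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over unique n-grams."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        # Words shorter than n have no n-grams; fall back to exact match.
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def conflate(term: str, vocabulary: list[str],
             threshold: float = 0.6, n: int = 2) -> list[str]:
    """Return vocabulary words whose similarity to `term` meets the
    (hypothetical) threshold, best matches first."""
    scored = [(w, dice(term, w, n)) for w in vocabulary]
    hits = [(w, s) for w, s in scored if s >= threshold]
    return [w for w, _ in sorted(hits, key=lambda p: (-p[1], p[0]))]


if __name__ == "__main__":
    # Illustrative, already-lowercased Turkish vocabulary.
    vocab = ["kapı", "kitaplarda", "kalem", "kitap", "kitaplar"]
    print(conflate("kitap", vocab))  # ['kitap', 'kitaplar', 'kitaplarda']
```

Under these assumptions the inflected forms kitaplar ("books", Dice 8/11) and kitaplarda ("in the books", Dice 8/13) are grouped with kitap ("book"), while kapı ("door", 2/7) and kalem ("pen", 0) fall below the threshold. Input is assumed to be lowercased already, since Python's default `str.lower()` does not apply the Turkish dotted/dotless i rules; the `set[str]` and `list[str]` annotations need Python 3.9 or later.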

Similar documents (author)

  1. Lynch, C.A.: The use of heuristics in user interfaces for online information retrieval systems (1987) 4.97
  2. Lynch, C.A.: The MELVYL system : looking back, looking forward (1992) 4.97
  3. Lynch, M.J.: Access technology in academic libraries (1992) 4.97
  4. Lynch, C.A.: Subject access in MELVYL : reducing search results to manageable size (1990) 4.97
  5. Lynch, C.A.: The next generation of public access information retrieval systems for research libraries : lessons from ten years of the MELVYL system (1992) 4.97

Similar documents (content)

  1. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.39
  2. Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.18
  3. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.18
  4. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.11
  5. Mustafa, S.H.; Al-Radaideh, Q.A.: Using n-grams for Arabic text searching (2004) 0.10