Search (1 results, page 1 of 1)

  • × author_ss:"Ahmed, F."
  • × language_ss:"e"
  • × theme_ss:"Computerlinguistik"
  1. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.01
    0.009513528 = product of:
      0.038054112 = sum of:
        0.027509877 = weight(_text_:retrieval in 2941) [ClassicSimilarity], result of:
          0.027509877 = score(doc=2941,freq=4.0), product of:
            0.09700725 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032069415 = queryNorm
            0.2835858 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
        0.010544236 = product of:
          0.021088472 = sum of:
            0.021088472 = weight(_text_:system in 2941) [ClassicSimilarity], result of:
              0.021088472 = score(doc=2941,freq=2.0), product of:
                0.10100432 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.032069415 = queryNorm
                0.20878783 = fieldWeight in 2941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2941)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.