Search (208 results, page 2 of 11)

  • theme_ss:"Computerlinguistik"
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Diaz, I.; Morato, J.; Lloréns, J.: An algorithm for term conflation based on tree structures (2002) 0.01
    0.0084460005 = product of:
      0.021115001 = sum of:
        0.012184162 = weight(_text_:a in 246) [ClassicSimilarity], result of:
          0.012184162 = score(doc=246,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22789092 = fieldWeight in 246, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=246)
        0.0089308405 = product of:
          0.017861681 = sum of:
            0.017861681 = weight(_text_:information in 246) [ClassicSimilarity], result of:
              0.017861681 = score(doc=246,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21943474 = fieldWeight in 246, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=246)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
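     The score breakdown above (and the analogous tree under every entry on this page) is standard Lucene ClassicSimilarity explain output: each term's weight is queryWeight × fieldWeight, with tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), and coord factors for partial matches. A minimal sketch that reproduces this first entry's numbers from the constants shown in the tree:

     import math

     # Constants copied from the explain tree above (terms "a" and "information").
     IDF_A, IDF_INFO = 1.153047, 1.7554779    # idf = 1 + ln(maxDocs / (docFreq + 1))
     QUERY_NORM = 0.046368346
     FIELD_NORM = 0.0625                      # fieldNorm(doc=246)

     def term_weight(freq, idf):
         tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
         query_weight = idf * QUERY_NORM       # e.g. 0.053464882 for "a"
         field_weight = tf * idf * FIELD_NORM  # e.g. 0.22789092 for "a"
         return query_weight * field_weight

     w_a = term_weight(10.0, IDF_A)             # 0.012184162
     w_info = term_weight(4.0, IDF_INFO) * 0.5  # inner coord(1/2) -> 0.0089308405
     print((w_a + w_info) * 0.4)                # outer coord(2/5) -> 0.0084460005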
    
    Abstract
This work presents a new stemming algorithm that stores stemming information in tree structures. This storage enhances the algorithm's performance by reducing the search space and the overall complexity. The final result of the stemming algorithm is a normalized concept, understanding this process as the automatic extraction of the generic form (or lexeme) for a selected term.
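     As a rough illustration of why tree storage narrows the search space (a minimal sketch under invented rules, not the authors' algorithm): keeping suffix rules in a trie keyed on the reversed word means a lookup inspects at most len(word) nodes instead of scanning a whole rule list.

     # Minimal sketch: suffix-stripping rules stored in a trie keyed on the
     # reversed word. Toy rules; not the authors' exact algorithm.
     RULES = {"ies": "y", "ing": "", "s": ""}

     def build_trie(rules):
         root = {}
         for suffix, repl in rules.items():
             node = root
             for ch in reversed(suffix):
                 node = node.setdefault(ch, {})
             node["$"] = (len(suffix), repl)   # terminal: (chars to drop, replacement)
         return root

     def stem(word, trie):
         node, best = trie, None
         for ch in reversed(word):             # walk the word from its end
             if ch not in node:
                 break
             node = node[ch]
             if "$" in node:
                 best = node["$"]              # remember the longest matching suffix
         if best is None:
             return word
         drop, repl = best
         return word[:-drop] + repl

     trie = build_trie(RULES)
     print(stem("parties", trie), stem("running", trie))  # party runn (toy rules)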
    Source
Journal of the American Society for Information Science and Technology. 53(2002) no.3, S.199-208
    Type
    a
  2. Hull, D.; Ait-Mokhtar, S.; Chuat, M.; Eisele, A.; Gaussier, E.; Grefenstette, G.; Isabelle, P.; Samuelsson, C.; Segond, F.: Language technologies and patent search and classification (2001) 0.01
    0.008412599 = product of:
      0.021031497 = sum of:
        0.01155891 = weight(_text_:a in 6318) [ClassicSimilarity], result of:
          0.01155891 = score(doc=6318,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 6318, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=6318)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 6318) [ClassicSimilarity], result of:
              0.018945174 = score(doc=6318,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 6318, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6318)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    World patent information. 23(2001), S.265-268
    Type
    a
  3. Vilar, P.; Dimec, J.: Krnjenje kot osnova nekaterih nekonvencionalnih metod poizvedovanja (2000) 0.01
    0.008412599 = product of:
      0.021031497 = sum of:
        0.01155891 = weight(_text_:a in 6331) [ClassicSimilarity], result of:
          0.01155891 = score(doc=6331,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 6331, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=6331)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 6331) [ClassicSimilarity], result of:
              0.018945174 = score(doc=6331,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 6331, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6331)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
Translated title: Stemming as a basis for some non-conventional methods of information retrieval
    Type
    a
  4. Mustafa El Hadi, W.: Evaluating human language technology : general applications to information access and management (2002) 0.01
    0.008412599 = product of:
      0.021031497 = sum of:
        0.01155891 = weight(_text_:a in 1840) [ClassicSimilarity], result of:
          0.01155891 = score(doc=1840,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 1840, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=1840)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 1840) [ClassicSimilarity], result of:
              0.018945174 = score(doc=1840,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 1840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1840)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
    Guest editorial to a special issue of Knowledge Organization on "Evaluation of HLT"
    Type
    a
  5. Chieu, H.L.; Lee, Y.K.: Query based event extraction along a timeline (2004) 0.01
    0.008412599 = product of:
      0.021031497 = sum of:
        0.01155891 = weight(_text_:a in 4108) [ClassicSimilarity], result of:
          0.01155891 = score(doc=4108,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 4108, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=4108)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 4108) [ClassicSimilarity], result of:
              0.018945174 = score(doc=4108,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 4108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4108)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
    Type
    a
  6. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.01
    0.008318391 = product of:
      0.020795977 = sum of:
        0.009632425 = weight(_text_:a in 4257) [ClassicSimilarity], result of:
          0.009632425 = score(doc=4257,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18016359 = fieldWeight in 4257, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4257)
        0.011163551 = product of:
          0.022327103 = sum of:
            0.022327103 = weight(_text_:information in 4257) [ClassicSimilarity], result of:
              0.022327103 = score(doc=4257,freq=16.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27429342 = fieldWeight in 4257, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4257)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that often there are several meanings for a word or phrase, and likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding using the syntactic structure and patterns of language use that are latent in documents to make IR more effective.
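     For the recall/precision distinction the abstract leans on, a worked toy example (the document IDs and sets are hypothetical):

     # Toy illustration of the recall-vs-precision trade-off mentioned above.
     relevant  = {"d1", "d2", "d3", "d4"}        # hypothetical ground truth
     retrieved = {"d1", "d2", "d5", "d6", "d7"}  # hypothetical engine output

     hits = relevant & retrieved
     precision = len(hits) / len(retrieved)      # 2/5 = 0.40
     recall    = len(hits) / len(relevant)       # 2/4 = 0.50
     print(precision, recall)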
    Source
    Encyclopedia of library and information science. Vol.71, [=Suppl.34]
    Type
    a
  7. Koppel, M.; Akiva, N.; Dagan, I.: Feature instability as a criterion for selecting potential style markers (2006) 0.01
    0.008292621 = product of:
      0.020731552 = sum of:
        0.0144164935 = weight(_text_:a in 6092) [ClassicSimilarity], result of:
          0.0144164935 = score(doc=6092,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.26964417 = fieldWeight in 6092, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6092)
        0.006315058 = product of:
          0.012630116 = sum of:
            0.012630116 = weight(_text_:information in 6092) [ClassicSimilarity], result of:
              0.012630116 = score(doc=6092,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1551638 = fieldWeight in 6092, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6092)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element such as a word or a syntactic construct is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent, but unstable, features are especially useful as discriminators of an author's writing style.
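     The paper derives stability corpus-internally; purely as a loose illustration of the notion of "available synonymy", one could proxy it with a synonym table (the table and counts below are invented, not the authors' measure):

     # Loose illustration only: proxy a feature's (in)stability by how many
     # semantically equivalent substitutes are available for it.
     SYNONYMS = {                   # hypothetical synonym table
         "big":  {"large", "huge", "sizable"},
         "the":  set(),             # function word: hardly replaceable -> stable
         "said": {"stated", "remarked", "noted"},
     }

     def instability(word):
         """More available substitutes -> less stable (more 'synonymy')."""
         return len(SYNONYMS.get(word, set()))

     # Frequent but unstable features (high count, many substitutes) would be
     # favoured as style markers under the criterion described above.
     for w in ("big", "the", "said"):
         print(w, instability(w))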
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.11, S.1519-1525
    Type
    a
  8. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.01
    0.00826371 = product of:
      0.020659275 = sum of:
        0.01021673 = weight(_text_:a in 4277) [ClassicSimilarity], result of:
          0.01021673 = score(doc=4277,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.19109234 = fieldWeight in 4277, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4277)
        0.010442546 = product of:
          0.020885091 = sum of:
            0.020885091 = weight(_text_:information in 4277) [ClassicSimilarity], result of:
              0.020885091 = score(doc=4277,freq=14.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.256578 = fieldWeight in 4277, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4277)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
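     A minimal sketch of the query-likelihood ranking described here, using Jelinek-Mercer smoothing as one common choice (the toy collection and lambda = 0.5 are assumptions):

     import math
     from collections import Counter

     docs = {                                  # toy collection
         "d1": "language model for information retrieval".split(),
         "d2": "speech recognition uses a language model".split(),
     }
     collection = Counter(t for d in docs.values() for t in d)
     coll_len = sum(collection.values())

     def query_likelihood(query, doc, lam=0.5):
         """log P(query | doc LM), Jelinek-Mercer smoothed with the collection LM."""
         tf = Counter(doc)
         score = 0.0
         for t in query:
             p_doc = tf[t] / len(doc)
             p_coll = collection[t] / coll_len
             score += math.log(lam * p_doc + (1 - lam) * p_coll)
         return score

     query = "language model retrieval".split()
     ranked = sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True)
     print(ranked)   # d1 should outrank d2 for this query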
    Source
    Annual review of information science and technology. 39(2005), S.3-32
    Type
    a
  9. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.01
    0.008240394 = product of:
      0.020600986 = sum of:
        0.0100103095 = weight(_text_:a in 921) [ClassicSimilarity], result of:
          0.0100103095 = score(doc=921,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18723148 = fieldWeight in 921, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=921)
        0.010590675 = product of:
          0.02118135 = sum of:
            0.02118135 = weight(_text_:information in 921) [ClassicSimilarity], result of:
              0.02118135 = score(doc=921,freq=10.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.2602176 = fieldWeight in 921, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=921)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach is different from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms - extendable labelling of documents and computational linguistics - and it is also applicable to other languages. We have included a comparison with the labelling proposal of printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.
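     A tiny, hypothetical example of what an XML-coded entry with separately labelled linguistic information might look like (all element names are invented; the paper defines its own extendable scheme):

     # Hypothetical shape of an XML-coded dictionary entry; element names are
     # invented for illustration only.
     import xml.etree.ElementTree as ET

     ENTRY = """
     <entry lemma="casa" lang="es">
       <morphology pos="noun" gender="f"/>
       <syllables>ca-sa</syllables>
       <sense n="1">dwelling, house</sense>
       <sense n="2">firm, business house</sense>
     </entry>
     """

     root = ET.fromstring(ENTRY)
     print(root.get("lemma"), root.find("morphology").get("pos"))
     for sense in root.findall("sense"):   # XML tooling makes search/presentation easy
         print(sense.get("n"), sense.text)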
    Source
    Information processing and management. 43(2007) no.4, S.946-957
    Type
    a
  10. Atlam, E.S.: Similarity measurement using term negative weight and its application to word similarity (2000) 0.01
    0.008234787 = product of:
      0.020586967 = sum of:
        0.009535614 = weight(_text_:a in 4844) [ClassicSimilarity], result of:
          0.009535614 = score(doc=4844,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17835285 = fieldWeight in 4844, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4844)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 4844) [ClassicSimilarity], result of:
              0.022102704 = score(doc=4844,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 4844, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4844)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information processing and management. 36(2000) no.5, S.717-736
    Type
    a
  11. Hane, P.J.: Beyond keyword searching : Oingo and Simpli.com introduce meaning-based searching (2000) 0.01
    0.008234787 = product of:
      0.020586967 = sum of:
        0.009535614 = weight(_text_:a in 6301) [ClassicSimilarity], result of:
          0.009535614 = score(doc=6301,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17835285 = fieldWeight in 6301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=6301)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 6301) [ClassicSimilarity], result of:
              0.022102704 = score(doc=6301,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 6301, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6301)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information today. 17(2000) no.1, S.57
    Type
    a
  12. Bowker, L.: Information retrieval in translation memory systems : assessment of current limitations and possibilities for future development (2002) 0.01
    0.008234787 = product of:
      0.020586967 = sum of:
        0.009535614 = weight(_text_:a in 1854) [ClassicSimilarity], result of:
          0.009535614 = score(doc=1854,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17835285 = fieldWeight in 1854, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 1854) [ClassicSimilarity], result of:
              0.022102704 = score(doc=1854,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 1854, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1854)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
A translation memory system is a new type of human language technology (HLT) tool that is gaining popularity among translators. Such tools allow translators to store previously translated texts in a type of aligned bilingual database, and to recycle relevant parts of these texts when producing new translations. Currently, these tools retrieve information from the database using superficial character string matching, which often results in poor precision and recall. This paper explains how translation memory systems work, and it considers some possible ways for introducing more sophisticated information retrieval techniques into such systems by taking syntactic and semantic similarity into account. Some of the suggested techniques are inspired by those used in other areas of HLT, and some by techniques used in information science.
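     A minimal sketch of the "superficial character string matching" described above, with difflib's ratio standing in for the product-specific similarity metric (the segments and the 0.7 threshold are invented):

     # Sketch of fuzzy lookup in a translation memory via character-string
     # similarity; real TM products use their own (often proprietary) metrics.
     from difflib import SequenceMatcher

     tm = {   # toy aligned bilingual database: source segment -> stored translation
         "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
         "Press the green button.": "Drücken Sie die grüne Taste.",
     }

     def best_match(segment, memory, threshold=0.7):
         scored = ((SequenceMatcher(None, segment, src).ratio(), src, tgt)
                   for src, tgt in memory.items())
         score, src, tgt = max(scored)
         return (score, src, tgt) if score >= threshold else None

     print(best_match("The printer is out of toner.", tm))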
    Type
    a
  13. Sicilia-Garcia, E.I.; Smith, F.J.: Statistical language modeling (2002) 0.01
    0.008234787 = product of:
      0.020586967 = sum of:
        0.009535614 = weight(_text_:a in 4261) [ClassicSimilarity], result of:
          0.009535614 = score(doc=4261,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17835285 = fieldWeight in 4261, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4261)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 4261) [ClassicSimilarity], result of:
              0.022102704 = score(doc=4261,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 4261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4261)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Encyclopedia of library and information science. Vol.71, [=Suppl.34]
    Type
    a
  14. Ponte, J.M.: Language models for relevance feedback (2000) 0.01
    0.008113983 = product of:
      0.020284958 = sum of:
        0.010812371 = weight(_text_:a in 35) [ClassicSimilarity], result of:
          0.010812371 = score(doc=35,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20223314 = fieldWeight in 35, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 35) [ClassicSimilarity], result of:
              0.018945174 = score(doc=35,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 35, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=35)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. The intuitive appeal of this method is that inferences about the semantic content of documents do not need to be made, resulting in a conceptually simple model. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner, and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.
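     One simple way to realize feedback in this framework (a sketch, not Ponte's exact formulation): append to the query those terms whose probability in the feedback documents most exceeds their collection probability.

     # Sketch of LM-style relevance feedback: favour terms whose feedback-document
     # probability exceeds their collection probability, and append the top ones.
     from collections import Counter

     def expand_query(query, feedback_docs, collection_tf, coll_len, k=3):
         fb = Counter(t for d in feedback_docs for t in d)
         fb_len = sum(fb.values())
         def gain(t):
             return fb[t] / fb_len - collection_tf[t] / coll_len
         candidates = [t for t in fb if t not in query]
         return list(query) + sorted(candidates, key=gain, reverse=True)[:k]

     coll = Counter("a b a c language model stemming retrieval model".split())
     fb_docs = [["language", "model", "retrieval"], ["model", "smoothing"]]
     print(expand_query(["retrieval"], fb_docs, coll, sum(coll.values())))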
    Series
    The Kluwer international series on information retrieval; 7
    Source
    Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
    Type
    a
  15. Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.01
    0.008113983 = product of:
      0.020284958 = sum of:
        0.010812371 = weight(_text_:a in 151) [ClassicSimilarity], result of:
          0.010812371 = score(doc=151,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20223314 = fieldWeight in 151, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=151)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 151) [ClassicSimilarity], result of:
              0.018945174 = score(doc=151,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 151, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=151)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to the current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the crossroads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require a solid cooperation with the NLP community.
    Type
    a
  16. Benoit, G.: Data discretization for novel relationship discovery in information retrieval (2002) 0.01
    0.007931474 = product of:
      0.019828685 = sum of:
        0.010897844 = weight(_text_:a in 5197) [ClassicSimilarity], result of:
          0.010897844 = score(doc=5197,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20383182 = fieldWeight in 5197, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=5197)
        0.0089308405 = product of:
          0.017861681 = sum of:
            0.017861681 = weight(_text_:information in 5197) [ClassicSimilarity], result of:
              0.017861681 = score(doc=5197,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21943474 = fieldWeight in 5197, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5197)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
A sample of 600 Dialog and Swiss-Prot full text records in genetics and molecular biology was parsed and term frequencies calculated to provide data for a test of Benoit's visualization model for retrieval. A retrieved set is displayed graphically, allowing for manipulation of document and concept relationships in real time, which hopefully will reveal unanticipated relationships.
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.9, S.736-746
    Type
    a
  17. Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.01
    0.007797272 = product of:
      0.01949318 = sum of:
        0.011678694 = weight(_text_:a in 1001) [ClassicSimilarity], result of:
          0.011678694 = score(doc=1001,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.21843673 = fieldWeight in 1001, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1001)
        0.007814486 = product of:
          0.015628971 = sum of:
            0.015628971 = weight(_text_:information in 1001) [ClassicSimilarity], result of:
              0.015628971 = score(doc=1001,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1920054 = fieldWeight in 1001, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
In this paper we present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, designing and implementing stemmers requires a laborious amount of effort because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.
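     A compressed toy version of the stem/derivation mutual reinforcement (the full model adds the probabilistic interpretation the abstract mentions):

     # Toy sketch of mutual reinforcement: every split of every word proposes a
     # (stem, suffix) pair; stem scores and suffix scores then reinforce each
     # other, HITS-style.
     from collections import defaultdict

     words = ["walk", "walks", "walked", "talks", "talked"]
     pairs = {(w[:i], w[i:]) for w in words for i in range(1, len(w) + 1)}

     stem_score = defaultdict(lambda: 1.0)
     suff_score = defaultdict(lambda: 1.0)
     for _ in range(10):                        # a few reinforcement iterations
         for s in {p for p, _ in pairs}:
             stem_score[s] = sum(suff_score[x] for p, x in pairs if p == s)
         for x in {x for _, x in pairs}:
             suff_score[x] = sum(stem_score[p] for p, x2 in pairs if x2 == x)
         norm_s = sum(stem_score.values()); norm_x = sum(suff_score.values())
         for s in stem_score: stem_score[s] /= norm_s
         for x in suff_score: suff_score[x] /= norm_x

     print(max(stem_score, key=stem_score.get))  # "walk"/"talk" beat "w"/"t"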
    Source
    Information processing and management. 41(2005) no.1, S.121-137
    Type
    a
  18. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.007505624 = product of:
      0.01876406 = sum of:
        0.008173384 = weight(_text_:a in 2502) [ClassicSimilarity], result of:
          0.008173384 = score(doc=2502,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 2502, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.010590675 = product of:
          0.02118135 = sum of:
            0.02118135 = weight(_text_:information in 2502) [ClassicSimilarity], result of:
              0.02118135 = score(doc=2502,freq=10.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.2602176 = fieldWeight in 2502, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
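     For reference, the kind of combination being evaluated: CombSUM and CombMNZ are the classic fusion rules, sketched here with min-max normalization (the scores and run contents are invented):

     # Classic data-fusion combination rules (CombSUM / CombMNZ) over two result
     # lists from different retrieval strategies, after min-max normalization.
     def normalize(run):
         lo, hi = min(run.values()), max(run.values())
         return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}

     def fuse(runs, mnz=False):
         runs = [normalize(r) for r in runs]
         docs = set().union(*runs)
         fused = {}
         for d in docs:
             scores = [r[d] for r in runs if d in r]
             fused[d] = sum(scores) * (len(scores) if mnz else 1)  # CombMNZ multiplies
         return sorted(fused, key=fused.get, reverse=True)

     run_a = {"d1": 2.1, "d2": 1.4, "d3": 0.3}   # toy scores from strategy A
     run_b = {"d2": 9.0, "d4": 7.5}              # toy scores from strategy B
     print(fuse([run_a, run_b], mnz=True))       # d2 leads: found by both runs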
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.10, S.859-868
    Type
    a
  19. Mustafa El Hadi, W.: Terminologies, ontologies and information access (2006) 0.01
    0.0074575767 = product of:
      0.018643942 = sum of:
        0.00770594 = weight(_text_:a in 1488) [ClassicSimilarity], result of:
          0.00770594 = score(doc=1488,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14413087 = fieldWeight in 1488, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1488)
        0.010938003 = product of:
          0.021876005 = sum of:
            0.021876005 = weight(_text_:information in 1488) [ClassicSimilarity], result of:
              0.021876005 = score(doc=1488,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.2687516 = fieldWeight in 1488, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1488)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
Ontologies have become an important issue in research communities across several disciplines. This paper briefly discusses some of the innovative techniques involving automatic terminology resource acquisition. Suggests that NLP-based ontologies are useful in reducing the cost of ontology engineering. Emphasizes that linguistic ontologies covering both ontological and lexical information can offer solutions since they can be more easily updated by the resources of NLP products.
    Source
    Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad
    Type
    a
  20. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.01
    0.007399688 = product of:
      0.01849922 = sum of:
        0.012184162 = weight(_text_:a in 2585) [ClassicSimilarity], result of:
          0.012184162 = score(doc=2585,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22789092 = fieldWeight in 2585, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2585)
        0.006315058 = product of:
          0.012630116 = sum of:
            0.012630116 = weight(_text_:information in 2585) [ClassicSimilarity], result of:
              0.012630116 = score(doc=2585,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1551638 = fieldWeight in 2585, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2585)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
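     A toy version of the generator idea, driven by a rule specification rather than custom code (the rule syntax here is invented; the paper defines its own format):

     # Toy stemmer generated from a rule specification file; the "suffix -> repl"
     # syntax is invented for illustration.
     SPEC = """
     ies -> y
     sses -> ss
     ing ->
     s ->
     """

     def load_rules(spec):
         rules = []
         for line in spec.strip().splitlines():
             suffix, _, repl = (p.strip() for p in line.partition("->"))
             rules.append((suffix, repl))
         return sorted(rules, key=lambda r: -len(r[0]))   # longest suffix first

     def make_stemmer(rules):
         def stem(word):
             for suffix, repl in rules:
                 if word.endswith(suffix):
                     return word[: len(word) - len(suffix)] + repl
             return word
         return stem

     stem = make_stemmer(load_rules(SPEC))
     print(stem("glasses"), stem("parties"), stem("running"))  # glass party runn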
    Source
    Information processing and management. 38(2002) no.4, S.547-558
    Type
    a

Languages

  • e 148
  • d 53
  • ru 5
  • slv 1