Search (11 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • theme_ss:"Retrievalalgorithmen"
  1. Herrera-Viedma, E.; Cordón, O.; Herrera, J.C.; Luque, M.: ¬An IRS based on multi-granular linguistic information (2003) 0.03
    0.026535526 = product of:
      0.05307105 = sum of:
        0.02303018 = weight(_text_:information in 2740) [ClassicSimilarity], result of:
          0.02303018 = score(doc=2740,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2602176 = fieldWeight in 2740, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2740)
        0.030040871 = product of:
          0.060081743 = sum of:
            0.060081743 = weight(_text_:organization in 2740) [ClassicSimilarity], result of:
              0.060081743 = score(doc=2740,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.33425218 = fieldWeight in 2740, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2740)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
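    The tree above is Lucene's ClassicSimilarity explanation: each clause scores queryWeight (idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and coord factors scale for the fraction of query clauses that matched. A minimal Python check reproducing the arithmetic for this result (the explanations under the other results follow the same pattern):

```python
import math

# Reproduce the ClassicSimilarity explanation above (doc 2740).
# Per clause: score = queryWeight * fieldWeight, where
#   queryWeight = idf * queryNorm
#   fieldWeight = tf(freq) * idf * fieldNorm, with tf(freq) = sqrt(freq)

def clause_score(freq, idf, query_norm, field_norm):
    tf = math.sqrt(freq)
    query_weight = idf * query_norm       # e.g. 0.08850355 for "information"
    field_weight = tf * idf * field_norm  # e.g. 0.2602176 for "information"
    return query_weight * field_weight

query_norm = 0.050415643

info = clause_score(freq=10.0, idf=1.7554779,
                    query_norm=query_norm, field_norm=0.046875)  # ~0.02303018
org = clause_score(freq=4.0, idf=3.5653565,
                   query_norm=query_norm, field_norm=0.046875)   # ~0.060081743

# The "organization" clause carries its own coord(1/2); the whole sum is
# then scaled by coord(2/4), since two of four query clauses matched.
total = (info + org * 0.5) * 0.5
print(total)  # ~0.026535526, the displayed document score
```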
    
    Abstract
    An information retrieval system (IRS) based on fuzzy multi-granular linguistic information is proposed. The system has an evaluation method to process multi-granular linguistic information, in such a way that the inputs to the IRS are represented in a different linguistic domain than the outputs. The system accepts Boolean queries whose terms are weighted by means of the ordinal linguistic values represented by the linguistic variable "Importance" assessed on a label set S. The system evaluates the weighted queries according to a threshold semantics and obtains the linguistic retrieval status values (RSV) of documents represented by a linguistic variable "Relevance" expressed in a different label set S'. The advantage of this linguistic IRS with respect to others is that the use of multi-granular linguistic information facilitates and improves the IRS-user interaction.
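    The abstract gives only the outline of the evaluation method; a minimal sketch of one common reading of threshold semantics over hypothetical label sets S and S' might look as follows (the label names, the [0, 1] scale, and the penalty rule are all assumptions, not the paper's exact model):

```python
# Hypothetical label sets: query-side "Importance" (S), output-side "Relevance" (S').
S_IMPORTANCE = ["None", "Low", "Medium", "High", "Total"]
S_RELEVANCE = ["Rejected", "Poor", "Fair", "Good", "Excellent"]

def importance_to_threshold(label):
    # Map an ordinal label from S onto an assumed [0, 1] scale.
    return S_IMPORTANCE.index(label) / (len(S_IMPORTANCE) - 1)

def threshold_rsv(doc_term_weight, query_threshold):
    """One reading of threshold semantics: the query weight is a minimum
    requirement. At or above the threshold, the index weight maps straight
    into the output scale S'; below it, the score is penalized by the
    shortfall. Both scales are assumed to be [0, 1]."""
    top = len(S_RELEVANCE) - 1
    if doc_term_weight >= query_threshold:
        idx = round(doc_term_weight * top)
    else:
        shortfall = query_threshold - doc_term_weight
        idx = max(0, round((doc_term_weight - shortfall) * top))
    return S_RELEVANCE[idx]

# A query term weighted "High" (assumed numeric threshold 0.75):
print(threshold_rsv(0.8, importance_to_threshold("High")))  # -> "Good"
print(threshold_rsv(0.4, importance_to_threshold("High")))  # -> "Rejected"
```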
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  2. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.005757545 = product of:
      0.02303018 = sum of:
        0.02303018 = weight(_text_:information in 2502) [ClassicSimilarity], result of:
          0.02303018 = score(doc=2502,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2602176 = fieldWeight in 2502, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.25 = coord(1/4)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
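    The abstract does not spell out the fusion formulas; a minimal sketch of CombMNZ, one widely used data fusion technique of the kind such studies analyze (the run contents below are hypothetical), looks like this:

```python
from collections import defaultdict

def comb_mnz(result_lists):
    """CombMNZ: sum each document's min-max-normalized scores across the
    input runs, multiplied by the number of runs that retrieved it."""
    fused = defaultdict(float)
    hits = defaultdict(int)
    for run in result_lists:  # run: {doc_id: score}
        lo, hi = min(run.values()), max(run.values())
        for doc, score in run.items():
            norm = (score - lo) / (hi - lo) if hi > lo else 0.0
            fused[doc] += norm
            hits[doc] += 1
    return sorted(((fused[d] * hits[d], d) for d in fused), reverse=True)

# Two hypothetical runs from different retrieval strategies:
run_a = {"d1": 2.3, "d2": 1.9, "d3": 0.7}
run_b = {"d2": 10.0, "d4": 6.0}
print(comb_mnz([run_a, run_b]))  # d2 wins: retrieved by both runs
```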
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.10, S.859-868
  3. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.01
    0.005255895 = product of:
      0.02102358 = sum of:
        0.02102358 = weight(_text_:information in 2123) [ClassicSimilarity], result of:
          0.02102358 = score(doc=2123,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.23754507 = fieldWeight in 2123, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
      0.25 = coord(1/4)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information in everyday language more freely, more informally, and more naturally.
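    The abstract describes the Markov machine only at a high level; a heavily simplified illustration of scoring verbose input against a per-document transition graph (additive smoothing and all parameters are assumptions here, not the authors' estimator):

```python
from collections import defaultdict
import math

def transition_model(doc_tokens, alpha=0.1):
    """A per-document Markov transition graph: meaning is carried by the
    transition structure, surface structure by the word sequences the chain
    can generate. alpha is an assumed additive-smoothing constant."""
    counts = defaultdict(lambda: defaultdict(float))
    for a, b in zip(doc_tokens, doc_tokens[1:]):
        counts[a][b] += 1.0
    return counts, set(doc_tokens), alpha

def log_likelihood(query_tokens, model):
    # Score the input by the smoothed log-probability of its transitions.
    counts, vocab, alpha = model
    ll = 0.0
    for a, b in zip(query_tokens, query_tokens[1:]):
        row = counts.get(a, {})
        total = sum(row.values())
        ll += math.log((row.get(b, 0.0) + alpha) / (total + alpha * len(vocab)))
    return ll

doc = "language models rank documents by the likelihood of the query".split()
model = transition_model(doc)
print(log_likelihood("the likelihood of the query".split(), model))
```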
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1546-1558
  4. Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.01
    0.0052030715 = product of:
      0.020812286 = sum of:
        0.020812286 = weight(_text_:information in 2547) [ClassicSimilarity], result of:
          0.020812286 = score(doc=2547,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.23515764 = fieldWeight in 2547, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2547)
      0.25 = coord(1/4)
    
    Abstract
    The challenge of effectively bringing specific, relevant information from the global sea of data to our fingertips has become an increasingly difficult one. Discusses how the online information industry, founded on Boolean search systems, may be evolving to take advantage of other methods, such as 'term weighting', 'relevance ranking' and 'query by example'.
  5. Ponte, J.M.: Language models for relevance feedback (2000) 0.01
    0.005149705 = product of:
      0.02059882 = sum of:
        0.02059882 = weight(_text_:information in 35) [ClassicSimilarity], result of:
          0.02059882 = score(doc=35,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.23274569 = fieldWeight in 35, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
      0.25 = coord(1/4)
    
    Abstract
    The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. The intuitive appeal of this method is that inferences about the semantic content of documents do not need to be made, resulting in a conceptually simple model. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.
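    A minimal sketch of the query-likelihood ranking described above; Jelinek-Mercer smoothing with lambda = 0.5 is an assumed choice here, and Ponte and Croft's original estimator differs in detail:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """Rank a document by the (log) probability that its language model
    generates the query, smoothed with the collection model."""
    d, c = Counter(doc), Counter(collection)
    dlen, clen = len(doc), len(collection)
    score = 0.0
    for t in query:
        p_doc = d[t] / dlen  # document model
        p_col = c[t] / clen  # collection (background) model
        score += math.log(lam * p_doc + (1 - lam) * p_col)
    return score

collection = "retrieval of information with language models for retrieval".split()
doc1 = "language models for information retrieval".split()
doc2 = "relevance feedback with language models".split()
q = "language models retrieval".split()
print(sorted(((query_likelihood(q, d, collection), " ".join(d))
              for d in (doc1, doc2)), reverse=True))  # doc1 ranks first
```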
    Series
    The Kluwer international series on information retrieval; 7
    Source
    Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
  6. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
    0.004797954 = product of:
      0.019191816 = sum of:
        0.019191816 = weight(_text_:information in 1338) [ClassicSimilarity], result of:
          0.019191816 = score(doc=1338,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.21684799 = fieldWeight in 1338, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.25 = coord(1/4)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, which capture the tendency of two terms to co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
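    A toy illustration of the two kinds of association (the article's formal corpus-based model of word meaning is considerably more elaborate): syntagmatic associations from windowed co-occurrence, paradigmatic associations from the overlap of the contexts two words share:

```python
from collections import Counter, defaultdict

def associations(corpus_sentences, window=2):
    """Syntagmatic: co-occurrence counts within a window. Paradigmatic:
    similarity of the contexts in which two words appear."""
    cooc = defaultdict(Counter)
    for sent in corpus_sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    cooc[w][sent[j]] += 1

    def paradigmatic(a, b):
        ca, cb = cooc[a], cooc[b]
        return sum(min(ca[x], cb[x]) for x in set(ca) & set(cb))  # crude overlap

    return cooc, paradigmatic

sents = [s.split() for s in ["the cat sat on the mat", "the dog sat on the rug"]]
cooc, para = associations(sents)
print(cooc["sat"].most_common(3))  # syntagmatic neighbours of "sat"
print(para("cat", "dog"))          # shared contexts -> paradigmatic relatedness
```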
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1577-1596
  7. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.00
    0.0044597755 = product of:
      0.017839102 = sum of:
        0.017839102 = weight(_text_:information in 4811) [ClassicSimilarity], result of:
          0.017839102 = score(doc=4811,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 4811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.25 = coord(1/4)
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.
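    A minimal sketch of the idea: classic approximate string matching (Levenshtein distance) applied word by word, so local spelling variants do not swamp the comparison of whole entries; the scoring rule and threshold are illustrative assumptions:

```python
def edit_distance(a, b):
    """Levenshtein distance, one standard approximate string matching
    technique of the kind the article evaluates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def word_match(entry_a, entry_b, max_dist=1):
    """Approximate *word* matching: compare entries word by word, counting
    the fraction of words with a close match in the other entry."""
    words_a, words_b = entry_a.lower().split(), entry_b.lower().split()
    matched = sum(
        any(edit_distance(wa, wb) <= max_dist for wb in words_b) for wa in words_a
    )
    return matched / max(len(words_a), 1)

# Variant forms of the same institution, as might occur in an authority file:
print(word_match("Universidad de Granada", "Universidat de Granda"))  # -> 1.0
print(word_match("Universidad de Granada", "University of Chicago"))  # -> 0.0
```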
    Source
    Journal of the American Society for Information Science. 51(2000) no.8, S.774-786
  8. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.00
    0.0042914203 = product of:
      0.017165681 = sum of:
        0.017165681 = weight(_text_:information in 3672) [ClassicSimilarity], result of:
          0.017165681 = score(doc=3672,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.19395474 = fieldWeight in 3672, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3672)
      0.25 = coord(1/4)
    
    Abstract
    Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.
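    An illustrative reading of the selection step (the paper's exact criterion and the statistics below are assumptions): index each query term at the level, WORD, STEM, or ROOT, whose TFxIDF importance in the database is highest:

```python
import math

def tfidf(term_freq_in_db, doc_freq, n_docs):
    # TFxIDF importance of a term at some level of analysis.
    return term_freq_in_db * math.log(n_docs / doc_freq)

def pick_representation(word, stem, root, stats, n_docs):
    """Choose the representation level of a query term by its TFxIDF
    importance in the database. All statistics here are hypothetical."""
    candidates = {"word": word, "stem": stem, "root": root}
    scored = {level: tfidf(*stats[t], n_docs) for level, t in candidates.items()}
    best = max(scored, key=scored.get)
    return candidates[best], best

# Hypothetical statistics: term -> (term frequency, document frequency),
# for an Arabic word form, its stem, and its root.
stats = {"kitabun": (5, 4), "kitab": (40, 30), "ktb": (300, 900)}
term, level = pick_representation("kitabun", "kitab", "ktb", stats, n_docs=1000)
print(term, level)  # -> "kitab stem": the stem carries the highest importance
```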
    Source
    Journal of the American Society for Information Science. 50(1999) no.6, S.524-529
  9. Frakes, W.B.: Stemming algorithms (1992) 0.00
    0.0034331365 = product of:
      0.013732546 = sum of:
        0.013732546 = weight(_text_:information in 3503) [ClassicSimilarity], result of:
          0.013732546 = score(doc=3503,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1551638 = fieldWeight in 3503, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=3503)
      0.25 = coord(1/4)
    
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
  10. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.00
    0.0025748524 = product of:
      0.01029941 = sum of:
        0.01029941 = weight(_text_:information in 3455) [ClassicSimilarity], result of:
          0.01029941 = score(doc=3455,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 3455, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.6, S.571-583
  11. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: ¬An effective approach to verbose queries using a limited dependencies language model (2009) 0.00
    0.002427594 = product of:
      0.009710376 = sum of:
        0.009710376 = weight(_text_:information in 2122) [ClassicSimilarity], result of:
          0.009710376 = score(doc=2122,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.10971737 = fieldWeight in 2122, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
      0.25 = coord(1/4)
    
    Series
    Lecture notes in computer science : advances in information retrieval theory; 5766
    Source
    Second International Conference on the Theory of Information Retrieval, ICTIR 2009 Cambridge, UK, September 10-12, 2009 Proceedings. Ed.: L. Azzopardi