Search (10 results, page 1 of 1)

  • author_ss:"Croft, W.B."
  1. Kim, Y.; Seo, J.; Croft, W.B.; Smith, D.A.: Automatic suggestion of phrasal-concept queries for literature search (2014) 0.05
    0.048083793 = product of:
      0.14425138 = sum of:
        0.14425138 = weight(_text_:query in 2692) [ClassicSimilarity], result of:
          0.14425138 = score(doc=2692,freq=12.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.6289012 = fieldWeight in 2692, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2692)
      0.33333334 = coord(1/3)
    
    Abstract
    Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.
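The score explanation for this entry follows Lucene's ClassicSimilarity (TF-IDF) formula. The Python sketch below recomputes it from the numbers shown, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative helpers, not Lucene API.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity term-frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def single_term_score(freq, doc_freq, max_docs, query_norm, field_norm,
                      overlap, max_overlap):
    """Recompute a one-term explanation tree (hypothetical helper, not Lucene API)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    term_score = query_weight * field_weight            # weight(_text_:query in doc)
    coord = overlap / max_overlap                       # coord(1/3): 1 of 3 clauses matched
    return term_score * coord


# Entry 1 (doc 2692): freq 12, docFreq 1151, maxDocs 44218, fieldNorm 0.0390625.
score = single_term_score(12.0, 1151, 44218, 0.049352113, 0.0390625, 1, 3)
print(f"{score:.6f}")  # 0.048084
```

Because idf appears in both queryWeight and fieldWeight, a term's idf effectively enters the score squared. Calling the same function with freq=4.0 and fieldNorm=0.046875 reproduces entry 3 (0.03331343).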
  2. Rajashekar, T.B.; Croft, W.B.: Combining automatic and manual index representations in probabilistic retrieval (1995) 0.05
    0.04760053 = product of:
      0.14280158 = sum of:
        0.14280158 = weight(_text_:query in 2418) [ClassicSimilarity], result of:
          0.14280158 = score(doc=2418,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.62258047 = fieldWeight in 2418, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2418)
      0.33333334 = coord(1/3)
    
    Abstract
    Results from research in information retrieval have suggested that significant improvements in retrieval effectiveness can be obtained by combining results from multiple index representations, query formulations, and search strategies. The inference net model of retrieval, which was designed from this point of view, treats information retrieval as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate relevance probabilities. Uses a system based on this model to study the retrieval effectiveness benefits of combining the types of document and query information found in typical commercial databases and information services. The results indicate that substantial real benefits are possible.
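In the inference net model, beliefs from different sources of evidence (for example, an automatically derived text representation and a manually assigned index representation of the same document) are combined by belief operators. The sketch below uses the closed forms commonly given for INQUERY-style operators: #and as a product, #or as a noisy-OR, #not as a complement, and #sum and #wsum as unweighted and weighted means. The belief values are invented for illustration and are not taken from the paper.

```python
from math import prod


def bel_and(beliefs):
    """#and: product of the input beliefs."""
    return prod(beliefs)


def bel_or(beliefs):
    """#or: noisy-OR, 1 - product of (1 - belief)."""
    return 1.0 - prod(1.0 - b for b in beliefs)


def bel_not(belief):
    """#not: complement of a single belief."""
    return 1.0 - belief


def bel_sum(beliefs):
    """#sum: unweighted mean of the input beliefs."""
    return sum(beliefs) / len(beliefs)


def bel_wsum(weighted_beliefs):
    """#wsum: weighted mean; input is a list of (weight, belief) pairs."""
    total_weight = sum(w for w, _ in weighted_beliefs)
    return sum(w * b for w, b in weighted_beliefs) / total_weight


# Hypothetical beliefs that one document is relevant, from two representations.
automatic_text = 0.6  # evidence from the free-text (automatic) index
manual_terms = 0.7    # evidence from controlled-vocabulary (manual) index terms

print(round(bel_or([automatic_text, manual_terms]), 4))                  # 0.88
print(round(bel_wsum([(2.0, automatic_text), (1.0, manual_terms)]), 4))  # 0.6333
```

#or rewards a document when either representation supports it, while #wsum lets one representation count more than the other; which combination the study actually used is not stated in the abstract.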
  3. Xu, J.; Croft, W.B.: Topic-based language models for distributed retrieval (2000) 0.03
    0.03331343 = product of:
      0.09994029 = sum of:
        0.09994029 = weight(_text_:query in 38) [ClassicSimilarity], result of:
          0.09994029 = score(doc=38,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 38, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=38)
      0.33333334 = coord(1/3)
    
    Abstract
    Effective retrieval in a distributed environment is an important but difficult problem. Lack of effectiveness appears to have two major causes. First, existing collection selection algorithms do not work well on heterogeneous collections. Second, relevant documents are scattered over many collections and searching a few collections misses many relevant documents. We propose a topic-oriented approach to distributed retrieval. With this approach, we structure the document set of a distributed retrieval environment around a set of topics. Retrieval for a query involves first selecting the right topics for the query and then dispatching the search process to collections that contain such topics. The content of a topic is characterized by a language model. In environments where the labeling of documents by topics is unavailable, document clustering is employed for topic identification. Based on these ideas, three methods are proposed to suit different environments. We show that all three methods improve effectiveness of distributed retrieval.
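The two-step process described above (select topics for the query, then dispatch to the collections holding those topics) can be sketched as follows. This is a generic illustration, not any of the paper's three methods: the topic and collection names, the unigram topic models, Jelinek-Mercer smoothing with lambda = 0.5, and the top-k cutoff are all assumptions, and the clustering step for unlabeled documents is omitted.

```python
import math
from collections import Counter


def unigram_model(docs):
    """Term counts and total length for a topic, given its documents as token lists."""
    counts = Counter(token for doc in docs for token in doc)
    return counts, sum(counts.values())


def query_log_likelihood(query, topic_model, background_model, lam=0.5):
    """log P(query | topic), Jelinek-Mercer smoothed with a background model."""
    counts, total = topic_model
    bg_counts, bg_total = background_model
    score = 0.0
    for term in query:
        p_bg = bg_counts[term] / bg_total
        if p_bg == 0.0:
            continue  # term unseen in every topic: skip it rather than take log(0)
        p_topic = counts[term] / total if total else 0.0
        score += math.log(lam * p_topic + (1.0 - lam) * p_bg)
    return score


def select_collections(query, topics, topic_collections, k=1):
    """Rank topics by query likelihood, keep the top k, return their collections."""
    models = {name: unigram_model(docs) for name, docs in topics.items()}
    background = unigram_model([doc for docs in topics.values() for doc in docs])
    ranked = sorted(
        models,
        key=lambda name: query_log_likelihood(query, models[name], background),
        reverse=True,
    )
    selected = set()
    for name in ranked[:k]:
        selected |= topic_collections[name]
    return sorted(selected)


# Hypothetical topics (documents as token lists) and the collections holding them.
topics = {
    "sports": [["football", "match", "goal"], ["tennis", "match"]],
    "finance": [["stock", "market", "price"], ["market", "bank"]],
}
topic_collections = {"sports": {"c1", "c3"}, "finance": {"c2"}}

print(select_collections(["stock", "market"], topics, topic_collections))  # ['c2']
```

Smoothing with a background model keeps a topic that lacks one query term from being ruled out entirely; the search itself would then run only against the returned collections.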
  4. Croft, W.B.: Effective retrieval based on combining evidence from the corpus and users (1995) 0.03
    0.031408206 = product of:
      0.09422461 = sum of:
        0.09422461 = weight(_text_:query in 4489) [ClassicSimilarity], result of:
          0.09422461 = score(doc=4489,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.41079655 = fieldWeight in 4489, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0625 = fieldNorm(doc=4489)
      0.33333334 = coord(1/3)
    
    Abstract
    Inquery is a text retrieval system that is the basis of a number of WWW applications, including the Thomas system supported by the Library of Congress. Surveys the representation, query processing, and retrieval techniques used in the system. By combining evidence about relevance from the corpus, individual documents, and users, Inquery achieves effective overall recall and precision while avoiding occasional major failures.
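As one concrete example of combining corpus-level and document-level evidence, INQUERY papers from the TREC period commonly report a term belief of the form 0.4 + 0.6 * T * I, where T is a length-normalized tf component and I a normalized idf. The sketch below follows that reported form; treat the constants (0.4, 0.5, 1.5) as an assumption that may differ across INQUERY versions. Evidence from users (such as relevance feedback) is not modeled, and the numbers in the example are hypothetical.

```python
import math


def term_belief(tf, doc_len, avg_doc_len, df, num_docs, default_belief=0.4):
    """INQUERY-style belief that a document is about a term (reported TREC-era form)."""
    if tf == 0:
        return default_belief  # no occurrences: only the default belief remains
    # Document evidence: tf saturates and is normalized by relative document length.
    t = tf / (tf + 0.5 + 1.5 * doc_len / avg_doc_len)
    # Corpus evidence: idf scaled into (0, 1) by log(N + 1).
    i = math.log((num_docs + 0.5) / df) / math.log(num_docs + 1.0)
    return default_belief + (1.0 - default_belief) * t * i


def sum_belief(beliefs):
    """#sum-style combination: the mean of the term beliefs."""
    return sum(beliefs) / len(beliefs)


# Hypothetical numbers: an average-length document, a term with df=10 in 1,000 docs.
b = term_belief(tf=3, doc_len=100, avg_doc_len=100, df=10, num_docs=1000)
print(round(b, 3))  # 0.64
print(round(sum_belief([b, term_belief(0, 100, 100, 50, 1000)]), 3))  # 0.52
```

The default belief of 0.4 means a missing query term lowers, but does not zero out, a document's combined belief.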
  5. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 4277) [ClassicSimilarity], result of:
          0.08328357 = score(doc=4277,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 4277, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4277)
      0.33333334 = coord(1/3)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
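The query-likelihood idea summarized above (score a document by how likely its language model is to generate the query) can be sketched with a unigram document model. Dirichlet-prior smoothing toward the collection model is one common choice and is an assumption here, as is the default mu = 2000 and skipping query terms absent from the collection; the two toy documents are hypothetical.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query, doc, collection_counts, collection_len,
                                   mu=2000.0):
    """log P(query | document LM), with Dirichlet-prior smoothing toward the collection LM."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_collection = collection_counts[term] / collection_len
        if p_collection == 0.0:
            continue  # term absent from the collection: skip rather than take log(0)
        score += math.log((doc_counts[term] + mu * p_collection) / (doc_len + mu))
    return score


docs = {
    "d1": "statistical language model for retrieval".split(),
    "d2": "language model for speech recognition".split(),
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_counts.values())

query = "retrieval language model".split()
ranking = sorted(
    docs,
    key=lambda name: dirichlet_query_log_likelihood(
        query, docs[name], collection_counts, collection_len),
    reverse=True,
)
print(ranking)  # ['d1', 'd2']
```

Smoothing gives unseen query terms a small nonzero probability, so d2 is ranked below d1 rather than assigned a log-likelihood of minus infinity.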
  6. Krovetz, R.; Croft, W.B.: Lexical ambiguity and information retrieval (1992) 0.03
    0.027482178 = product of:
      0.08244653 = sum of:
        0.08244653 = weight(_text_:query in 4028) [ClassicSimilarity], result of:
          0.08244653 = score(doc=4028,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.35944697 = fieldWeight in 4028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4028)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports on an analysis of lexical ambiguity in information retrieval text collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. Results show that there is considerable ambiguity even in a specialised database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors determine whether disambiguation will improve performance; for example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Discusses other uses of word sense disambiguation in an information retrieval context.
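The fieldNorm values in these explanations (0.0390625, 0.046875, 0.0546875, 0.0625, 0.078125, 0.125) are all exactly representable with a 3-bit mantissa. That is consistent with Lucene releases that still report coord (before 7.0), which, to our understanding, store the length norm 1/sqrt(numTerms) as a single byte via SmallFloat.floatToByte315 and round down in doing so. The sketch below mimics only that rounding, assuming no index-time boost and ignoring the clamping the real encoder applies at extreme values. Under those assumptions, the fieldNorm of 0.0546875 for this entry corresponds to a _text_ field of roughly 292 to 334 tokens.

```python
import math


def quantize_norm(x: float) -> float:
    """Round x > 0 down to the nearest float with a 3-bit mantissa (no clamping)."""
    if x <= 0.0:
        return 0.0
    frac, exp = math.frexp(x)  # x = frac * 2**exp with 0.5 <= frac < 1
    mantissa_bits = math.floor((2.0 * frac - 1.0) * 8)  # keep 3 fraction bits, truncated
    return math.ldexp(1.0 + mantissa_bits / 8.0, exp - 1)


def field_norm(num_terms: int) -> float:
    """Quantized length norm 1/sqrt(numTerms), assuming an index-time boost of 1."""
    return quantize_norm(1.0 / math.sqrt(num_terms))


def token_range(norm: float, max_terms: int = 10_000):
    """Smallest and largest field lengths whose quantized norm equals `norm`."""
    lengths = [n for n in range(1, max_terms + 1) if field_norm(n) == norm]
    return (lengths[0], lengths[-1]) if lengths else None


for norm in (0.125, 0.0625, 0.0546875, 0.0390625):
    print(norm, token_range(norm))
# 0.125 (51, 64)
# 0.0625 (203, 256)
# 0.0546875 (292, 334)
# 0.0390625 (542, 655)
```

Because the encoding keeps only three mantissa bits, many field lengths share one norm, so fieldNorm indicates length only to within a band.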
  7. Croft, W.B.: Advances in information retrieval : Recent research from the Center for Intelligent Information Retrieval (2000) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 6860) [ClassicSimilarity], result of:
          0.07066846 = score(doc=6860,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 6860, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=6860)
      0.33333334 = coord(1/3)
    
    Content
    Contains the contributions: CROFT, W.B.: Combining approaches to information retrieval; GREIFF, W.R.: The use of exploratory data analysis in information retrieval research; PONTE, J.M.: Language models for relevance feedback; PAPKA, R. and J. ALLAN: Topic detection and tracking: event clustering as a basis for first story detection; CALLAN, J.: Distributed information retrieval; XU, J. and W.B. CROFT: Topic-based language models for distributed retrieval; LU, Z. and K.S. McKINLEY: The effect of collection organization and query locality on information retrieval system performance; BALLESTEROS, L.A.: Cross-language retrieval via transitive translation; SANDERSON, M. and D. LAWRIE: Building, testing, and applying concept hierarchies; RAVELA, S. and C. LUO: Appearance-based global similarity retrieval of images
  8. Luk, R.W.P.; Leong, H.V.; Dillon, T.S.; Chan, A.T.S.; Croft, W.B.; Allen, J.: A survey in indexing and searching XML documents (2002) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 460) [ClassicSimilarity], result of:
          0.07066846 = score(doc=460,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 460, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=460)
      0.33333334 = coord(1/3)
    
    Abstract
    XML holds the promise to yield (1) a more precise search by providing additional information in the elements, (2) a better integrated search of documents from heterogeneous sources, (3) a powerful search paradigm using structural as well as content specifications, and (4) data and information exchange to share resources and to support cooperative search. We survey several indexing techniques for XML documents, grouping them into flatfile, semistructured, and structured indexing paradigms. Searching techniques and supporting techniques for searching are reviewed, including full text search and multistage search. Because searching XML documents can be very flexible, various search result presentations are discussed, as well as database and information retrieval system integration and XML query languages. We also survey various retrieval models, examining how they would be used or extended for retrieving XML documents. To conclude the article, we discuss various open issues that XML poses with respect to information retrieval and database research.
  9. Belkin, N.J.; Croft, W.B.: Retrieval techniques (1987) 0.02
    0.017830748 = product of:
      0.053492244 = sum of:
        0.053492244 = product of:
          0.10698449 = sum of:
            0.10698449 = weight(_text_:22 in 334) [ClassicSimilarity], result of:
              0.10698449 = score(doc=334,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.61904186 = fieldWeight in 334, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=334)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Annual review of information science and technology. 22(1987), pp.109-145
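Entries 9 and 10 match only the term `22` (which also appears in the Source and Date lines shown) inside a nested two-clause sub-query, so their explanations carry two coord factors. In Lucene releases before 7.0, a BooleanQuery multiplies the sum of its matching clause scores by coord = matching clauses / total clauses, and this applies recursively. A minimal sketch of that rule follows; the clause layout is a hypothetical reconstruction, since the explain output reveals only clause counts, not the unmatched clauses.

```python
from typing import List, Optional, Union

# A clause is a leaf score (float), None for a non-matching leaf, or a nested list of clauses.
Clause = Union[float, None, List["Clause"]]


def boolean_score(clause: Clause) -> Optional[float]:
    """Score a clause tree the way coord-era BooleanQuery explanations compose it."""
    if clause is None or isinstance(clause, (int, float)):
        return clause
    child_scores = [boolean_score(child) for child in clause]
    matched = [s for s in child_scores if s is not None]
    if not matched:
        return None  # no clause matched, so this sub-query does not match
    coord = len(matched) / len(clause)
    return coord * sum(matched)


# Entry 9 (doc 334): 3 top-level clauses; only "22" matches, inside a 2-clause sub-query.
doc_334 = [None, None, [0.10698449, None]]
print(round(boolean_score(doc_334), 9))  # 0.017830748
```

The inner coord(1/2) halves the term weight to 0.053492244, and the outer coord(1/3) then gives the final 0.017830748, matching the explanation tree.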
  10. Allan, J.; Callan, J.P.; Croft, W.B.; Ballesteros, L.; Broglio, J.; Xu, J.; Shu, H.: INQUERY at TREC-5 (1997) 0.01
    0.011144217 = product of:
      0.03343265 = sum of:
        0.03343265 = product of:
          0.0668653 = sum of:
            0.0668653 = weight(_text_:22 in 3103) [ClassicSimilarity], result of:
              0.0668653 = score(doc=3103,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.38690117 = fieldWeight in 3103, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3103)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    1999-02-27 20:55:22