Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 0.01
```
0.011488594 = product of:
  0.022977188 = sum of:
    0.022977188 = product of:
      0.045954376 = sum of:
        0.045954376 = weight(_text_:p in 2380) [ClassicSimilarity], result of:
          0.045954376 = score(doc=2380,freq=4.0), product of:
            0.16359726 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.045500398 = queryNorm
            0.28089944 = fieldWeight in 2380, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
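The score breakdown above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which reproduce the intermediate values shown. The function names are illustrative and are not part of any Lucene API. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Assumed formula: square root of the raw term frequency
    return math.sqrt(freq)

def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight line
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight line
    score = query_weight * field_weight                  # weight(_text_:p ...)
    for coord in coords:                                 # each coord(1/2) factor
        score *= coord
    return score

# Values copied from the explanation for doc 2380.
score = explain_score(
    freq=4.0,
    doc_freq=3298,
    max_docs=44218,
    query_norm=0.045500398,
    field_norm=0.0390625,
    coords=(0.5, 0.5),
)
print(f"{score:.6f}")
```

The two `coord(1/2)` factors each halve the score because only one of two optional clauses matched at each nesting level, which is why the final value is a quarter of the term weight.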
Abstract

We explore unsupervised learning techniques for extracting semantic information about biomedical concepts and topics, and introduce a passage retrieval model for using these semantics in context to improve genomics literature search. Our contributions include a new passage retrieval model based on an undirected graphical model (Markov Random Fields), and new methods for modeling passage-concepts, document-topics, and passage-terms as potential functions within the model. Each potential function includes distributional evidence to disambiguate topics, concepts, and terms in context. The joint distribution across potential functions in the graph represents the probability of a passage being relevant to a biologist's information need. Relevance ranking within each potential function simplifies normalization across potential functions and eliminates the need for tuning of passage retrieval model parameters. Our dimensional indexing model facilitates efficient aggregation of topic, concept, and term distributions. The proposed passage retrieval model improves search results in the presence of varying levels of semantic evidence, outperforming models of query terms, concepts, or document topics alone. Our results exceed the state of the art for automatic document retrieval by 14.46% (0.3554 vs. 0.3105) and for passage retrieval by 15.57% (0.1128 vs. 0.0976) as assessed by the TREC 2007 Genomics Track, and for automatic document retrieval by 18.56% (0.3424 vs. 0.2888) as assessed by the TREC 2005 Genomics Track. Automatic document retrieval results for TREC 2007 and TREC 2005 are statistically significant at the 95% confidence level (p = 0.0359 and p = 0.0253, respectively). Passage retrieval results are significant at the 90% confidence level (p = 0.0893).
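The percentage gains quoted in the abstract are relative improvements over the baseline score, not absolute differences. A minimal sketch of that arithmetic, using only the score pairs given in the abstract (the dictionary labels and the function name are illustrative):

```python
def relative_gain_pct(new, baseline):
    # Relative improvement of `new` over `baseline`, in percent.
    return (new - baseline) / baseline * 100.0

# (proposed model score, prior best score), copied from the abstract.
results = {
    "TREC 2007 document retrieval": (0.3554, 0.3105),
    "TREC 2007 passage retrieval": (0.1128, 0.0976),
    "TREC 2005 document retrieval": (0.3424, 0.2888),
}

gains = {
    name: round(relative_gain_pct(new, base), 2)
    for name, (new, base) in results.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}%")
```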