Search (2 results, page 1 of 1)

  • author_ss:"Trotman, A."
  1. Trotman, A.: Searching structured documents (2004) 0.02
    0.0248391 = product of:
      0.0496782 = sum of:
        0.0496782 = sum of:
          0.010424593 = weight(_text_:a in 2538) [ClassicSimilarity], result of:
            0.010424593 = score(doc=2538,freq=12.0), product of:
              0.04772363 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.041389145 = queryNorm
              0.21843673 = fieldWeight in 2538, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
          0.039253604 = weight(_text_:22 in 2538) [ClassicSimilarity], result of:
            0.039253604 = score(doc=2538,freq=2.0), product of:
              0.14493774 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.041389145 = queryNorm
              0.2708308 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
      0.5 = coord(1/2)
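    For context, the indented tree above is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal Python sketch, assuming the standard ClassicSimilarity decomposition (score = coord x sum of per-term queryWeight x fieldWeight, with queryWeight = idf x queryNorm and fieldWeight = sqrt(tf) x idf x fieldNorm), reproduces the 0.0248391 shown from the figures in the tree; the helper function is illustrative, not a Lucene API.
      import math

      # Per-term ClassicSimilarity contribution, assembled from the explain values above
      # (illustrative helper, not part of Lucene itself).
      def classic_similarity(tf, idf, field_norm, query_norm):
          query_weight = idf * query_norm                   # e.g. 1.153047 * 0.041389145 = 0.04772363
          field_weight = math.sqrt(tf) * idf * field_norm   # e.g. sqrt(12) * 1.153047 * 0.0546875
          return query_weight * field_weight

      term_a  = classic_similarity(12.0, 1.153047, 0.0546875, 0.041389145)   # ~0.0104246
      term_22 = classic_similarity(2.0, 3.5018296, 0.0546875, 0.041389145)   # ~0.0392536
      coord = 0.5   # coord(1/2): one of two top-level query clauses matched
      print(coord * (term_a + term_22))                     # ~0.0248391, the score reported above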
    
    Abstract
    Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
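    The "Smith as author" example in the abstract is easy to picture with a field-aware inverted index. The toy Python sketch below is my own simplification, not the data structure developed in the paper: each posting records the XML path of the occurrence, so a structured query can restrict matches to author nodes.
      from collections import defaultdict

      # Toy field-aware inverted index: term -> list of (doc_id, xml_path) postings.
      postings = defaultdict(list)

      def index(doc_id, path, text):
          for term in text.lower().split():
              postings[term].append((doc_id, path))

      index(1, "/article/author", "a smith")
      index(2, "/article/body", "iron working by the village smith")

      # Unstructured search: both documents match "smith".
      print({d for d, _ in postings["smith"]})                           # {1, 2}
      # Structured search ("smith as author"): only author-node occurrences match.
      print({d for d, p in postings["smith"] if p.endswith("/author")})  # {1}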
    Date
    14.08.2004 10:39:22
    Type
    a
  2. Trotman, A.: Choosing document structure weights (2005) 0.00
    0.0020392092 = product of:
      0.0040784185 = sum of:
        0.0040784185 = product of:
          0.008156837 = sum of:
            0.008156837 = weight(_text_:a in 1016) [ClassicSimilarity], result of:
              0.008156837 = score(doc=1016,freq=10.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.1709182 = fieldWeight in 1016, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1016)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
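    The second record's much lower score follows the same TF-IDF arithmetic as above; the only difference is that its single matching term is discounted by coord(1/2) at two nesting levels, as a quick check confirms:
      print(0.008156837 * 0.5 * 0.5)   # ~0.0020392092, the score reported above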
    
    Abstract
    Existing ranking schemes assume all term occurrences in a given document are of equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere: an occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector space, probabilistic, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm. The learned weights are tested on an evaluation set of queries. Structure-weighted vector space inner product and structure-weighted probabilistic retrieval show an improvement of about 5% in mean average precision over their unstructured counterparts. Structure-weighted BM25 shows almost no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
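    One common way to fold structure weights into ranking, in the spirit of this abstract, is to replace a term's raw frequency with a weighted sum over the structures it occurs in and feed that into BM25 (or an inner product). The sketch below shows that general idea only, not necessarily the paper's exact formulation; the weights and parameters are made-up examples, whereas the paper learns its weights with a genetic algorithm on TREC WSJ.
      # Illustrative structure weights (made-up values).
      weights = {"title": 3.0, "abstract": 2.0, "body": 1.0}
      k1, b = 1.2, 0.75   # conventional BM25 parameters

      def weighted_tf(tf_per_structure):
          # tf_per_structure: e.g. {"title": 1, "abstract": 2, "body": 5}
          return sum(weights[s] * tf for s, tf in tf_per_structure.items())

      def bm25_term(tf_w, idf, doc_len, avg_doc_len):
          # Standard BM25 term score with the structure-weighted tf substituted for raw tf.
          return idf * (tf_w * (k1 + 1)) / (tf_w + k1 * (1 - b + b * doc_len / avg_doc_len))

      tfw = weighted_tf({"title": 1, "abstract": 2, "body": 5})
      print(bm25_term(tfw, idf=2.1, doc_len=300, avg_doc_len=250))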
    Type
    a