Document (#27539)

Author
Trotman, A.
Title
Searching structured documents
Source
Information processing and management. 40(2004) no.4, pp. 619-632
Year
2004
Abstract
Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported including searching at, and broken across multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
Theme
Markup languages

Similar documents (content)

  1. Crestani, F.; Vegas, J.; Fuente, P. de la: A graphical user interface for the retrieval of hierarchically structured documents (2004) 0.27
    0.26677752 = sum of:
      0.26677752 = product of:
        0.9527768 = sum of:
          0.020608224 = weight(abstract_txt:search in 2555) [ClassicSimilarity], result of:
            0.020608224 = score(doc=2555,freq=1.0), product of:
              0.07211085 = queryWeight, product of:
                1.2166524 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.01620258 = queryNorm
              0.28578535 = fieldWeight in 2555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.051045194 = weight(abstract_txt:documents in 2555) [ClassicSimilarity], result of:
            0.051045194 = score(doc=2555,freq=3.0), product of:
              0.09153132 = queryWeight, product of:
                1.3707273 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01620258 = queryNorm
              0.5576801 = fieldWeight in 2555, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.03484595 = weight(abstract_txt:structure in 2555) [ClassicSimilarity], result of:
            0.03484595 = score(doc=2555,freq=1.0), product of:
              0.102346994 = queryWeight, product of:
                1.4494517 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.01620258 = queryNorm
              0.3404687 = fieldWeight in 2555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.053006448 = weight(abstract_txt:retrieval in 2555) [ClassicSimilarity], result of:
            0.053006448 = score(doc=2555,freq=4.0), product of:
              0.097619474 = queryWeight, product of:
                1.7337244 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01620258 = queryNorm
              0.5429905 = fieldWeight in 2555, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.16313991 = weight(abstract_txt:document in 2555) [ClassicSimilarity], result of:
            0.16313991 = score(doc=2555,freq=6.0), product of:
              0.19859727 = queryWeight, product of:
                2.8554058 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01620258 = queryNorm
              0.82146096 = fieldWeight in 2555, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.09935312 = weight(abstract_txt:searching in 2555) [ClassicSimilarity], result of:
            0.09935312 = score(doc=2555,freq=1.0), product of:
              0.29680303 = queryWeight, product of:
                4.2752447 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.01620258 = queryNorm
              0.3347443 = fieldWeight in 2555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
          0.53077793 = weight(abstract_txt:structured in 2555) [ClassicSimilarity], result of:
            0.53077793 = score(doc=2555,freq=5.0), product of:
              0.5584035 = queryWeight, product of:
                6.333945 = boost
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.01620258 = queryNorm
              0.95052755 = fieldWeight in 2555, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.078125 = fieldNorm(doc=2555)
        0.28 = coord(7/25)
    
  2. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.23
    0.22867198 = sum of:
      0.22867198 = product of:
        0.71459997 = sum of:
          0.049033735 = weight(abstract_txt:adapted in 459) [ClassicSimilarity], result of:
            0.049033735 = score(doc=459,freq=1.0), product of:
              0.1183678 = queryWeight, product of:
                1.1022192 = boost
                6.627983 = idf(docFreq=158, maxDocs=44218)
                0.01620258 = queryNorm
              0.41424894 = fieldWeight in 459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.627983 = idf(docFreq=158, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.07637634 = weight(abstract_txt:tree in 459) [ClassicSimilarity], result of:
            0.07637634 = score(doc=459,freq=2.0), product of:
              0.12624073 = queryWeight, product of:
                1.1382848 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.01620258 = queryNorm
              0.60500556 = fieldWeight in 459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.040836155 = weight(abstract_txt:documents in 459) [ClassicSimilarity], result of:
            0.040836155 = score(doc=459,freq=3.0), product of:
              0.09153132 = queryWeight, product of:
                1.3707273 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01620258 = queryNorm
              0.44614407 = fieldWeight in 459, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.055753518 = weight(abstract_txt:structure in 459) [ClassicSimilarity], result of:
            0.055753518 = score(doc=459,freq=4.0), product of:
              0.102346994 = queryWeight, product of:
                1.4494517 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.01620258 = queryNorm
              0.5447499 = fieldWeight in 459, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.036723945 = weight(abstract_txt:retrieval in 459) [ClassicSimilarity], result of:
            0.036723945 = score(doc=459,freq=3.0), product of:
              0.097619474 = queryWeight, product of:
                1.7337244 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01620258 = queryNorm
              0.37619486 = fieldWeight in 459, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.056809578 = weight(abstract_txt:precision in 459) [ClassicSimilarity], result of:
            0.056809578 = score(doc=459,freq=1.0), product of:
              0.16451088 = queryWeight, product of:
                1.837653 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.01620258 = queryNorm
              0.34532416 = fieldWeight in 459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.13051191 = weight(abstract_txt:document in 459) [ClassicSimilarity], result of:
            0.13051191 = score(doc=459,freq=6.0), product of:
              0.19859727 = queryWeight, product of:
                2.8554058 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01620258 = queryNorm
              0.65716875 = fieldWeight in 459, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
          0.26855475 = weight(abstract_txt:structured in 459) [ClassicSimilarity], result of:
            0.26855475 = score(doc=459,freq=2.0), product of:
              0.5584035 = queryWeight, product of:
                6.333945 = boost
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.01620258 = queryNorm
              0.48093313 = fieldWeight in 459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.0625 = fieldNorm(doc=459)
        0.32 = coord(8/25)
    
  3. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.22
    0.21561876 = sum of:
      0.21561876 = product of:
        0.770067 = sum of:
          0.041678227 = weight(abstract_txt:documents in 995) [ClassicSimilarity], result of:
            0.041678227 = score(doc=995,freq=2.0), product of:
              0.09153132 = queryWeight, product of:
                1.3707273 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01620258 = queryNorm
              0.4553439 = fieldWeight in 995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.03484595 = weight(abstract_txt:structure in 995) [ClassicSimilarity], result of:
            0.03484595 = score(doc=995,freq=1.0), product of:
              0.102346994 = queryWeight, product of:
                1.4494517 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.01620258 = queryNorm
              0.3404687 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.026503224 = weight(abstract_txt:retrieval in 995) [ClassicSimilarity], result of:
            0.026503224 = score(doc=995,freq=1.0), product of:
              0.097619474 = queryWeight, product of:
                1.7337244 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01620258 = queryNorm
              0.27149525 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.16539146 = weight(abstract_txt:corpus in 995) [ClassicSimilarity], result of:
            0.16539146 = score(doc=995,freq=3.0), product of:
              0.20042038 = queryWeight, product of:
                2.0283232 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.01620258 = queryNorm
              0.8252228 = fieldWeight in 995, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.06660158 = weight(abstract_txt:document in 995) [ClassicSimilarity], result of:
            0.06660158 = score(doc=995,freq=1.0), product of:
              0.19859727 = queryWeight, product of:
                2.8554058 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01620258 = queryNorm
              0.33536002 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.09935312 = weight(abstract_txt:searching in 995) [ClassicSimilarity], result of:
            0.09935312 = score(doc=995,freq=1.0), product of:
              0.29680303 = queryWeight, product of:
                4.2752447 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.01620258 = queryNorm
              0.3347443 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.33569342 = weight(abstract_txt:structured in 995) [ClassicSimilarity], result of:
            0.33569342 = score(doc=995,freq=2.0), product of:
              0.5584035 = queryWeight, product of:
                6.333945 = boost
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.01620258 = queryNorm
              0.6011664 = fieldWeight in 995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
        0.28 = coord(7/25)
    
  4. Skov, M.; Larsen, B.; Ingwersen, P.: Inter and intra-document contexts applied in polyrepresentation for best match IR (2008) 0.17
    0.16510846 = sum of:
      0.16510846 = product of:
        0.58967304 = sum of:
          0.038057018 = weight(abstract_txt:supporting in 2117) [ClassicSimilarity], result of:
            0.038057018 = score(doc=2117,freq=1.0), product of:
              0.099967785 = queryWeight, product of:
                1.0129342 = boost
                6.091085 = idf(docFreq=271, maxDocs=44218)
                0.01620258 = queryNorm
              0.3806928 = fieldWeight in 2117, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.091085 = idf(docFreq=271, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.061648622 = weight(abstract_txt:unstructured in 2117) [ClassicSimilarity], result of:
            0.061648622 = score(doc=2117,freq=1.0), product of:
              0.13788567 = queryWeight, product of:
                1.1896269 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.01620258 = queryNorm
              0.44709954 = fieldWeight in 2117, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.023315543 = weight(abstract_txt:search in 2117) [ClassicSimilarity], result of:
            0.023315543 = score(doc=2117,freq=2.0), product of:
              0.07211085 = queryWeight, product of:
                1.2166524 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.01620258 = queryNorm
              0.3233292 = fieldWeight in 2117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.04240516 = weight(abstract_txt:retrieval in 2117) [ClassicSimilarity], result of:
            0.04240516 = score(doc=2117,freq=4.0), product of:
              0.097619474 = queryWeight, product of:
                1.7337244 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01620258 = queryNorm
              0.43439242 = fieldWeight in 2117, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.08034088 = weight(abstract_txt:precision in 2117) [ClassicSimilarity], result of:
            0.08034088 = score(doc=2117,freq=2.0), product of:
              0.16451088 = queryWeight, product of:
                1.837653 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.01620258 = queryNorm
              0.4883621 = fieldWeight in 2117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.07535109 = weight(abstract_txt:document in 2117) [ClassicSimilarity], result of:
            0.07535109 = score(doc=2117,freq=2.0), product of:
              0.19859727 = queryWeight, product of:
                2.8554058 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01620258 = queryNorm
              0.37941656 = fieldWeight in 2117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
          0.26855475 = weight(abstract_txt:structured in 2117) [ClassicSimilarity], result of:
            0.26855475 = score(doc=2117,freq=2.0), product of:
              0.5584035 = queryWeight, product of:
                6.333945 = boost
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.01620258 = queryNorm
              0.48093313 = fieldWeight in 2117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.0625 = fieldNorm(doc=2117)
        0.28 = coord(7/25)
    
  5. Sevigny, M.; Marcoux, Y.: Construction et evaluation d'un prototype d'interface-utilisateurs pour l'interrogation de bases de documents structures (1996) 0.16
    0.1615813 = sum of:
      0.1615813 = product of:
        0.80790645 = sum of:
          0.07713822 = weight(abstract_txt:sgml in 752) [ClassicSimilarity], result of:
            0.07713822 = score(doc=752,freq=1.0), product of:
              0.122186296 = queryWeight, product of:
                1.1198567 = boost
                6.7340426 = idf(docFreq=142, maxDocs=44218)
                0.01620258 = queryNorm
              0.6313165 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7340426 = idf(docFreq=142, maxDocs=44218)
                0.09375 = fieldNorm(doc=752)
          0.03536515 = weight(abstract_txt:documents in 752) [ClassicSimilarity], result of:
            0.03536515 = score(doc=752,freq=1.0), product of:
              0.09153132 = queryWeight, product of:
                1.3707273 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01620258 = queryNorm
              0.38637212 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.09375 = fieldNorm(doc=752)
          0.06360774 = weight(abstract_txt:retrieval in 752) [ClassicSimilarity], result of:
            0.06360774 = score(doc=752,freq=4.0), product of:
              0.097619474 = queryWeight, product of:
                1.7337244 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01620258 = queryNorm
              0.6515886 = fieldWeight in 752, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=752)
          0.13842879 = weight(abstract_txt:document in 752) [ClassicSimilarity], result of:
            0.13842879 = score(doc=752,freq=3.0), product of:
              0.19859727 = queryWeight, product of:
                2.8554058 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01620258 = queryNorm
              0.6970327 = fieldWeight in 752, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=752)
          0.49336654 = weight(abstract_txt:structured in 752) [ClassicSimilarity], result of:
            0.49336654 = score(doc=752,freq=3.0), product of:
              0.5584035 = queryWeight, product of:
                6.333945 = boost
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.01620258 = queryNorm
              0.88353056 = fieldWeight in 752, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4411373 = idf(docFreq=520, maxDocs=44218)
                0.09375 = fieldNorm(doc=752)
        0.2 = coord(5/25)
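
The score breakdowns above all follow the same arithmetic, which can be reproduced from the listed values. The following is a minimal sketch, not the search engine's own code: the function names (idf, term_score, document_score) are hypothetical, and the formulas are inferred from the numbers shown (tf as the square root of freq, idf as 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the final score as coord times the sum of the per-term products). Small differences in the last digits are expected because the listing uses single-precision floats.

```python
import math

# Values copied from the listing above.
MAX_DOCS = 44218
QUERY_NORM = 0.01620258


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Per-term score: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """coord(matched/total) times the sum of the per-term scores."""
    coord = matched / total
    return coord * sum(term_score(b, df, f, field_norm) for b, df, f in terms)


# (boost, docFreq, freq) for each matching term of doc 2555 (entry 1).
DOC_2555_TERMS = [
    (1.2166524, 3098, 1.0),  # search
    (1.3707273, 1949, 3.0),  # documents
    (1.4494517, 1538, 1.0),  # structure
    (1.7337244, 3720, 4.0),  # retrieval
    (2.8554058, 1642, 6.0),  # document
    (4.2752447, 1655, 1.0),  # searching
    (6.333945, 520, 5.0),    # structured
]

score = document_score(DOC_2555_TERMS, field_norm=0.078125, matched=7, total=25)
print(round(score, 2))  # prints 0.27, the value shown next to entry 1
```

The same function applies to the other entries by substituting their term lists, fieldNorm values, and coord fractions.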