Document (#37099)

Author
Lassalle, E.
Title
Semantic models in information retrieval
Source
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Imprint
Hershey, PA : IGI Publishing
Year
2012
Pages
pp. 138-173
Abstract
Robertson and Spärck Jones pioneered experimental probabilistic models (the Binary Independence Model), combining a typology that generalizes the Boolean model, frequency counts used to compute elementary weightings, and the combination of these weightings into a global probabilistic estimate. However, this model did not consider dependencies between indexing terms. An extension to mixture models (e.g., using a 2-Poisson law) made it possible to take these dependencies into account from a macroscopic point of view (BM25), as well as shallow linguistic processing of co-references. New approaches (language models, for example "bag of words" models, probabilistic dependencies between queries and documents, and consequently Bayesian inference using a conjugate Dirichlet prior) furnished new solutions for document structuring (categorization) and for index smoothing. At present, the main issues in these probabilistic models have been addressed from a formal point of view only; linguistic properties are thus neglected in the indexing language. The authors examine how linguistic and semantic modeling can be integrated into indexing languages, and set up a hybrid model that makes it possible to deal with different information retrieval problems in a unified way.
Footnote
Cf.: http://www.igi-global.com/book/next-generation-search-engines/64424.
Theme
Semantic Web
Knowledge representation

Similar documents (content)

  1. Lhadj, L.S.; Boughanem, M.; Amrouche, K.: Enhancing information retrieval through concept-based language modeling and semantic smoothing (2016) 0.27
    0.26805303 = sum of:
      0.26805303 = product of:
        0.7445917 = sum of:
          0.01960491 = weight(abstract_txt:documents in 5222) [ClassicSimilarity], result of:
            0.01960491 = score(doc=5222,freq=1.0), product of:
              0.07622088 = queryWeight, product of:
                1.0187398 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018180247 = queryNorm
              0.2572118 = fieldWeight in 5222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.035935033 = weight(abstract_txt:language in 5222) [ClassicSimilarity], result of:
            0.035935033 = score(doc=5222,freq=3.0), product of:
              0.0791533 = queryWeight, product of:
                1.0381517 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.018180247 = queryNorm
              0.45399287 = fieldWeight in 5222, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.044340495 = weight(abstract_txt:semantic in 5222) [ClassicSimilarity], result of:
            0.044340495 = score(doc=5222,freq=3.0), product of:
              0.09105924 = queryWeight, product of:
                1.1134951 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.018180247 = queryNorm
              0.48694122 = fieldWeight in 5222, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.115337975 = weight(abstract_txt:smoothing in 5222) [ClassicSimilarity], result of:
            0.115337975 = score(doc=5222,freq=1.0), product of:
              0.1971525 = queryWeight, product of:
                1.1585438 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.018180247 = queryNorm
              0.5850191 = fieldWeight in 5222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.035675064 = weight(abstract_txt:view in 5222) [ClassicSimilarity], result of:
            0.035675064 = score(doc=5222,freq=1.0), product of:
              0.113607556 = queryWeight, product of:
                1.2437409 = boost
                5.0243225 = idf(docFreq=763, maxDocs=42740)
                0.018180247 = queryNorm
              0.31402016 = fieldWeight in 5222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0243225 = idf(docFreq=763, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.039797507 = weight(abstract_txt:point in 5222) [ClassicSimilarity], result of:
            0.039797507 = score(doc=5222,freq=1.0), product of:
              0.12219909 = queryWeight, product of:
                1.2899126 = boost
                5.2108417 = idf(docFreq=633, maxDocs=42740)
                0.018180247 = queryNorm
              0.3256776 = fieldWeight in 5222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2108417 = idf(docFreq=633, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.0732169 = weight(abstract_txt:model in 5222) [ClassicSimilarity], result of:
            0.0732169 = score(doc=5222,freq=4.0), product of:
              0.14562243 = queryWeight, product of:
                1.991386 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.018180247 = queryNorm
              0.50278586 = fieldWeight in 5222, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.20634183 = weight(abstract_txt:dependencies in 5222) [ClassicSimilarity], result of:
            0.20634183 = score(doc=5222,freq=1.0), product of:
              0.41903728 = queryWeight, product of:
                2.9254878 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.018180247 = queryNorm
              0.4924188 = fieldWeight in 5222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
          0.17434202 = weight(abstract_txt:models in 5222) [ClassicSimilarity], result of:
            0.17434202 = score(doc=5222,freq=4.0), product of:
              0.29724818 = queryWeight, product of:
                3.4845488 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.018180247 = queryNorm
              0.5865201 = fieldWeight in 5222, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.0625 = fieldNorm(doc=5222)
        0.36 = coord(9/25)
    
  2. Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.21
    0.2071048 = sum of:
      0.2071048 = product of:
        0.73966 = sum of:
          0.027725529 = weight(abstract_txt:documents in 4108) [ClassicSimilarity], result of:
            0.027725529 = score(doc=4108,freq=2.0), product of:
              0.07622088 = queryWeight, product of:
                1.0187398 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018180247 = queryNorm
              0.36375242 = fieldWeight in 4108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.08237966 = weight(abstract_txt:shallow in 4108) [ClassicSimilarity], result of:
            0.08237966 = score(doc=4108,freq=1.0), product of:
              0.1575315 = queryWeight, product of:
                1.0356071 = boost
                8.367054 = idf(docFreq=26, maxDocs=42740)
                0.018180247 = queryNorm
              0.5229409 = fieldWeight in 4108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.367054 = idf(docFreq=26, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.029340833 = weight(abstract_txt:language in 4108) [ClassicSimilarity], result of:
            0.029340833 = score(doc=4108,freq=2.0), product of:
              0.0791533 = queryWeight, product of:
                1.0381517 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.018180247 = queryNorm
              0.37068364 = fieldWeight in 4108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.027986264 = weight(abstract_txt:possible in 4108) [ClassicSimilarity], result of:
            0.027986264 = score(doc=4108,freq=1.0), product of:
              0.09663342 = queryWeight, product of:
                1.1470702 = boost
                4.633803 = idf(docFreq=1128, maxDocs=42740)
                0.018180247 = queryNorm
              0.28961268 = fieldWeight in 4108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.633803 = idf(docFreq=1128, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.039797507 = weight(abstract_txt:point in 4108) [ClassicSimilarity], result of:
            0.039797507 = score(doc=4108,freq=1.0), product of:
              0.12219909 = queryWeight, product of:
                1.2899126 = boost
                5.2108417 = idf(docFreq=633, maxDocs=42740)
                0.018180247 = queryNorm
              0.3256776 = fieldWeight in 4108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2108417 = idf(docFreq=633, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.119746596 = weight(abstract_txt:linguistic in 4108) [ClassicSimilarity], result of:
            0.119746596 = score(doc=4108,freq=2.0), product of:
              0.23139818 = queryWeight, product of:
                2.1739619 = boost
                5.8547482 = idf(docFreq=332, maxDocs=42740)
                0.018180247 = queryNorm
              0.5174915 = fieldWeight in 4108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8547482 = idf(docFreq=332, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
          0.41268367 = weight(abstract_txt:dependencies in 4108) [ClassicSimilarity], result of:
            0.41268367 = score(doc=4108,freq=4.0), product of:
              0.41903728 = queryWeight, product of:
                2.9254878 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.018180247 = queryNorm
              0.9848376 = fieldWeight in 4108, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.0625 = fieldNorm(doc=4108)
        0.28 = coord(7/25)
    
  3. Bodoff, D.; Wong, S.P.-S.: Documents and queries as random variables : history and implications (2006) 0.19
    0.19056396 = sum of:
      0.19056396 = product of:
        0.79401654 = sum of:
          0.042445872 = weight(abstract_txt:documents in 1319) [ClassicSimilarity], result of:
            0.042445872 = score(doc=1319,freq=3.0), product of:
              0.07622088 = queryWeight, product of:
                1.0187398 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018180247 = queryNorm
              0.5568799 = fieldWeight in 1319, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
          0.03667604 = weight(abstract_txt:language in 1319) [ClassicSimilarity], result of:
            0.03667604 = score(doc=1319,freq=2.0), product of:
              0.0791533 = queryWeight, product of:
                1.0381517 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.018180247 = queryNorm
              0.46335456 = fieldWeight in 1319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
          0.04459383 = weight(abstract_txt:view in 1319) [ClassicSimilarity], result of:
            0.04459383 = score(doc=1319,freq=1.0), product of:
              0.113607556 = queryWeight, product of:
                1.2437409 = boost
                5.0243225 = idf(docFreq=763, maxDocs=42740)
                0.018180247 = queryNorm
              0.3925252 = fieldWeight in 1319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0243225 = idf(docFreq=763, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
          0.045760565 = weight(abstract_txt:model in 1319) [ClassicSimilarity], result of:
            0.045760565 = score(doc=1319,freq=1.0), product of:
              0.14562243 = queryWeight, product of:
                1.991386 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.018180247 = queryNorm
              0.31424117 = fieldWeight in 1319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
          0.38088986 = weight(abstract_txt:probabilistic in 1319) [ClassicSimilarity], result of:
            0.38088986 = score(doc=1319,freq=3.0), product of:
              0.41469288 = queryWeight, product of:
                3.3605056 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.018180247 = queryNorm
              0.91848665 = fieldWeight in 1319, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
          0.24365039 = weight(abstract_txt:models in 1319) [ClassicSimilarity], result of:
            0.24365039 = score(doc=1319,freq=5.0), product of:
              0.29724818 = queryWeight, product of:
                3.4845488 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.018180247 = queryNorm
              0.8196867 = fieldWeight in 1319, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.078125 = fieldNorm(doc=1319)
        0.24 = coord(6/25)
    
  4. Lee, C.; Lee, G.G.: Probabilistic information retrieval model for a dependence structured indexing system (2005) 0.19
    0.18864363 = sum of:
      0.18864363 = product of:
        0.78601515 = sum of:
          0.024506137 = weight(abstract_txt:documents in 3005) [ClassicSimilarity], result of:
            0.024506137 = score(doc=3005,freq=1.0), product of:
              0.07622088 = queryWeight, product of:
                1.0187398 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018180247 = queryNorm
              0.32151476 = fieldWeight in 3005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
          0.11868211 = weight(abstract_txt:poisson in 3005) [ClassicSimilarity], result of:
            0.11868211 = score(doc=3005,freq=1.0), product of:
              0.1731693 = queryWeight, product of:
                1.0857923 = boost
                8.772519 = idf(docFreq=17, maxDocs=42740)
                0.018180247 = queryNorm
              0.68535304 = fieldWeight in 3005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.772519 = idf(docFreq=17, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
          0.043123778 = weight(abstract_txt:indexing in 3005) [ClassicSimilarity], result of:
            0.043123778 = score(doc=3005,freq=1.0), product of:
              0.1271742 = queryWeight, product of:
                1.6116527 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.018180247 = queryNorm
              0.3390922 = fieldWeight in 3005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
          0.06471521 = weight(abstract_txt:model in 3005) [ClassicSimilarity], result of:
            0.06471521 = score(doc=3005,freq=2.0), product of:
              0.14562243 = queryWeight, product of:
                1.991386 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.018180247 = queryNorm
              0.44440413 = fieldWeight in 3005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
          0.38088986 = weight(abstract_txt:probabilistic in 3005) [ClassicSimilarity], result of:
            0.38088986 = score(doc=3005,freq=3.0), product of:
              0.41469288 = queryWeight, product of:
                3.3605056 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.018180247 = queryNorm
              0.91848665 = fieldWeight in 3005, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
          0.15409803 = weight(abstract_txt:models in 3005) [ClassicSimilarity], result of:
            0.15409803 = score(doc=3005,freq=2.0), product of:
              0.29724818 = queryWeight, product of:
                3.4845488 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.018180247 = queryNorm
              0.5184154 = fieldWeight in 3005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.078125 = fieldNorm(doc=3005)
        0.24 = coord(6/25)
    
  5. Cho, B.-H.; Lee, C.; Lee, G.G.: Exploring term dependences in probabilistic information retrieval model (2003) 0.17
    0.16774334 = sum of:
      0.16774334 = product of:
        0.6989306 = sum of:
          0.024506137 = weight(abstract_txt:documents in 3078) [ClassicSimilarity], result of:
            0.024506137 = score(doc=3078,freq=1.0), product of:
              0.07622088 = queryWeight, product of:
                1.0187398 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018180247 = queryNorm
              0.32151476 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
          0.025933877 = weight(abstract_txt:language in 3078) [ClassicSimilarity], result of:
            0.025933877 = score(doc=3078,freq=1.0), product of:
              0.0791533 = queryWeight, product of:
                1.0381517 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.018180247 = queryNorm
              0.32764113 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
          0.11868211 = weight(abstract_txt:poisson in 3078) [ClassicSimilarity], result of:
            0.11868211 = score(doc=3078,freq=1.0), product of:
              0.1731693 = queryWeight, product of:
                1.0857923 = boost
                8.772519 = idf(docFreq=17, maxDocs=42740)
                0.018180247 = queryNorm
              0.68535304 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.772519 = idf(docFreq=17, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
          0.06471521 = weight(abstract_txt:model in 3078) [ClassicSimilarity], result of:
            0.06471521 = score(doc=3078,freq=2.0), product of:
              0.14562243 = queryWeight, product of:
                1.991386 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.018180247 = queryNorm
              0.44440413 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
          0.31099525 = weight(abstract_txt:probabilistic in 3078) [ClassicSimilarity], result of:
            0.31099525 = score(doc=3078,freq=2.0), product of:
              0.41469288 = queryWeight, product of:
                3.3605056 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.018180247 = queryNorm
              0.7499412 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
          0.15409803 = weight(abstract_txt:models in 3078) [ClassicSimilarity], result of:
            0.15409803 = score(doc=3078,freq=2.0), product of:
              0.29724818 = queryWeight, product of:
                3.4845488 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.018180247 = queryNorm
              0.5184154 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.078125 = fieldNorm(doc=3078)
        0.24 = coord(6/25)
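
The explain trees above follow one pattern: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by coord. The following is a minimal Python sketch, not the catalog's own code, that recomputes the score of the first similar document (doc=5222) from the numbers printed in its tree. The function and constant names are hypothetical. The idf formula, 1 + ln(maxDocs / (docFreq + 1)), and tf = sqrt(freq) are assumptions chosen because they reproduce the printed values; boost, queryNorm and fieldNorm are copied from the tree rather than derived.

```python
import math

# Constants copied from the explain tree for doc=5222 (not derived here).
MAX_DOCS = 42740
QUERY_NORM = 0.018180247
FIELD_NORM = 0.0625      # fieldNorm(doc=5222)
QUERY_TERMS = 25         # denominator of coord(9/25)

# (term, boost, docFreq, termFreq) for the nine matching abstract_txt terms.
MATCHES = [
    ("documents",    1.0187398, 1895, 1.0),
    ("language",     1.0381517, 1752, 3.0),
    ("semantic",     1.1134951, 1292, 3.0),
    ("smoothing",    1.1585438,    9, 1.0),
    ("view",         1.2437409,  763, 1.0),
    ("point",        1.2899126,  633, 1.0),
    ("model",        1.991386,  2080, 4.0),
    ("dependencies", 2.9254878,   43, 1.0),
    ("models",       3.4845488, 1064, 4.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form that matches the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term-frequency factor; the tree shows tf = sqrt(termFreq)."""
    return math.sqrt(freq)


def term_score(boost, doc_freq, freq, field_norm=FIELD_NORM):
    """One 'weight(abstract_txt:term in doc)' node: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = tf(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def total_score(matches, query_terms=QUERY_TERMS):
    """Sum of term scores scaled by coord = matched terms / query terms."""
    coord = len(matches) / query_terms
    return coord * sum(term_score(b, df, f) for _, b, df, f in matches)


if __name__ == "__main__":
    for term, boost, df, freq in MATCHES:
        print(f"{term:13s} {term_score(boost, df, freq):.6f}")
    print(f"total         {total_score(MATCHES):.6f}")
```

Because the catalog prints single-precision values and Python computes in double precision, the recomputed numbers agree with the tree only to roughly five or six significant digits. The same functions apply to the other four results after swapping in their fieldNorm (0.078125 for docs 1319, 3005 and 3078) and their own matching terms.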