Document (#42085)

Author
Goslin, K.
Hofmann, M.
Title
¬A Wikipedia powered state-based approach to automatic search query enhancement
Source
Information processing and management. 54(2018) no.4, pp.726-739
Year
2018
Abstract
This paper describes the development and testing of a novel Automatic Search Query Enhancement (ASQE) algorithm, the Wikipedia N Sub-state Algorithm (WNSSA), which utilises Wikipedia as the sole data source for prior knowledge. This algorithm is built upon the concept of iterative states and sub-states, harnessing the power of Wikipedia's data set and link information to identify and utilise recurring terms to aid term selection and weighting during enhancement. The algorithm is designed to prevent query drift by making callbacks to the user's original search intent: the original query is persisted between internal states together with the additional selected enhancement terms. The developed algorithm has been shown to improve both short and long queries by providing a better understanding of the query and available data. The proposed algorithm was compared against five existing ASQE algorithms that utilise Wikipedia as the sole data source, showing an average Mean Average Precision (MAP) improvement of 0.273 over the tested existing ASQE algorithms.
Content
Cf.: https://doi.org/10.1016/j.ipm.2017.10.001.
Theme
Semantic context in indexing and retrieval
Object
Wikipedia

Similar documents (author)

  1. Hofmann, U.: Kritische Erfolgsfaktoren führender US-Bibliotheken (1992) 5.42
    5.416974 = sum of:
      5.416974 = weight(author_txt:hofmann in 3974) [ClassicSimilarity], result of:
        5.416974 = fieldWeight in 3974, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.667158 = idf(docFreq=19, maxDocs=42740)
          0.625 = fieldNorm(doc=3974)
    
  2. Hofmann, U.: Bibliothek und Buchhandel im Verbund : Kosten und Nutzen integrierter Informationsverarbeitung (1993) 5.42
    5.416974 = sum of:
      5.416974 = weight(author_txt:hofmann in 4500) [ClassicSimilarity], result of:
        5.416974 = fieldWeight in 4500, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.667158 = idf(docFreq=19, maxDocs=42740)
          0.625 = fieldNorm(doc=4500)
    
  3. Hofmann, W.: Zur Frage internationaler Klassifikationssysteme (1947) 5.42
    5.416974 = sum of:
      5.416974 = weight(author_txt:hofmann in 5203) [ClassicSimilarity], result of:
        5.416974 = fieldWeight in 5203, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.667158 = idf(docFreq=19, maxDocs=42740)
          0.625 = fieldNorm(doc=5203)
    
  4. Hofmann, M.: DFÜ mit dem PC : eine Übersicht über Methoden, Hardware und Software zur Datenkommunikation zwischen PCs (1992) 5.42
    5.416974 = sum of:
      5.416974 = weight(author_txt:hofmann in 6934) [ClassicSimilarity], result of:
        5.416974 = fieldWeight in 6934, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.667158 = idf(docFreq=19, maxDocs=42740)
          0.625 = fieldNorm(doc=6934)
    
  5. Hofmann, M.: TREC Konferenzbericht (7.10.93) (1995) 5.42
    5.416974 = sum of:
      5.416974 = weight(author_txt:hofmann in 5074) [ClassicSimilarity], result of:
        5.416974 = fieldWeight in 5074, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.667158 = idf(docFreq=19, maxDocs=42740)
          0.625 = fieldNorm(doc=5074)
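
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the listing follows the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of any library, and the fieldNorm value is simply copied from the listing rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the "author_txt:hofmann" entries in the listing.
idf = classic_idf(doc_freq=19, max_docs=42740)
score = field_weight(freq=1.0, doc_freq=19, max_docs=42740, field_norm=0.625)

print(f"idf   = {idf:.4f}")    # listing reports 8.667158
print(f"score = {score:.4f}")  # listing reports 5.416974
```

All five author matches have the same score because each has termFreq=1.0 and the same fieldNorm; only the document id differs. Small differences in the last digits are expected, since the listed values appear to be single-precision floats.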
    

Similar documents (content)

  1. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.17
    0.1741037 = sum of:
      0.1741037 = product of:
        0.62179893 = sum of:
          0.053545214 = weight(abstract_txt:iterative in 4694) [ClassicSimilarity], result of:
            0.053545214 = score(doc=4694,freq=1.0), product of:
              0.11319525 = queryWeight, product of:
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0149560105 = queryNorm
              0.4730341 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.016530585 = weight(abstract_txt:terms in 4694) [ClassicSimilarity], result of:
            0.016530585 = score(doc=4694,freq=1.0), product of:
              0.06514532 = queryWeight, product of:
                1.0728587 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0149560105 = queryNorm
              0.25374937 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.0347287 = weight(abstract_txt:automatic in 4694) [ClassicSimilarity], result of:
            0.0347287 = score(doc=4694,freq=1.0), product of:
              0.10686039 = queryWeight, product of:
                1.3740714 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.0149560105 = queryNorm
              0.32499132 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.066670954 = weight(abstract_txt:algorithms in 4694) [ClassicSimilarity], result of:
            0.066670954 = score(doc=4694,freq=2.0), product of:
              0.13101034 = queryWeight, product of:
                1.521436 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.0149560105 = queryNorm
              0.50889844 = fieldWeight in 4694, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.0654886 = weight(abstract_txt:query in 4694) [ClassicSimilarity], result of:
            0.0654886 = score(doc=4694,freq=1.0), product of:
              0.22136803 = queryWeight, product of:
                3.1270034 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0149560105 = queryNorm
              0.29583585 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.24614541 = weight(abstract_txt:wikipedia in 4694) [ClassicSimilarity], result of:
            0.24614541 = score(doc=4694,freq=4.0), product of:
              0.31295046 = queryWeight, product of:
                3.325475 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0149560105 = queryNorm
              0.78653157 = fieldWeight in 4694, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.13868947 = weight(abstract_txt:algorithm in 4694) [ClassicSimilarity], result of:
            0.13868947 = score(doc=4694,freq=1.0), product of:
              0.38793638 = queryWeight, product of:
                4.5346293 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0149560105 = queryNorm
              0.3575057 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
        0.28 = coord(7/25)
    
  2. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.17
    0.16755843 = sum of:
      0.16755843 = product of:
        0.5236201 = sum of:
          0.0404915 = weight(abstract_txt:terms in 3344) [ClassicSimilarity], result of:
            0.0404915 = score(doc=3344,freq=6.0), product of:
              0.06514532 = queryWeight, product of:
                1.0728587 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0149560105 = queryNorm
              0.6215565 = fieldWeight in 3344, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.025640342 = weight(abstract_txt:existing in 3344) [ClassicSimilarity], result of:
            0.025640342 = score(doc=3344,freq=1.0), product of:
              0.08729184 = queryWeight, product of:
                1.2419031 = boost
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.0149560105 = queryNorm
              0.29373127 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.028247027 = weight(abstract_txt:state in 3344) [ClassicSimilarity], result of:
            0.028247027 = score(doc=3344,freq=1.0), product of:
              0.093112126 = queryWeight, product of:
                1.2826377 = boost
                4.8538513 = idf(docFreq=905, maxDocs=42740)
                0.0149560105 = queryNorm
              0.3033657 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8538513 = idf(docFreq=905, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.039216794 = weight(abstract_txt:original in 3344) [ClassicSimilarity], result of:
            0.039216794 = score(doc=3344,freq=1.0), product of:
              0.11587929 = queryWeight, product of:
                1.430882 = boost
                5.414848 = idf(docFreq=516, maxDocs=42740)
                0.0149560105 = queryNorm
              0.338428 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.414848 = idf(docFreq=516, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.018039778 = weight(abstract_txt:search in 3344) [ClassicSimilarity], result of:
            0.018039778 = score(doc=3344,freq=1.0), product of:
              0.07904523 = queryWeight, product of:
                1.447386 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0149560105 = queryNorm
              0.22822097 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.05149658 = weight(abstract_txt:average in 3344) [ClassicSimilarity], result of:
            0.05149658 = score(doc=3344,freq=1.0), product of:
              0.1389558 = queryWeight, product of:
                1.5668926 = boost
                5.929549 = idf(docFreq=308, maxDocs=42740)
                0.0149560105 = queryNorm
              0.37059683 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.929549 = idf(docFreq=308, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.14643696 = weight(abstract_txt:query in 3344) [ClassicSimilarity], result of:
            0.14643696 = score(doc=3344,freq=5.0), product of:
              0.22136803 = queryWeight, product of:
                3.1270034 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0149560105 = queryNorm
              0.6615091 = fieldWeight in 3344, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.1740511 = weight(abstract_txt:wikipedia in 3344) [ClassicSimilarity], result of:
            0.1740511 = score(doc=3344,freq=2.0), product of:
              0.31295046 = queryWeight, product of:
                3.325475 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0149560105 = queryNorm
              0.5561618 = fieldWeight in 3344, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
        0.32 = coord(8/25)
    
  3. Selvaretnam, B.; Belkhatir, M.: ¬A linguistically driven framework for query expansion via grammatical constituent highlighting and role-based concept weighting (2016) 0.15
    0.14535215 = sum of:
      0.14535215 = product of:
        0.605634 = sum of:
          0.069765836 = weight(abstract_txt:intent in 4877) [ClassicSimilarity], result of:
            0.069765836 = score(doc=4877,freq=1.0), product of:
              0.116368726 = queryWeight, product of:
                1.0139208 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.0149560105 = queryNorm
              0.5995239 = fieldWeight in 4877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
          0.02066323 = weight(abstract_txt:terms in 4877) [ClassicSimilarity], result of:
            0.02066323 = score(doc=4877,freq=1.0), product of:
              0.06514532 = queryWeight, product of:
                1.0728587 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0149560105 = queryNorm
              0.3171867 = fieldWeight in 4877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
          0.031890124 = weight(abstract_txt:search in 4877) [ClassicSimilarity], result of:
            0.031890124 = score(doc=4877,freq=2.0), product of:
              0.07904523 = queryWeight, product of:
                1.447386 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0149560105 = queryNorm
              0.4034415 = fieldWeight in 4877, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
          0.06437073 = weight(abstract_txt:average in 4877) [ClassicSimilarity], result of:
            0.06437073 = score(doc=4877,freq=1.0), product of:
              0.1389558 = queryWeight, product of:
                1.5668926 = boost
                5.929549 = idf(docFreq=308, maxDocs=42740)
                0.0149560105 = queryNorm
              0.46324605 = fieldWeight in 4877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.929549 = idf(docFreq=308, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
          0.24558224 = weight(abstract_txt:query in 4877) [ClassicSimilarity], result of:
            0.24558224 = score(doc=4877,freq=9.0), product of:
              0.22136803 = queryWeight, product of:
                3.1270034 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0149560105 = queryNorm
              1.1093844 = fieldWeight in 4877, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
          0.17336184 = weight(abstract_txt:algorithm in 4877) [ClassicSimilarity], result of:
            0.17336184 = score(doc=4877,freq=1.0), product of:
              0.38793638 = queryWeight, product of:
                4.5346293 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0149560105 = queryNorm
              0.44688213 = fieldWeight in 4877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.078125 = fieldNorm(doc=4877)
        0.24 = coord(6/25)
    
  4. Abdelali, A.; Cowie, J.; Soliman, H.S.: Improving query precision using semantic expansion (2007) 0.13
    0.12679635 = sum of:
      0.12679635 = product of:
        0.52831817 = sum of:
          0.02066323 = weight(abstract_txt:terms in 2918) [ClassicSimilarity], result of:
            0.02066323 = score(doc=2918,freq=1.0), product of:
              0.06514532 = queryWeight, product of:
                1.0728587 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0149560105 = queryNorm
              0.3171867 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
          0.045326147 = weight(abstract_txt:existing in 2918) [ClassicSimilarity], result of:
            0.045326147 = score(doc=2918,freq=2.0), product of:
              0.08729184 = queryWeight, product of:
                1.2419031 = boost
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.0149560105 = queryNorm
              0.5192484 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
          0.049020994 = weight(abstract_txt:original in 2918) [ClassicSimilarity], result of:
            0.049020994 = score(doc=2918,freq=1.0), product of:
              0.11587929 = queryWeight, product of:
                1.430882 = boost
                5.414848 = idf(docFreq=516, maxDocs=42740)
                0.0149560105 = queryNorm
              0.423035 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.414848 = idf(docFreq=516, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
          0.023673166 = weight(abstract_txt:data in 2918) [ClassicSimilarity], result of:
            0.023673166 = score(doc=2918,freq=1.0), product of:
              0.08986667 = queryWeight, product of:
                1.7820308 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.0149560105 = queryNorm
              0.26342544 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
          0.1637215 = weight(abstract_txt:query in 2918) [ClassicSimilarity], result of:
            0.1637215 = score(doc=2918,freq=4.0), product of:
              0.22136803 = queryWeight, product of:
                3.1270034 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0149560105 = queryNorm
              0.73958963 = fieldWeight in 2918, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
          0.22591314 = weight(abstract_txt:enhancement in 2918) [ClassicSimilarity], result of:
            0.22591314 = score(doc=2918,freq=1.0), product of:
              0.40431705 = queryWeight, product of:
                3.7798705 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0149560105 = queryNorm
              0.5587524 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.078125 = fieldNorm(doc=2918)
        0.24 = coord(6/25)
    
  5. Lim, S.: How and why do college students use Wikipedia? (2009) 0.13
    0.12583235 = sum of:
      0.12583235 = product of:
        0.6291617 = sum of:
          0.091652624 = weight(abstract_txt:wikipedia's in 164) [ClassicSimilarity], result of:
            0.091652624 = score(doc=164,freq=1.0), product of:
              0.17705397 = queryWeight, product of:
                1.2506585 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0149560105 = queryNorm
              0.5176536 = fieldWeight in 164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0546875 = fieldNorm(doc=164)
          0.024716148 = weight(abstract_txt:state in 164) [ClassicSimilarity], result of:
            0.024716148 = score(doc=164,freq=1.0), product of:
              0.093112126 = queryWeight, product of:
                1.2826377 = boost
                4.8538513 = idf(docFreq=905, maxDocs=42740)
                0.0149560105 = queryNorm
              0.265445 = fieldWeight in 164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8538513 = idf(docFreq=905, maxDocs=42740)
                0.0546875 = fieldNorm(doc=164)
          0.016571216 = weight(abstract_txt:data in 164) [ClassicSimilarity], result of:
            0.016571216 = score(doc=164,freq=1.0), product of:
              0.08986667 = queryWeight, product of:
                1.7820308 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.0149560105 = queryNorm
              0.1843978 = fieldWeight in 164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.0546875 = fieldNorm(doc=164)
          0.0654672 = weight(abstract_txt:states in 164) [ClassicSimilarity], result of:
            0.0654672 = score(doc=164,freq=1.0), product of:
              0.20404783 = queryWeight, product of:
                2.3254795 = boost
                5.8668327 = idf(docFreq=328, maxDocs=42740)
                0.0149560105 = queryNorm
              0.32084242 = fieldWeight in 164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8668327 = idf(docFreq=328, maxDocs=42740)
                0.0546875 = fieldNorm(doc=164)
          0.43075448 = weight(abstract_txt:wikipedia in 164) [ClassicSimilarity], result of:
            0.43075448 = score(doc=164,freq=16.0), product of:
              0.31295046 = queryWeight, product of:
                3.325475 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0149560105 = queryNorm
              1.3764303 = fieldWeight in 164, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0546875 = fieldNorm(doc=164)
        0.2 = coord(5/25)
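
The content-match scores combine more factors than the author matches. The sketch below recomputes the total for the first content result (doc 4694) under the assumption that each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is then scaled by coord (matched terms / query terms). All numeric inputs are copied from the listing; the helper name term_score is illustrative only.

```python
import math

QUERY_NORM = 0.0149560105  # queryNorm as reported in the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=4694) as reported in the listing


def term_score(freq: float, idf: float, boost: float = 1.0) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, idf, boost) for the seven terms matched in doc 4694.
# "iterative" shows no boost line in the listing, so 1.0 is assumed.
matched_terms = [
    ("iterative", 1.0, 7.568546, 1.0),
    ("terms", 1.0, 4.05999, 1.0728587),
    ("automatic", 1.0, 5.199861, 1.3740714),
    ("algorithms", 2.0, 5.757529, 1.521436),
    ("query", 1.0, 4.7333736, 3.1270034),
    ("wikipedia", 4.0, 6.2922525, 3.325475),
    ("algorithm", 1.0, 5.7200913, 4.5346293),
]

term_sum = sum(term_score(freq, idf, boost) for _, freq, idf, boost in matched_terms)
coord = 7 / 25  # 7 of the 25 query terms matched
doc_score = term_sum * coord

print(f"sum of term scores = {term_sum:.5f}")   # listing reports 0.62179893
print(f"document score     = {doc_score:.5f}")  # listing reports 0.1741037
```

The coord factor explains why a document with a large term sum can still rank lower: result 5 has the highest sum (0.6291617) but matches only 5 of 25 query terms, so its final score falls to 0.12583235.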