Document (#34082)

Author
Li, X.
Croft, W.B.
Title
An information-pattern-based approach to novelty detection
Source
Information processing and management. 44(2008) no.3, pp. 1159-1188
Year
2008
Abstract
In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, "novelty" is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users' information needs. Second, a thorough analysis of sentence level information patterns is elaborated using data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, a unified information-pattern-based approach to novelty detection (ip-BAND) is presented for both specific NE topics and more general topics. Experiments on novelty detection on data from the TREC 2002, 2003 and 2004 novelty tracks show that the proposed approach significantly improves the performance of novelty detection in terms of precision at top ranks. Future research directions are suggested.

Similar documents (author)

  1. Croft, W.B.: Approaches to intelligent information retrieval (1987) 5.01
    5.009436 = sum of:
      5.009436 = weight(author_txt:croft in 1094) [ClassicSimilarity], result of:
        5.009436 = fieldWeight in 1094, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.015098 = idf(docFreq=37, maxDocs=42306)
          0.625 = fieldNorm(doc=1094)
    
  2. Croft, W.B.: Clustering large files of documents using the single link method (1977) 5.01
    5.009436 = sum of:
      5.009436 = weight(author_txt:croft in 5489) [ClassicSimilarity], result of:
        5.009436 = fieldWeight in 5489, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.015098 = idf(docFreq=37, maxDocs=42306)
          0.625 = fieldNorm(doc=5489)
    
  3. Croft, W.B.: Knowledge-based and statistical approaches to text retrieval (1993) 5.01
    5.009436 = sum of:
      5.009436 = weight(author_txt:croft in 7863) [ClassicSimilarity], result of:
        5.009436 = fieldWeight in 7863, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.015098 = idf(docFreq=37, maxDocs=42306)
          0.625 = fieldNorm(doc=7863)
    
  4. Croft, W.B.: Hypertext and information retrieval : what are the fundamental concepts? (1990) 5.01
    5.009436 = sum of:
      5.009436 = weight(author_txt:croft in 3) [ClassicSimilarity], result of:
        5.009436 = fieldWeight in 3, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.015098 = idf(docFreq=37, maxDocs=42306)
          0.625 = fieldNorm(doc=3)
    
  5. Croft, W.B.: What do people want from information retrieval? : the top 10 research issues for companies that use and sell IR systems (1995) 5.01
    5.009436 = sum of:
      5.009436 = weight(author_txt:croft in 3471) [ClassicSimilarity], result of:
        5.009436 = fieldWeight in 3471, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.015098 = idf(docFreq=37, maxDocs=42306)
          0.625 = fieldNorm(doc=3471)
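
The author-match scores above all follow the same arithmetic, which can be reproduced as a rough sketch. It assumes the standard Lucene ClassicSimilarity definitions (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))); the function names `classic_idf` and `field_weight` are illustrative and do not come from the record.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    tf = math.sqrt(term_freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the tree for "author_txt:croft":
# termFreq=1.0, docFreq=37, maxDocs=42306, fieldNorm=0.625
score = field_weight(1.0, 37, 42306, 0.625)
print(round(score, 4))  # should land close to the 5.009436 shown in the listing
```

Because every listed document matches the single term "croft" once and shares the same fieldNorm, all five entries receive the same score.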
    

Similar documents (content)

  1. Otterbacher, J.; Radev, D.: Exploring fact-focused relevance and novelty detection (2008) 0.33
    0.32632485 = sum of:
      0.32632485 = product of:
        1.3596869 = sum of:
          0.045714404 = weight(abstract_txt:level in 30) [ClassicSimilarity], result of:
            0.045714404 = score(doc=30,freq=3.0), product of:
              0.09304183 = queryWeight, product of:
                2.0561278 = boost
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.009969973 = queryNorm
              0.49133173 = fieldWeight in 30, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
          0.040949047 = weight(abstract_txt:approach in 30) [ClassicSimilarity], result of:
            0.040949047 = score(doc=30,freq=4.0), product of:
              0.08645804 = queryWeight, product of:
                2.2886693 = boost
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.009969973 = queryNorm
              0.47362912 = fieldWeight in 30, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
          0.01648872 = weight(abstract_txt:information in 30) [ClassicSimilarity], result of:
            0.01648872 = score(doc=30,freq=3.0), product of:
              0.062530674 = queryWeight, product of:
                2.5748143 = boost
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.009969973 = queryNorm
              0.2636901 = fieldWeight in 30, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
          0.1728578 = weight(abstract_txt:sentence in 30) [ClassicSimilarity], result of:
            0.1728578 = score(doc=30,freq=2.0), product of:
              0.2845196 = queryWeight, product of:
                4.1518 = boost
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.009969973 = queryNorm
              0.6075427 = fieldWeight in 30, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
          0.263668 = weight(abstract_txt:detection in 30) [ClassicSimilarity], result of:
            0.263668 = score(doc=30,freq=3.0), product of:
              0.35478404 = queryWeight, product of:
                5.1834316 = boost
                6.8651924 = idf(docFreq=119, maxDocs=42306)
                0.009969973 = queryNorm
              0.74317884 = fieldWeight in 30, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8651924 = idf(docFreq=119, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
          0.82000893 = weight(abstract_txt:novelty in 30) [ClassicSimilarity], result of:
            0.82000893 = score(doc=30,freq=5.0), product of:
              0.7456962 = queryWeight, product of:
                9.505529 = boost
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.009969973 = queryNorm
              1.0996555 = fieldWeight in 30, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.0625 = fieldNorm(doc=30)
        0.24 = coord(6/25)
    
  2. An, X.; Huang, J.X.: geNov : a new metric for measuring novelty and relevancy in biomedical information retrieval (2017) 0.28
    0.2803888 = sum of:
      0.2803888 = product of:
        1.0013885 = sum of:
          0.009565699 = weight(abstract_txt:different in 840) [ClassicSimilarity], result of:
            0.009565699 = score(doc=840,freq=1.0), product of:
              0.04131718 = queryWeight, product of:
                1.1187439 = boost
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.009969973 = queryNorm
              0.23151869 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.012509676 = weight(abstract_txt:based in 840) [ClassicSimilarity], result of:
            0.012509676 = score(doc=840,freq=1.0), product of:
              0.06225299 = queryWeight, product of:
                1.9420502 = boost
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.009969973 = queryNorm
              0.20094898 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.05599355 = weight(abstract_txt:trec in 840) [ClassicSimilarity], result of:
            0.05599355 = score(doc=840,freq=1.0), product of:
              0.13419764 = queryWeight, product of:
                2.0162194 = boost
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.009969973 = queryNorm
              0.4172469 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.037325654 = weight(abstract_txt:level in 840) [ClassicSimilarity], result of:
            0.037325654 = score(doc=840,freq=2.0), product of:
              0.09304183 = queryWeight, product of:
                2.0561278 = boost
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.009969973 = queryNorm
              0.40117067 = fieldWeight in 840, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.049496282 = weight(abstract_txt:proposed in 840) [ClassicSimilarity], result of:
            0.049496282 = score(doc=840,freq=3.0), product of:
              0.09810503 = queryWeight, product of:
                2.1113324 = boost
                4.660588 = idf(docFreq=1087, maxDocs=42306)
                0.009969973 = queryNorm
              0.5045234 = fieldWeight in 840, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.660588 = idf(docFreq=1087, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.01648872 = weight(abstract_txt:information in 840) [ClassicSimilarity], result of:
            0.01648872 = score(doc=840,freq=3.0), product of:
              0.062530674 = queryWeight, product of:
                2.5748143 = boost
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.009969973 = queryNorm
              0.2636901 = fieldWeight in 840, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
          0.82000893 = weight(abstract_txt:novelty in 840) [ClassicSimilarity], result of:
            0.82000893 = score(doc=840,freq=5.0), product of:
              0.7456962 = queryWeight, product of:
                9.505529 = boost
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.009969973 = queryNorm
              1.0996555 = fieldWeight in 840, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.0625 = fieldNorm(doc=840)
        0.28 = coord(7/25)
    
  3. MacCain, K.W.: Descriptor and citation retrieval in the medical behavioral sciences literature : retrieval overlaps and novelty distribution (1989) 0.24
    0.23837508 = sum of:
      0.23837508 = product of:
        0.9932295 = sum of:
          0.011957125 = weight(abstract_txt:different in 2290) [ClassicSimilarity], result of:
            0.011957125 = score(doc=2290,freq=1.0), product of:
              0.04131718 = queryWeight, product of:
                1.1187439 = boost
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.009969973 = queryNorm
              0.28939837 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
          0.02146447 = weight(abstract_txt:types in 2290) [ClassicSimilarity], result of:
            0.02146447 = score(doc=2290,freq=1.0), product of:
              0.0610276 = queryWeight, product of:
                1.3596543 = boost
                4.5019827 = idf(docFreq=1274, maxDocs=42306)
                0.009969973 = queryNorm
              0.3517174 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5019827 = idf(docFreq=1274, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
          0.055026464 = weight(abstract_txt:topics in 2290) [ClassicSimilarity], result of:
            0.055026464 = score(doc=2290,freq=3.0), product of:
              0.0792599 = queryWeight, product of:
                1.5495019 = boost
                5.1305914 = idf(docFreq=679, maxDocs=42306)
                0.009969973 = queryNorm
              0.6942535 = fieldWeight in 2290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1305914 = idf(docFreq=679, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
          0.022114191 = weight(abstract_txt:based in 2290) [ClassicSimilarity], result of:
            0.022114191 = score(doc=2290,freq=2.0), product of:
              0.06225299 = queryWeight, product of:
                1.9420502 = boost
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.009969973 = queryNorm
              0.355231 = fieldWeight in 2290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
          0.08869704 = weight(abstract_txt:patterns in 2290) [ClassicSimilarity], result of:
            0.08869704 = score(doc=2290,freq=1.0), product of:
              0.2132876 = queryWeight, product of:
                4.0190015 = boost
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.009969973 = queryNorm
              0.4158565 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
          0.7939702 = weight(abstract_txt:novelty in 2290) [ClassicSimilarity], result of:
            0.7939702 = score(doc=2290,freq=3.0), product of:
              0.7456962 = queryWeight, product of:
                9.505529 = boost
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.009969973 = queryNorm
              1.0647368 = fieldWeight in 2290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.078125 = fieldNorm(doc=2290)
        0.24 = coord(6/25)
    
  4. Bando, L.L.; Scholer, F.; Turpin, A.: Query-biased summary generation assisted by query expansion : temporality (2015) 0.23
    0.23013364 = sum of:
      0.23013364 = product of:
        0.82190585 = sum of:
          0.029493678 = weight(abstract_txt:improves in 3821) [ClassicSimilarity], result of:
            0.029493678 = score(doc=3821,freq=1.0), product of:
              0.06946971 = queryWeight, product of:
                1.0257655 = boost
                6.792872 = idf(docFreq=128, maxDocs=42306)
                0.009969973 = queryNorm
              0.4245545 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.792872 = idf(docFreq=128, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.05905271 = weight(abstract_txt:lengths in 3821) [ClassicSimilarity], result of:
            0.05905271 = score(doc=3821,freq=1.0), product of:
              0.11035773 = queryWeight, product of:
                1.2928606 = boost
                8.561642 = idf(docFreq=21, maxDocs=42306)
                0.009969973 = queryNorm
              0.5351026 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561642 = idf(docFreq=21, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.05599355 = weight(abstract_txt:trec in 3821) [ClassicSimilarity], result of:
            0.05599355 = score(doc=3821,freq=1.0), product of:
              0.13419764 = queryWeight, product of:
                2.0162194 = boost
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.009969973 = queryNorm
              0.4172469 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6759505 = idf(docFreq=144, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.045714404 = weight(abstract_txt:level in 3821) [ClassicSimilarity], result of:
            0.045714404 = score(doc=3821,freq=3.0), product of:
              0.09304183 = queryWeight, product of:
                2.0561278 = boost
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.009969973 = queryNorm
              0.49133173 = fieldWeight in 3821, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.538728 = idf(docFreq=1228, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.020474523 = weight(abstract_txt:approach in 3821) [ClassicSimilarity], result of:
            0.020474523 = score(doc=3821,freq=1.0), product of:
              0.08645804 = queryWeight, product of:
                2.2886693 = boost
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.009969973 = queryNorm
              0.23681456 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.24445786 = weight(abstract_txt:sentence in 3821) [ClassicSimilarity], result of:
            0.24445786 = score(doc=3821,freq=4.0), product of:
              0.2845196 = queryWeight, product of:
                4.1518 = boost
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.009969973 = queryNorm
              0.8591951 = fieldWeight in 3821, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
          0.36671916 = weight(abstract_txt:novelty in 3821) [ClassicSimilarity], result of:
            0.36671916 = score(doc=3821,freq=1.0), product of:
              0.7456962 = queryWeight, product of:
                9.505529 = boost
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.009969973 = queryNorm
              0.4917809 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.0625 = fieldNorm(doc=3821)
        0.28 = coord(7/25)
    
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.22
    0.22413209 = sum of:
      0.22413209 = product of:
        0.9338837 = sum of:
          0.009565699 = weight(abstract_txt:different in 2052) [ClassicSimilarity], result of:
            0.009565699 = score(doc=2052,freq=1.0), product of:
              0.04131718 = queryWeight, product of:
                1.1187439 = boost
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.009969973 = queryNorm
              0.23151869 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.704299 = idf(docFreq=2830, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
          0.035382707 = weight(abstract_txt:based in 2052) [ClassicSimilarity], result of:
            0.035382707 = score(doc=2052,freq=8.0), product of:
              0.06225299 = queryWeight, product of:
                1.9420502 = boost
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.009969973 = queryNorm
              0.56836957 = fieldWeight in 2052, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2151837 = idf(docFreq=4616, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
          0.02895535 = weight(abstract_txt:approach in 2052) [ClassicSimilarity], result of:
            0.02895535 = score(doc=2052,freq=2.0), product of:
              0.08645804 = queryWeight, product of:
                2.2886693 = boost
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.009969973 = queryNorm
              0.33490637 = fieldWeight in 2052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
          0.009519767 = weight(abstract_txt:information in 2052) [ClassicSimilarity], result of:
            0.009519767 = score(doc=2052,freq=1.0), product of:
              0.062530674 = queryWeight, product of:
                2.5748143 = boost
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.009969973 = queryNorm
              0.15224156 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.435865 = idf(docFreq=10064, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
          0.21528402 = weight(abstract_txt:detection in 2052) [ClassicSimilarity], result of:
            0.21528402 = score(doc=2052,freq=2.0), product of:
              0.35478404 = queryWeight, product of:
                5.1834316 = boost
                6.8651924 = idf(docFreq=119, maxDocs=42306)
                0.009969973 = queryNorm
              0.606803 = fieldWeight in 2052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8651924 = idf(docFreq=119, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
          0.6351762 = weight(abstract_txt:novelty in 2052) [ClassicSimilarity], result of:
            0.6351762 = score(doc=2052,freq=3.0), product of:
              0.7456962 = queryWeight, product of:
                9.505529 = boost
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.009969973 = queryNorm
              0.8517895 = fieldWeight in 2052, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.8684945 = idf(docFreq=43, maxDocs=42306)
                0.0625 = fieldNorm(doc=2052)
        0.24 = coord(6/25)
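
The content-similarity trees add two factors to the per-term arithmetic: a query weight (boost * idf * queryNorm) and a coordination factor (matched terms / query terms). As a hedged sketch under the same ClassicSimilarity assumptions, the following recomputes the score of result 1 (doc=30) from the values printed in its tree; `term_score` and `terms_doc30` are illustrative names, not part of the record.

```python
import math

QUERY_NORM = 0.009969973  # queryNorm as printed in the trees
MAX_DOCS = 42306          # maxDocs as printed in the trees


def idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, term_freq, doc_freq, field_norm):
    # score = queryWeight * fieldWeight
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    fw = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * fw


# (boost, termFreq, docFreq) for the six matched terms of doc=30
terms_doc30 = [
    (2.0561278, 3.0, 1228),   # level
    (2.2886693, 4.0, 2600),   # approach
    (2.5748143, 3.0, 10064),  # information
    (4.1518, 2.0, 118),       # sentence
    (5.1834316, 3.0, 119),    # detection
    (9.505529, 5.0, 43),      # novelty
]

raw = sum(term_score(b, f, df, 0.0625) for b, f, df in terms_doc30)
coord = 6 / 25            # 6 of the 25 query terms matched
final = raw * coord
print(round(raw, 4), round(final, 4))  # compare with 1.3596869 and 0.32632485 in the tree
```

The rare term "novelty" (docFreq=43) carries both the largest boost and the largest idf, which is why it dominates the sum in every result shown here.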