Document (#31118)

Author
Thelwall, M.
Prabowo, R.
Fairclough, R.
Title
Are raw RSS feeds suitable for broad issue scanning? : a science concern case study
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.12, pp. 1644-1654
Year
2006
Abstract
Broad issue scanning is the task of identifying important public debates arising in a given broad issue; Really Simple Syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: monitoring a large number of feeds may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). However, an attempt to identify genuine science concern debates from the corpus by investigating the top 1,000 "burst" words found only two genuine debates. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
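The abstract does not spell out the coword burst computation, so the following is only a minimal sketch of one plausible reading: keep feed items that mention an issue term, track each co-occurring word's daily share of those items, and score a word by its largest rise over its average share on earlier days. The function name `burst_scores`, the `(day, text)` item layout, and the scoring rule are all assumptions made for illustration, not the authors' published method.

```python
from collections import defaultdict

def burst_scores(items, issue_terms, min_prior_days=1):
    """Hypothetical burst scorer (an assumption, not the paper's algorithm).

    items: iterable of (day_index, text) pairs from RSS feed items.
    issue_terms: set of lowercase words defining the broad issue.
    Returns {word: (best_day, increase)}; best_day is None if no rise occurred.
    """
    per_day_total = defaultdict(int)                      # issue-relevant items per day
    per_day_word = defaultdict(lambda: defaultdict(int))  # word -> day -> item count
    for day, text in items:
        words = set(text.lower().split())
        if not words & issue_terms:
            continue  # keep only items in which an issue term occurs
        per_day_total[day] += 1
        for w in words - issue_terms:
            per_day_word[w][day] += 1  # coword: appears alongside an issue term

    days = sorted(per_day_total)
    scores = {}
    for w, counts in per_day_word.items():
        best = (None, 0.0)
        prior_sum = 0.0
        for i, day in enumerate(days):
            share = counts.get(day, 0) / per_day_total[day]
            if i >= min_prior_days:
                increase = share - prior_sum / i  # rise over mean of earlier days
                if increase > best[1]:
                    best = (day, increase)
            prior_sum += share
        scores[w] = best
    return scores

# Tiny made-up corpus, for illustration only.
items = [
    (0, "science funding report"),
    (0, "science museum opens"),
    (1, "science stem cell debate"),
    (1, "stem cell science row"),
    (1, "football results"),
]
scores = burst_scores(items, {"science"})
```

In a sketch like this, a single feed that repeats the same words in every item would dominate the daily shares, which is consistent with the abstract's point that raw feeds need cleansing before burst words become meaningful.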
Object
RSS

Similar documents (author)

  1. Thelwall, M.: Extracting macroscopic information from Web links (2001) 4.35
    4.345732 = sum of:
      4.345732 = weight(author_txt:thelwall in 852) [ClassicSimilarity], result of:
        4.345732 = fieldWeight in 852, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9531717 = idf(docFreq=108, maxDocs=41962)
          0.625 = fieldNorm(doc=852)
    
  2. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 4.35
    4.345732 = sum of:
      4.345732 = weight(author_txt:thelwall in 1979) [ClassicSimilarity], result of:
        4.345732 = fieldWeight in 1979, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9531717 = idf(docFreq=108, maxDocs=41962)
          0.625 = fieldNorm(doc=1979)
    
  3. Thelwall, M.: Text characteristics of English language university Web sites (2005) 4.35
    4.345732 = sum of:
      4.345732 = weight(author_txt:thelwall in 4464) [ClassicSimilarity], result of:
        4.345732 = fieldWeight in 4464, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9531717 = idf(docFreq=108, maxDocs=41962)
          0.625 = fieldNorm(doc=4464)
    
  4. Thelwall, M.: Bibliometrics to webometrics (2009) 4.35
    4.345732 = sum of:
      4.345732 = weight(author_txt:thelwall in 240) [ClassicSimilarity], result of:
        4.345732 = fieldWeight in 240, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9531717 = idf(docFreq=108, maxDocs=41962)
          0.625 = fieldNorm(doc=240)
    
  5. Thelwall, M.: A layered approach for investigating the topological structure of communities in the Web (2003) 4.35
    4.345732 = sum of:
      4.345732 = weight(author_txt:thelwall in 451) [ClassicSimilarity], result of:
        4.345732 = fieldWeight in 451, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.9531717 = idf(docFreq=108, maxDocs=41962)
          0.625 = fieldNorm(doc=451)
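
The author-match scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the numbers printed in this record; other Lucene versions compute idf slightly differently, and the function names here are illustrative only.

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency in the form that reproduces the values above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the entries above: freq=1.0, docFreq=108, maxDocs=41962, fieldNorm=0.625.
idf = classic_idf(108, 41962)
score = field_weight(freq=1.0, doc_freq=108, max_docs=41962, field_norm=0.625)
print(round(idf, 4), round(score, 4))
```

All five entries share the same term frequency, idf, and fieldNorm, which is why they all score 4.35.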
    

Similar documents (content)

  1. Farooq, U.; Ganoe, C.H.; Carroll, J.M.; Councill, I.G.; Giles, C.L.: Design and evaluation of awareness mechanisms in CiteSeer (2008) 0.16
    0.16264956 = sum of:
      0.16264956 = product of:
        0.8132478 = sum of:
          0.006431989 = weight(abstract_txt:information in 4052) [ClassicSimilarity], result of:
            0.006431989 = score(doc=4052,freq=2.0), product of:
              0.02982859 = queryWeight, product of:
                1.0202087 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.011984671 = queryNorm
              0.21563168 = fieldWeight in 4052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.0625 = fieldNorm(doc=4052)
          0.014306918 = weight(abstract_txt:science in 4052) [ClassicSimilarity], result of:
            0.014306918 = score(doc=4052,freq=1.0), product of:
              0.05818312 = queryWeight, product of:
                1.233962 = boost
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.011984671 = queryNorm
              0.24589466 = fieldWeight in 4052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.0625 = fieldNorm(doc=4052)
          0.046574064 = weight(abstract_txt:investigating in 4052) [ClassicSimilarity], result of:
            0.046574064 = score(doc=4052,freq=1.0), product of:
              0.11164311 = queryWeight, product of:
                1.3956409 = boost
                6.6747065 = idf(docFreq=143, maxDocs=41962)
                0.011984671 = queryNorm
              0.41716915 = fieldWeight in 4052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6747065 = idf(docFreq=143, maxDocs=41962)
                0.0625 = fieldNorm(doc=4052)
          0.023970919 = weight(abstract_txt:relevant in 4052) [ClassicSimilarity], result of:
            0.023970919 = score(doc=4052,freq=1.0), product of:
              0.082077235 = queryWeight, product of:
                1.4655974 = boost
                4.672851 = idf(docFreq=1065, maxDocs=41962)
                0.011984671 = queryNorm
              0.2920532 = fieldWeight in 4052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.672851 = idf(docFreq=1065, maxDocs=41962)
                0.0625 = fieldNorm(doc=4052)
          0.72196394 = weight(abstract_txt:feeds in 4052) [ClassicSimilarity], result of:
            0.72196394 = score(doc=4052,freq=4.0), product of:
              0.6638687 = queryWeight, product of:
                6.36697 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.011984671 = queryNorm
              1.0875101 = fieldWeight in 4052, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.0625 = fieldNorm(doc=4052)
        0.2 = coord(5/25)
    
  2. Thelwall, M.; Prabowo, R.: Identifying and characterizing public science-related fears from RSS feeds (2007) 0.13
    0.1263309 = sum of:
      0.1263309 = product of:
        0.63165444 = sum of:
          0.04380581 = weight(abstract_txt:science in 2138) [ClassicSimilarity], result of:
            0.04380581 = score(doc=2138,freq=6.0), product of:
              0.05818312 = queryWeight, product of:
                1.233962 = boost
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.011984671 = queryNorm
              0.75289553 = fieldWeight in 2138, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.078125 = fieldNorm(doc=2138)
          0.050761886 = weight(abstract_txt:concern in 2138) [ClassicSimilarity], result of:
            0.050761886 = score(doc=2138,freq=1.0), product of:
              0.10189535 = queryWeight, product of:
                1.3333215 = boost
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.011984671 = queryNorm
              0.4981767 = fieldWeight in 2138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.078125 = fieldNorm(doc=2138)
          0.027576664 = weight(abstract_txt:method in 2138) [ClassicSimilarity], result of:
            0.027576664 = score(doc=2138,freq=1.0), product of:
              0.07765821 = queryWeight, product of:
                1.4255978 = boost
                4.545318 = idf(docFreq=1210, maxDocs=41962)
                0.011984671 = queryNorm
              0.355103 = fieldWeight in 2138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.545318 = idf(docFreq=1210, maxDocs=41962)
                0.078125 = fieldNorm(doc=2138)
          0.058282603 = weight(abstract_txt:public in 2138) [ClassicSimilarity], result of:
            0.058282603 = score(doc=2138,freq=4.0), product of:
              0.08056855 = queryWeight, product of:
                1.4520651 = boost
                4.6297054 = idf(docFreq=1112, maxDocs=41962)
                0.011984671 = queryNorm
              0.7233915 = fieldWeight in 2138, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6297054 = idf(docFreq=1112, maxDocs=41962)
                0.078125 = fieldNorm(doc=2138)
          0.45122746 = weight(abstract_txt:feeds in 2138) [ClassicSimilarity], result of:
            0.45122746 = score(doc=2138,freq=1.0), product of:
              0.6638687 = queryWeight, product of:
                6.36697 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.011984671 = queryNorm
              0.6796938 = fieldWeight in 2138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.078125 = fieldNorm(doc=2138)
        0.2 = coord(5/25)
    
  3. Cornelius, I.: Theorizing information for information science (2002) 0.11
    0.110168405 = sum of:
      0.110168405 = product of:
        0.34427628 = sum of:
          0.016574856 = weight(abstract_txt:information in 245) [ClassicSimilarity], result of:
            0.016574856 = score(doc=245,freq=34.0), product of:
              0.02982859 = queryWeight, product of:
                1.0202087 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.011984671 = queryNorm
              0.55567014 = fieldWeight in 245, product of:
                5.8309517 = tf(freq=34.0), with freq of:
                  34.0 = termFreq=34.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.009974281 = weight(abstract_txt:data in 245) [ClassicSimilarity], result of:
            0.009974281 = score(doc=245,freq=3.0), product of:
              0.043390393 = queryWeight, product of:
                1.0656145 = boost
                3.3975618 = idf(docFreq=3815, maxDocs=41962)
                0.011984671 = queryNorm
              0.22987303 = fieldWeight in 245, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3975618 = idf(docFreq=3815, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.013248345 = weight(abstract_txt:source in 245) [ClassicSimilarity], result of:
            0.013248345 = score(doc=245,freq=1.0), product of:
              0.06605772 = queryWeight, product of:
                1.0735431 = boost
                5.1342616 = idf(docFreq=671, maxDocs=41962)
                0.011984671 = queryNorm
              0.2005571 = fieldWeight in 245, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1342616 = idf(docFreq=671, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.013666733 = weight(abstract_txt:discussion in 245) [ClassicSimilarity], result of:
            0.013666733 = score(doc=245,freq=1.0), product of:
              0.067441255 = queryWeight, product of:
                1.0847272 = boost
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.011984671 = queryNorm
              0.20264648 = fieldWeight in 245, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.02682547 = weight(abstract_txt:science in 245) [ClassicSimilarity], result of:
            0.02682547 = score(doc=245,freq=9.0), product of:
              0.05818312 = queryWeight, product of:
                1.233962 = boost
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.011984671 = queryNorm
              0.46105248 = fieldWeight in 245, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.9343145 = idf(docFreq=2230, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.012991919 = weight(abstract_txt:about in 245) [ClassicSimilarity], result of:
            0.012991919 = score(doc=245,freq=2.0), product of:
              0.059240464 = queryWeight, product of:
                1.2451239 = boost
                3.9699023 = idf(docFreq=2152, maxDocs=41962)
                0.011984671 = queryNorm
              0.2193082 = fieldWeight in 245, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9699023 = idf(docFreq=2152, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.025380943 = weight(abstract_txt:concern in 245) [ClassicSimilarity], result of:
            0.025380943 = score(doc=245,freq=1.0), product of:
              0.10189535 = queryWeight, product of:
                1.3333215 = boost
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.011984671 = queryNorm
              0.24908835 = fieldWeight in 245, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
          0.22561373 = weight(abstract_txt:feeds in 245) [ClassicSimilarity], result of:
            0.22561373 = score(doc=245,freq=1.0), product of:
              0.6638687 = queryWeight, product of:
                6.36697 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.011984671 = queryNorm
              0.3398469 = fieldWeight in 245, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.0390625 = fieldNorm(doc=245)
        0.32 = coord(8/25)
    
  4. Nomoto, T.: Discriminative sentence compression with conditional random fields (2007) 0.09
    0.09198129 = sum of:
      0.09198129 = product of:
        0.5748831 = sum of:
          0.008039986 = weight(abstract_txt:information in 2946) [ClassicSimilarity], result of:
            0.008039986 = score(doc=2946,freq=2.0), product of:
              0.02982859 = queryWeight, product of:
                1.0202087 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.011984671 = queryNorm
              0.2695396 = fieldWeight in 2946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.078125 = fieldNorm(doc=2946)
          0.019948563 = weight(abstract_txt:data in 2946) [ClassicSimilarity], result of:
            0.019948563 = score(doc=2946,freq=3.0), product of:
              0.043390393 = queryWeight, product of:
                1.0656145 = boost
                3.3975618 = idf(docFreq=3815, maxDocs=41962)
                0.011984671 = queryNorm
              0.45974606 = fieldWeight in 2946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3975618 = idf(docFreq=3815, maxDocs=41962)
                0.078125 = fieldNorm(doc=2946)
          0.09566712 = weight(abstract_txt:issue in 2946) [ClassicSimilarity], result of:
            0.09566712 = score(doc=2946,freq=1.0), product of:
              0.23604436 = queryWeight, product of:
                3.7965448 = boost
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.011984671 = queryNorm
              0.40529296 = fieldWeight in 2946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.078125 = fieldNorm(doc=2946)
          0.45122746 = weight(abstract_txt:feeds in 2946) [ClassicSimilarity], result of:
            0.45122746 = score(doc=2946,freq=1.0), product of:
              0.6638687 = queryWeight, product of:
                6.36697 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.011984671 = queryNorm
              0.6796938 = fieldWeight in 2946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.078125 = fieldNorm(doc=2946)
        0.16 = coord(4/25)
    
  5. Otterbacher, J.; Radev, D.: Exploring fact-focused relevance and novelty detection (2008) 0.09
    0.09131782 = sum of:
      0.09131782 = product of:
        0.32613507 = sum of:
          0.007877545 = weight(abstract_txt:information in 4211) [ClassicSimilarity], result of:
            0.007877545 = score(doc=4211,freq=3.0), product of:
              0.02982859 = queryWeight, product of:
                1.0202087 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.011984671 = queryNorm
              0.2640938 = fieldWeight in 4211, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.019709392 = weight(abstract_txt:identify in 4211) [ClassicSimilarity], result of:
            0.019709392 = score(doc=4211,freq=1.0), product of:
              0.06292907 = queryWeight, product of:
                1.047812 = boost
                5.0112014 = idf(docFreq=759, maxDocs=41962)
                0.011984671 = queryNorm
              0.3132001 = fieldWeight in 4211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0112014 = idf(docFreq=759, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.04060951 = weight(abstract_txt:concern in 4211) [ClassicSimilarity], result of:
            0.04060951 = score(doc=4211,freq=1.0), product of:
              0.10189535 = queryWeight, product of:
                1.3333215 = boost
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.011984671 = queryNorm
              0.39854136 = fieldWeight in 4211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.376662 = idf(docFreq=193, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.033899996 = weight(abstract_txt:relevant in 4211) [ClassicSimilarity], result of:
            0.033899996 = score(doc=4211,freq=2.0), product of:
              0.082077235 = queryWeight, product of:
                1.4655974 = boost
                4.672851 = idf(docFreq=1065, maxDocs=41962)
                0.011984671 = queryNorm
              0.4130256 = fieldWeight in 4211, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.672851 = idf(docFreq=1065, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.02465619 = weight(abstract_txt:given in 4211) [ClassicSimilarity], result of:
            0.02465619 = score(doc=4211,freq=1.0), product of:
              0.08363414 = queryWeight, product of:
                1.4794323 = boost
                4.716962 = idf(docFreq=1019, maxDocs=41962)
                0.011984671 = queryNorm
              0.29481012 = fieldWeight in 4211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.716962 = idf(docFreq=1019, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.0765337 = weight(abstract_txt:issue in 4211) [ClassicSimilarity], result of:
            0.0765337 = score(doc=4211,freq=1.0), product of:
              0.23604436 = queryWeight, product of:
                3.7965448 = boost
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.011984671 = queryNorm
              0.32423437 = fieldWeight in 4211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.18775 = idf(docFreq=636, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
          0.12284875 = weight(abstract_txt:broad in 4211) [ClassicSimilarity], result of:
            0.12284875 = score(doc=4211,freq=1.0), product of:
              0.33832675 = queryWeight, product of:
                4.8591003 = boost
                5.809709 = idf(docFreq=341, maxDocs=41962)
                0.011984671 = queryNorm
              0.36310682 = fieldWeight in 4211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.809709 = idf(docFreq=341, maxDocs=41962)
                0.0625 = fieldNorm(doc=4211)
        0.28 = coord(7/25)
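
The content-match scores combine several query terms. As a rough check, the sketch below recomputes entry 1 (doc=4052) from the figures printed in its explanation tree, assuming each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is scaled by the coord factor. The helper names are illustrative and the boost and idf values are copied from the output rather than derived.

```python
import math

QUERY_NORM = 0.011984671  # queryNorm as printed in the explanation
FIELD_NORM = 0.0625       # fieldNorm(doc=4052)

# (term, boost, idf, freq) copied from entry 1 of the content list.
terms = [
    ("information",   1.0202087, 2.439594,  2.0),
    ("science",       1.233962,  3.9343145, 1.0),
    ("investigating", 1.3956409, 6.6747065, 1.0),
    ("relevant",      1.4655974, 4.672851,  1.0),
    ("feeds",         6.36697,   8.700081,  4.0),
]

def term_score(boost, idf, freq, field_norm, query_norm):
    query_weight = boost * idf * query_norm          # query-side weight
    field_weight = math.sqrt(freq) * idf * field_norm  # document-side weight
    return query_weight * field_weight

per_term = {t: term_score(b, i, f, FIELD_NORM, QUERY_NORM) for t, b, i, f in terms}
coord = len(terms) / 25  # 5 of the 25 query terms matched
total = coord * sum(per_term.values())
print(round(total, 4))
```

The rare, heavily boosted term "feeds" supplies most of the sum, so documents that mention feeds at all rank highly even when they share little else with the abstract.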