Document (#39336)

Author
Luo, Z.
Yu, Y.
Osborne, M.
Wang, T.
Title
Structuring tweets for improving Twitter search
Source
Journal of the Association for Information Science and Technology. 66(2015) no.12, pp. 2522-2539
Year
2015
Abstract
Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to that of a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23332/abstract.
Theme
Internet
Computational linguistics
Object
Twitter

Similar documents (author)

  1. Wang, H.; Wang, C.: Ontologies for universal information systems (1995) 4.64
    4.63939 = sum of:
      4.63939 = weight(author_txt:wang in 3194) [ClassicSimilarity], result of:
        4.63939 = fieldWeight in 3194, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.5 = fieldNorm(doc=3194)
    
  2. Wang, F.; Wang, X.: Tracing theory diffusion : a text mining and citation-based analysis of TAM (2020) 4.64
    4.63939 = sum of:
      4.63939 = weight(author_txt:wang in 5980) [ClassicSimilarity], result of:
        4.63939 = fieldWeight in 5980, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.5 = fieldNorm(doc=5980)
    
  3. Wang, C.: The online catalogue, subject access and user reactions : a review (1985) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 986) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 986, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=986)
    
  4. Wang, C.: Bibliometrics : a textbook (1990) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 5040) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 5040, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=5040)
    
  5. Wang, P.: Users' information needs at different stages of a research project : a cognitive view (1997) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 320) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 320, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=320)
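
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))) and taking the fieldNorm values directly from the listing; the function names are illustrative and not part of any Lucene API.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the 'product of' lines above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 3194): "wang" occurs twice in the author field, fieldNorm 0.5.
score_3194 = field_weight(freq=2.0, doc_freq=169, max_docs=44218, field_norm=0.5)

# Entry 3 (doc 986): "wang" occurs once, fieldNorm 0.625.
score_986 = field_weight(freq=1.0, doc_freq=169, max_docs=44218, field_norm=0.625)

print(round(score_3194, 4), round(score_986, 4))
```

Under these assumptions the two values agree with the listed 4.63939 and 4.1006804 up to floating-point rounding, which also shows why the two-author records outrank the single-author ones despite their smaller fieldNorm.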
    

Similar documents (content)

  1. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.40
    0.40323022 = sum of:
      0.40323022 = product of:
        1.680126 = sum of:
          0.12836048 = weight(abstract_txt:spam in 3683) [ClassicSimilarity], result of:
            0.12836048 = score(doc=3683,freq=6.0), product of:
              0.098422766 = queryWeight, product of:
                1.0533029 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.010968877 = queryNorm
              1.3041747 = fieldWeight in 3683, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.14434084 = weight(abstract_txt:hashtags in 3683) [ClassicSimilarity], result of:
            0.14434084 = score(doc=3683,freq=5.0), product of:
              0.11309965 = queryWeight, product of:
                1.1291097 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.010968877 = queryNorm
              1.2762271 = fieldWeight in 3683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.010561819 = weight(abstract_txt:using in 3683) [ClassicSimilarity], result of:
            0.010561819 = score(doc=3683,freq=1.0), product of:
              0.0487968 = queryWeight, product of:
                1.2845819 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.010968877 = queryNorm
              0.21644491 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.2722937 = weight(abstract_txt:tweet in 3683) [ClassicSimilarity], result of:
            0.2722937 = score(doc=3683,freq=3.0), product of:
              0.2952683 = queryWeight, product of:
                3.1599088 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.010968877 = queryNorm
              0.9221907 = fieldWeight in 3683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.43923363 = weight(abstract_txt:twitter in 3683) [ClassicSimilarity], result of:
            0.43923363 = score(doc=3683,freq=5.0), product of:
              0.4543231 = queryWeight, product of:
                5.987382 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010968877 = queryNorm
              0.96678686 = fieldWeight in 3683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.68533546 = weight(abstract_txt:tweets in 3683) [ClassicSimilarity], result of:
            0.68533546 = score(doc=3683,freq=7.0), product of:
              0.54633695 = queryWeight, product of:
                6.5657573 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010968877 = queryNorm
              1.254419 = fieldWeight in 3683, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
        0.24 = coord(6/25)
    
  2. Arakawa, Y.; Kameda, A.; Aizawa, A.; Suzuki, T.: Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets (2014) 0.35
    0.34858567 = sum of:
      0.34858567 = product of:
        1.0893302 = sum of:
          0.01121078 = weight(abstract_txt:text in 1307) [ClassicSimilarity], result of:
            0.01121078 = score(doc=1307,freq=1.0), product of:
              0.044356674 = queryWeight, product of:
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.010968877 = queryNorm
              0.25274166 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.03883695 = weight(abstract_txt:features in 1307) [ClassicSimilarity], result of:
            0.03883695 = score(doc=1307,freq=6.0), product of:
              0.05588751 = queryWeight, product of:
                1.1224781 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.010968877 = queryNorm
              0.69491285 = fieldWeight in 1307, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.014936667 = weight(abstract_txt:using in 1307) [ClassicSimilarity], result of:
            0.014936667 = score(doc=1307,freq=2.0), product of:
              0.0487968 = queryWeight, product of:
                1.2845819 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.010968877 = queryNorm
              0.30609933 = fieldWeight in 1307, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.030645832 = weight(abstract_txt:texts in 1307) [ClassicSimilarity], result of:
            0.030645832 = score(doc=1307,freq=1.0), product of:
              0.08671936 = queryWeight, product of:
                1.3982297 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.010968877 = queryNorm
              0.3533909 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.04760489 = weight(abstract_txt:performance in 1307) [ClassicSimilarity], result of:
            0.04760489 = score(doc=1307,freq=2.0), product of:
              0.11631511 = queryWeight, product of:
                2.2900953 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.010968877 = queryNorm
              0.40927517 = fieldWeight in 1307, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.15720883 = weight(abstract_txt:tweet in 1307) [ClassicSimilarity], result of:
            0.15720883 = score(doc=1307,freq=1.0), product of:
              0.2952683 = queryWeight, product of:
                3.1599088 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.010968877 = queryNorm
              0.5324271 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.34022892 = weight(abstract_txt:twitter in 1307) [ClassicSimilarity], result of:
            0.34022892 = score(doc=1307,freq=3.0), product of:
              0.4543231 = queryWeight, product of:
                5.987382 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010968877 = queryNorm
              0.7488699 = fieldWeight in 1307, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.4486574 = weight(abstract_txt:tweets in 1307) [ClassicSimilarity], result of:
            0.4486574 = score(doc=1307,freq=3.0), product of:
              0.54633695 = queryWeight, product of:
                6.5657573 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010968877 = queryNorm
              0.82121 = fieldWeight in 1307, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
        0.32 = coord(8/25)
    
  3. Fang, Z.; Dudek, J.; Costas, R.: The stability of Twitter metrics : a study on unavailable Twitter mentions of scientific publications (2020) 0.28
    0.28391075 = sum of:
      0.28391075 = product of:
        1.1829615 = sum of:
          0.012102236 = weight(abstract_txt:when in 35) [ClassicSimilarity], result of:
            0.012102236 = score(doc=35,freq=1.0), product of:
              0.046677995 = queryWeight, product of:
                1.0258329 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.010968877 = queryNorm
              0.2592707 = fieldWeight in 35, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
          0.014499498 = weight(abstract_txt:show in 35) [ClassicSimilarity], result of:
            0.014499498 = score(doc=35,freq=1.0), product of:
              0.05265469 = queryWeight, product of:
                1.0895296 = boost
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.010968877 = queryNorm
              0.27536952 = fieldWeight in 35, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
          0.011653583 = weight(abstract_txt:these in 35) [ClassicSimilarity], result of:
            0.011653583 = score(doc=35,freq=2.0), product of:
              0.041355047 = queryWeight, product of:
                1.1825796 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.010968877 = queryNorm
              0.28179348 = fieldWeight in 35, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
          0.15720883 = weight(abstract_txt:tweet in 35) [ClassicSimilarity], result of:
            0.15720883 = score(doc=35,freq=1.0), product of:
              0.2952683 = queryWeight, product of:
                3.1599088 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.010968877 = queryNorm
              0.5324271 = fieldWeight in 35, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
          0.62117016 = weight(abstract_txt:twitter in 35) [ClassicSimilarity], result of:
            0.62117016 = score(doc=35,freq=10.0), product of:
              0.4543231 = queryWeight, product of:
                5.987382 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010968877 = queryNorm
              1.3672432 = fieldWeight in 35, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
          0.36632723 = weight(abstract_txt:tweets in 35) [ClassicSimilarity], result of:
            0.36632723 = score(doc=35,freq=2.0), product of:
              0.54633695 = queryWeight, product of:
                6.5657573 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010968877 = queryNorm
              0.6705152 = fieldWeight in 35, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=35)
        0.24 = coord(6/25)
    
  4. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.28
    0.2779328 = sum of:
      0.2779328 = product of:
        0.99261713 = sum of:
          0.014499498 = weight(abstract_txt:show in 967) [ClassicSimilarity], result of:
            0.014499498 = score(doc=967,freq=1.0), product of:
              0.05265469 = queryWeight, product of:
                1.0895296 = boost
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.010968877 = queryNorm
              0.27536952 = fieldWeight in 967, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.0419487 = weight(abstract_txt:features in 967) [ClassicSimilarity], result of:
            0.0419487 = score(doc=967,freq=7.0), product of:
              0.05588751 = queryWeight, product of:
                1.1224781 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.010968877 = queryNorm
              0.75059164 = fieldWeight in 967, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.11180593 = weight(abstract_txt:hashtags in 967) [ClassicSimilarity], result of:
            0.11180593 = score(doc=967,freq=3.0), product of:
              0.11309965 = queryWeight, product of:
                1.1291097 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.010968877 = queryNorm
              0.9885613 = fieldWeight in 967, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.008240328 = weight(abstract_txt:these in 967) [ClassicSimilarity], result of:
            0.008240328 = score(doc=967,freq=1.0), product of:
              0.041355047 = queryWeight, product of:
                1.1825796 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.010968877 = queryNorm
              0.19925809 = fieldWeight in 967, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.010561819 = weight(abstract_txt:using in 967) [ClassicSimilarity], result of:
            0.010561819 = score(doc=967,freq=1.0), product of:
              0.0487968 = queryWeight, product of:
                1.2845819 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.010968877 = queryNorm
              0.21644491 = fieldWeight in 967, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.43923363 = weight(abstract_txt:twitter in 967) [ClassicSimilarity], result of:
            0.43923363 = score(doc=967,freq=5.0), product of:
              0.4543231 = queryWeight, product of:
                5.987382 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010968877 = queryNorm
              0.96678686 = fieldWeight in 967, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
          0.36632723 = weight(abstract_txt:tweets in 967) [ClassicSimilarity], result of:
            0.36632723 = score(doc=967,freq=2.0), product of:
              0.54633695 = queryWeight, product of:
                6.5657573 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010968877 = queryNorm
              0.6705152 = fieldWeight in 967, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=967)
        0.28 = coord(7/25)
    
  5. Chang, H.-C.; Iyer, I.: Trends in Twitter hashtag applications : design features for value-added dimensions to future library catalogues (2012) 0.24
    0.24071705 = sum of:
      0.24071705 = product of:
        1.2035853 = sum of:
          0.019818896 = weight(abstract_txt:features in 5574) [ClassicSimilarity], result of:
            0.019818896 = score(doc=5574,freq=1.0), product of:
              0.05588751 = queryWeight, product of:
                1.1224781 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.010968877 = queryNorm
              0.35462123 = fieldWeight in 5574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.11411146 = weight(abstract_txt:hashtags in 5574) [ClassicSimilarity], result of:
            0.11411146 = score(doc=5574,freq=2.0), product of:
              0.11309965 = queryWeight, product of:
                1.1291097 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.010968877 = queryNorm
              1.0089462 = fieldWeight in 5574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.01030041 = weight(abstract_txt:these in 5574) [ClassicSimilarity], result of:
            0.01030041 = score(doc=5574,freq=1.0), product of:
              0.041355047 = queryWeight, product of:
                1.1825796 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.010968877 = queryNorm
              0.24907261 = fieldWeight in 5574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.60144544 = weight(abstract_txt:twitter in 5574) [ClassicSimilarity], result of:
            0.60144544 = score(doc=5574,freq=6.0), product of:
              0.4543231 = queryWeight, product of:
                5.987382 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010968877 = queryNorm
              1.3238275 = fieldWeight in 5574, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.45790902 = weight(abstract_txt:tweets in 5574) [ClassicSimilarity], result of:
            0.45790902 = score(doc=5574,freq=2.0), product of:
              0.54633695 = queryWeight, product of:
                6.5657573 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010968877 = queryNorm
              0.83814394 = fieldWeight in 5574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
        0.2 = coord(5/25)
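
The content-similarity scores combine a query-side weight, a document-side weight, and a coordination factor. The sketch below recomputes the total for the first entry (doc 3683), assuming the ClassicSimilarity structure visible in the trees: per-term score = queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and the sum multiplied by coord = matched terms / query terms. The boosts, frequencies, and queryNorm are copied from the listing; the helper names are hypothetical.

```python
import math

QUERY_NORM = 0.010968877  # queryNorm as printed in the listing
MAX_DOCS = 44218          # maxDocs as printed in the listing
QUERY_TERMS = 25          # denominator of the coord factor


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, boost, termFreq, docFreq) for doc 3683, copied from entry 1 above.
matches_3683 = [
    ("spam", 1.0533029, 6.0, 23),
    ("hashtags", 1.1291097, 5.0, 12),
    ("using", 1.2845819, 1.0, 3765),
    ("tweet", 3.1599088, 3.0, 23),
    ("twitter", 5.987382, 5.0, 118),
    ("tweets", 6.5657573, 7.0, 60),
]

raw_sum = sum(term_score(b, f, df, field_norm=0.0625) for _, b, f, df in matches_3683)
coord = len(matches_3683) / QUERY_TERMS  # 6 of 25 query terms matched
total_3683 = raw_sum * coord

print(round(raw_sum, 4), coord, round(total_3683, 4))
```

With these inputs the raw sum and the final score should agree with the listed 1.680126 and 0.40323022 up to rounding. The coord factor explains why entry 2, which matches 8 of 25 terms, stays competitive even though its raw sum is lower.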