Document (#40687)

Author
Li, C.
Sun, A.
Title
Extracting fine-grained location with temporal awareness in tweets : a two-stage approach
Source
Journal of the Association for Information Science and Technology. 68(2017) no.7, p.1652-1670
Year
2017
Abstract
Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short-term visiting histories or plans. Capturing users' short-term activities could benefit many applications by providing the right context at the right time and location. In this paper we are interested in extracting locations mentioned in tweets at fine-grained granularity, with temporal awareness. Specifically, we recognize the points-of-interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine-grained location. Our proposed framework, named TS-Petar (Two-Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two-stage time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, commonly observed in Foursquare check-ins. The time-aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time-aware POI extraction task. Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS-Petar against several strong baseline methods. The experimental results demonstrate that the two-stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency.
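The BILOU labeling schema named in the abstract can be illustrated with a minimal sketch. It assumes the common reading of BILOU (Begin, Inside, Last, Outside, Unit-length); the function name `bilou_tags`, the example tweet, and the POI span are hypothetical and are not taken from the paper, and the sketch does not cover the OP schema or the temporal-awareness labels.

```python
def bilou_tags(tokens, spans):
    """Assign BILOU tags to tokens.

    spans: list of (start, end) token index pairs, end exclusive,
    each marking one POI mention (hypothetical input format).
    """
    tags = ["O"] * len(tokens)          # default: Outside any mention
    for start, end in spans:
        if end - start == 1:
            tags[start] = "U"           # Unit-length mention
        else:
            tags[start] = "B"           # Begin of a multi-token mention
            for i in range(start + 1, end - 1):
                tags[i] = "I"           # Inside
            tags[end - 1] = "L"         # Last token of the mention
    return tags


# Hypothetical tweet with one three-token POI mention.
tokens = ["heading", "to", "marina", "bay", "sands", "now"]
tags = bilou_tags(tokens, [(2, 5)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model such as a CRF would be trained to predict these tags per token; the sketch only shows how the target labels are laid out.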
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23816/full.
Theme
Informetrics
Object
Twitter

Similar documents (content)

  1. Naaman, M.; Becker, H.; Gravano, L.: Hip and trendy : characterizing emerging trends on Twitter (2011) 0.17
    0.16542244 = sum of:
      0.16542244 = product of:
        0.59079444 = sum of:
          0.024701945 = weight(abstract_txt:activities in 4448) [ClassicSimilarity], result of:
            0.024701945 = score(doc=4448,freq=1.0), product of:
              0.078134865 = queryWeight, product of:
                1.0879285 = boost
                5.0583196 = idf(docFreq=763, maxDocs=44218)
                0.014198361 = queryNorm
              0.31614497 = fieldWeight in 4448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0583196 = idf(docFreq=763, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.037180904 = weight(abstract_txt:short in 4448) [ClassicSimilarity], result of:
            0.037180904 = score(doc=4448,freq=1.0), product of:
              0.10262127 = queryWeight, product of:
                1.2467996 = boost
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.014198361 = queryNorm
              0.36231187 = fieldWeight in 4448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.012039187 = weight(abstract_txt:their in 4448) [ClassicSimilarity], result of:
            0.012039187 = score(doc=4448,freq=1.0), product of:
              0.06096757 = queryWeight, product of:
                1.3590717 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.014198361 = queryNorm
              0.19746871 = fieldWeight in 4448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.037865408 = weight(abstract_txt:features in 4448) [ClassicSimilarity], result of:
            0.037865408 = score(doc=4448,freq=2.0), product of:
              0.09437847 = queryWeight, product of:
                1.4644011 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014198361 = queryNorm
              0.4012081 = fieldWeight in 4448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.038536973 = weight(abstract_txt:time in 4448) [ClassicSimilarity], result of:
            0.038536973 = score(doc=4448,freq=2.0), product of:
              0.105101556 = queryWeight, product of:
                1.7844218 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.014198361 = queryNorm
              0.36666414 = fieldWeight in 4448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.17433448 = weight(abstract_txt:awareness in 4448) [ClassicSimilarity], result of:
            0.17433448 = score(doc=4448,freq=2.0), product of:
              0.30968225 = queryWeight, product of:
                3.4245706 = boost
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.014198361 = queryNorm
              0.5629463 = fieldWeight in 4448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
          0.26613557 = weight(abstract_txt:temporal in 4448) [ClassicSimilarity], result of:
            0.26613557 = score(doc=4448,freq=2.0), product of:
              0.43630463 = queryWeight, product of:
                4.452803 = boost
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.014198361 = queryNorm
              0.60997653 = fieldWeight in 4448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.0625 = fieldNorm(doc=4448)
        0.28 = coord(7/25)
    
  2. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.15
    0.14563662 = sum of:
      0.14563662 = product of:
        0.7281831 = sum of:
          0.03613139 = weight(abstract_txt:best in 1059) [ClassicSimilarity], result of:
            0.03613139 = score(doc=1059,freq=1.0), product of:
              0.076833926 = queryWeight, product of:
                1.0788336 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.014198361 = queryNorm
              0.47025305 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.056798108 = weight(abstract_txt:features in 1059) [ClassicSimilarity], result of:
            0.056798108 = score(doc=1059,freq=2.0), product of:
              0.09437847 = queryWeight, product of:
                1.4644011 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014198361 = queryNorm
              0.6018121 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.26671612 = weight(abstract_txt:tagger in 1059) [ClassicSimilarity], result of:
            0.26671612 = score(doc=1059,freq=2.0), product of:
              0.23119907 = queryWeight, product of:
                1.8714188 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.014198361 = queryNorm
              1.1536211 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.15935345 = weight(abstract_txt:fine in 1059) [ClassicSimilarity], result of:
            0.15935345 = score(doc=1059,freq=1.0), product of:
              0.23653822 = queryWeight, product of:
                2.3183246 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.014198361 = queryNorm
              0.6736901 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.20918405 = weight(abstract_txt:grained in 1059) [ClassicSimilarity], result of:
            0.20918405 = score(doc=1059,freq=1.0), product of:
              0.28358248 = queryWeight, product of:
                2.5384188 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014198361 = queryNorm
              0.737648 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
        0.2 = coord(5/25)
    
  3. Ma, X.; Xue, P.; Matta, N.; Chen, Q.: Fine-grained ontology reconstruction for crisis knowledge based on integrated analysis of temporal-spatial factors (2021) 0.14
    0.14147411 = sum of:
      0.14147411 = product of:
        0.8842132 = sum of:
          0.037865408 = weight(abstract_txt:features in 232) [ClassicSimilarity], result of:
            0.037865408 = score(doc=232,freq=2.0), product of:
              0.09437847 = queryWeight, product of:
                1.4644011 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014198361 = queryNorm
              0.4012081 = fieldWeight in 232, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=232)
          0.18400551 = weight(abstract_txt:fine in 232) [ClassicSimilarity], result of:
            0.18400551 = score(doc=232,freq=3.0), product of:
              0.23653822 = queryWeight, product of:
                2.3183246 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.014198361 = queryNorm
              0.7779103 = fieldWeight in 232, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0625 = fieldNorm(doc=232)
          0.24154493 = weight(abstract_txt:grained in 232) [ClassicSimilarity], result of:
            0.24154493 = score(doc=232,freq=3.0), product of:
              0.28358248 = queryWeight, product of:
                2.5384188 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014198361 = queryNorm
              0.85176253 = fieldWeight in 232, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=232)
          0.42079732 = weight(abstract_txt:temporal in 232) [ClassicSimilarity], result of:
            0.42079732 = score(doc=232,freq=5.0), product of:
              0.43630463 = queryWeight, product of:
                4.452803 = boost
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.014198361 = queryNorm
              0.96445763 = fieldWeight in 232, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.0625 = fieldNorm(doc=232)
        0.16 = coord(4/25)
    
  4. Frandsen, T.F.; Wouters, P.: Turning working papers into journal articles : an exercise in microbibliometrics (2009) 0.12
    0.11699875 = sum of:
      0.11699875 = product of:
        0.4874948 = sum of:
          0.0264042 = weight(abstract_txt:term in 2757) [ClassicSimilarity], result of:
            0.0264042 = score(doc=2757,freq=1.0), product of:
              0.07039353 = queryWeight, product of:
                1.032629 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014198361 = queryNorm
              0.37509412 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
          0.030109491 = weight(abstract_txt:best in 2757) [ClassicSimilarity], result of:
            0.030109491 = score(doc=2757,freq=1.0), product of:
              0.076833926 = queryWeight, product of:
                1.0788336 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.014198361 = queryNorm
              0.39187756 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
          0.0150489835 = weight(abstract_txt:their in 2757) [ClassicSimilarity], result of:
            0.0150489835 = score(doc=2757,freq=1.0), product of:
              0.06096757 = queryWeight, product of:
                1.3590717 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.014198361 = queryNorm
              0.24683589 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
          0.13279454 = weight(abstract_txt:fine in 2757) [ClassicSimilarity], result of:
            0.13279454 = score(doc=2757,freq=1.0), product of:
              0.23653822 = queryWeight, product of:
                2.3183246 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.014198361 = queryNorm
              0.5614084 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
          0.17432004 = weight(abstract_txt:grained in 2757) [ClassicSimilarity], result of:
            0.17432004 = score(doc=2757,freq=1.0), product of:
              0.28358248 = queryWeight, product of:
                2.5384188 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014198361 = queryNorm
              0.6147067 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
          0.10881755 = weight(abstract_txt:stage in 2757) [ClassicSimilarity], result of:
            0.10881755 = score(doc=2757,freq=1.0), product of:
              0.22797823 = queryWeight, product of:
                2.6280863 = boost
                6.1096387 = idf(docFreq=266, maxDocs=44218)
                0.014198361 = queryNorm
              0.47731552 = fieldWeight in 2757, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1096387 = idf(docFreq=266, maxDocs=44218)
                0.078125 = fieldNorm(doc=2757)
        0.24 = coord(6/25)
    
  5. Gaines, B.R.; Chen, L.-J.; Shaw, M.L.G.: Modeling the human factors of scholarly communities supported through the Internet and World Wide Web (1997) 0.11
    0.10980177 = sum of:
      0.10980177 = product of:
        0.54900885 = sum of:
          0.0264042 = weight(abstract_txt:term in 1458) [ClassicSimilarity], result of:
            0.0264042 = score(doc=1458,freq=1.0), product of:
              0.07039353 = queryWeight, product of:
                1.032629 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014198361 = queryNorm
              0.37509412 = fieldWeight in 1458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=1458)
          0.021282477 = weight(abstract_txt:their in 1458) [ClassicSimilarity], result of:
            0.021282477 = score(doc=1458,freq=2.0), product of:
              0.06096757 = queryWeight, product of:
                1.3590717 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.014198361 = queryNorm
              0.34907866 = fieldWeight in 1458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.078125 = fieldNorm(doc=1458)
          0.048171215 = weight(abstract_txt:time in 1458) [ClassicSimilarity], result of:
            0.048171215 = score(doc=1458,freq=2.0), product of:
              0.105101556 = queryWeight, product of:
                1.7844218 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.014198361 = queryNorm
              0.45833018 = fieldWeight in 1458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.078125 = fieldNorm(doc=1458)
          0.2179181 = weight(abstract_txt:awareness in 1458) [ClassicSimilarity], result of:
            0.2179181 = score(doc=1458,freq=2.0), product of:
              0.30968225 = queryWeight, product of:
                3.4245706 = boost
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.014198361 = queryNorm
              0.7036829 = fieldWeight in 1458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.369011 = idf(docFreq=205, maxDocs=44218)
                0.078125 = fieldNorm(doc=1458)
          0.23523286 = weight(abstract_txt:temporal in 1458) [ClassicSimilarity], result of:
            0.23523286 = score(doc=1458,freq=1.0), product of:
              0.43630463 = queryWeight, product of:
                4.452803 = boost
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.014198361 = queryNorm
              0.5391482 = fieldWeight in 1458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.078125 = fieldNorm(doc=1458)
        0.2 = coord(5/25)
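
The score breakdowns above follow the classic TF-IDF pattern shown in each tree: a term's score is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the document score is the sum of term scores times the coord factor. The sketch below recomputes the "activities" term and the total for the first entry (doc 4448) from the listed inputs. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the numbers shown; the function names are illustrative, and results agree with the listing only up to floating-point rounding.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; reproduces e.g. 5.0583196 for docFreq=763.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Assumed tf formula; reproduces 1.4142135 for freq=2.0.
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Inputs copied from the "activities" term of the first entry (doc 4448).
activities = term_score(
    freq=1.0, doc_freq=763, max_docs=44218,
    boost=1.0879285, query_norm=0.014198361, field_norm=0.0625,
)

# The seven per-term scores listed for doc 4448, combined with coord(7/25).
term_scores = [
    0.024701945, 0.037180904, 0.012039187, 0.037865408,
    0.038536973, 0.17433448, 0.26613557,
]
coord = 7 / 25
total = sum(term_scores) * coord

print(f"activities term score: {activities:.9f}")
print(f"document score:        {total:.8f}")
```

The coord factor explains why an entry matching more of the 25 query terms can outrank one with a larger raw sum: entry 3 has the highest sum (0.8842132) but matches only 4 terms, so it scores below entries 1 and 2.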