Document (#42816)

Author
Phan, M.C.
Sun, A.
Title
Collective named entity recognition in user comments via parameterized label propagation
Source
Journal of the Association for Information Science and Technology. 71(2020) no.5, p.568-577
Year
2020
Abstract
Named entity recognition (NER) has traditionally focused on extracting mentions within a local region, such as a sentence or short paragraph. When dealing with user-generated text, the diverse and informal writing style makes traditional approaches much less effective. On the other hand, in many types of social media text, such as user comments, tweets, or question-answer posts, contextual connections between documents do exist. Examples include posts in a thread discussing the same topic and tweets that share a hashtag about the same entity. Our idea in this work is to use the related contexts across documents to perform mention recognition in a collective manner. Intuitively, within a mention coreference graph, the labels of mentions are expected to propagate from more confident cases to less confident ones. To this end, we propose a novel semisupervised inference algorithm named parameterized label propagation. In our model, the propagation weights between mentions are learned by an attention-like mechanism, given their local contexts and the initial labels as input. We study the performance of our approach on the Yahoo! News data set, where comments and articles within a thread share similar context. The results show that our model significantly outperforms all other noncollective NER baselines.
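The propagation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no equations, so the dot-product similarity, the softmax normalisation, the `alpha` mixing factor, and the toy data below are all assumptions chosen only to show labels flowing from a confident mention to an uncertain one along a graph edge. Nothing here is learned, whereas the paper learns the propagation weights.

```python
import math

def attention_weights(contexts, edges):
    """Softmax over dot-product similarity, restricted to graph neighbours.

    Assumption: fixed context vectors stand in for the learned,
    attention-like weights described in the abstract.
    """
    n = len(contexts)
    weights = []
    for i in range(n):
        # Each mention attends to itself and to its linked mentions.
        neighbours = [j for j in range(n)
                      if j == i or (i, j) in edges or (j, i) in edges]
        scores = {j: sum(a * b for a, b in zip(contexts[i], contexts[j]))
                  for j in neighbours}
        top = max(scores.values())
        exp = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp.values())
        weights.append([exp.get(j, 0.0) / total for j in range(n)])
    return weights

def propagate(initial, weights, steps=10, alpha=0.5):
    """Mix each mention's initial label with its neighbours' current labels."""
    n, k = len(initial), len(initial[0])
    labels = [row[:] for row in initial]
    for _ in range(steps):
        labels = [
            [alpha * initial[i][c]
             + (1 - alpha) * sum(weights[i][j] * labels[j][c] for j in range(n))
             for c in range(k)]
            for i in range(n)
        ]
    return labels

# Toy data (hypothetical): columns are [entity, non-entity] probabilities.
contexts = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = {(0, 1)}                      # mentions 0 and 1 are linked
initial = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
labels = propagate(initial, attention_weights(contexts, edges))
print(labels)
```

In this toy graph the uncertain mention 1 is pulled toward the "entity" label by its confident neighbour 0, while the unlinked mention 2 keeps its initial label.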
Content
https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24282

Similar documents (content)

  1. Gao, N.; Dredze, M.; Oard, D.W.: Person entity linking in email with NIL detection (2017) 0.14
    0.13741623 = sum of:
      0.13741623 = product of:
        0.6870811 = sum of:
          0.098244205 = weight(abstract_txt:posts in 3830) [ClassicSimilarity], result of:
            0.098244205 = score(doc=3830,freq=1.0), product of:
              0.20809622 = queryWeight, product of:
                1.6678945 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.016517065 = queryNorm
              0.47210953 = fieldWeight in 3830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.0625 = fieldNorm(doc=3830)
          0.10438246 = weight(abstract_txt:mention in 3830) [ClassicSimilarity], result of:
            0.10438246 = score(doc=3830,freq=1.0), product of:
              0.21667622 = queryWeight, product of:
                1.7019316 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.016517065 = queryNorm
              0.48174396 = fieldWeight in 3830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=3830)
          0.22365622 = weight(abstract_txt:entity in 3830) [ClassicSimilarity], result of:
            0.22365622 = score(doc=3830,freq=7.0), product of:
              0.21549869 = queryWeight, product of:
                2.0787604 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016517065 = queryNorm
              1.0378542 = fieldWeight in 3830, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=3830)
          0.10853903 = weight(abstract_txt:named in 3830) [ClassicSimilarity], result of:
            0.10853903 = score(doc=3830,freq=1.0), product of:
              0.25457394 = queryWeight, product of:
                2.259379 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.016517065 = queryNorm
              0.42635563 = fieldWeight in 3830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=3830)
          0.1522592 = weight(abstract_txt:mentions in 3830) [ClassicSimilarity], result of:
            0.1522592 = score(doc=3830,freq=1.0), product of:
              0.31901592 = queryWeight, product of:
                2.5292296 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.016517065 = queryNorm
              0.47727776 = fieldWeight in 3830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=3830)
        0.2 = coord(5/25)
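
The score tree above can be re-derived from its leaf values. The sketch below recomputes it in Python, using the classic TF-IDF relations that the tree itself displays: tf = sqrt(freq), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and a final coord factor. The idf formula `1 + ln(maxDocs / (docFreq + 1))` is the one used by Lucene's ClassicSimilarity and reproduces the idf values shown. Function and variable names are illustrative, and the arithmetic runs in double precision, so the last digits may differ slightly from the single-precision figures in the listing.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm             # query side
    field_weight = math.sqrt(freq) * term_idf * field_norm   # document side
    return query_weight * field_weight

QUERY_NORM = 0.016517065
MAX_DOCS = 44218
FIELD_NORM = 0.0625   # fieldNorm(doc=3830)

# (term, freq, docFreq, boost) copied from the tree for doc 3830
terms = [
    ("posts",    1.0,  62, 1.6678945),
    ("mention",  1.0,  53, 1.7019316),
    ("entity",   7.0, 225, 2.0787604),
    ("named",    1.0, 130, 2.259379),
    ("mentions", 1.0,  57, 2.5292296),
]

total = sum(term_score(f, df, MAX_DOCS, b, QUERY_NORM, FIELD_NORM)
            for _, f, df, b in terms)
score = total * (5 / 25)   # coord(5/25): 5 of 25 query terms matched
print(round(score, 4))     # approximately 0.137, matching the listed 0.13741623
```

The same calculation applies to the other entries in the list; only the matched terms, their frequencies, and the per-document fieldNorm change.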
    
  2. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.12
    0.11501455 = sum of:
      0.11501455 = product of:
        0.57507277 = sum of:
          0.057412047 = weight(abstract_txt:local in 4286) [ClassicSimilarity], result of:
            0.057412047 = score(doc=4286,freq=2.0), product of:
              0.09948973 = queryWeight, product of:
                1.1532557 = boost
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.016517065 = queryNorm
              0.57706505 = fieldWeight in 4286, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.078125 = fieldNorm(doc=4286)
          0.036998004 = weight(abstract_txt:user in 4286) [ClassicSimilarity], result of:
            0.036998004 = score(doc=4286,freq=3.0), product of:
              0.074226975 = queryWeight, product of:
                1.2200089 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.016517065 = queryNorm
              0.49844417 = fieldWeight in 4286, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.078125 = fieldNorm(doc=4286)
          0.10772589 = weight(abstract_txt:label in 4286) [ClassicSimilarity], result of:
            0.10772589 = score(doc=4286,freq=1.0), product of:
              0.1906922 = queryWeight, product of:
                1.5966251 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.016517065 = queryNorm
              0.56492025 = fieldWeight in 4286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.078125 = fieldNorm(doc=4286)
          0.12280525 = weight(abstract_txt:posts in 4286) [ClassicSimilarity], result of:
            0.12280525 = score(doc=4286,freq=1.0), product of:
              0.20809622 = queryWeight, product of:
                1.6678945 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.016517065 = queryNorm
              0.5901369 = fieldWeight in 4286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=4286)
          0.25013158 = weight(abstract_txt:propagation in 4286) [ClassicSimilarity], result of:
            0.25013158 = score(doc=4286,freq=1.0), product of:
              0.38276216 = queryWeight, product of:
                2.7704263 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016517065 = queryNorm
              0.6534909 = fieldWeight in 4286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=4286)
        0.2 = coord(5/25)
    
  3. Berg, A.; Nelimarkka, M.: Do you see what I see? : measuring the semantic differences in image-recognition services' outputs (2023) 0.11
    0.10603695 = sum of:
      0.10603695 = product of:
        0.53018475 = sum of:
          0.04202195 = weight(abstract_txt:less in 1070) [ClassicSimilarity], result of:
            0.04202195 = score(doc=1070,freq=1.0), product of:
              0.10180529 = queryWeight, product of:
                1.1665992 = boost
                5.283428 = idf(docFreq=609, maxDocs=44218)
                0.016517065 = queryNorm
              0.41276783 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.283428 = idf(docFreq=609, maxDocs=44218)
                0.078125 = fieldNorm(doc=1070)
          0.021360807 = weight(abstract_txt:user in 1070) [ClassicSimilarity], result of:
            0.021360807 = score(doc=1070,freq=1.0), product of:
              0.074226975 = queryWeight, product of:
                1.2200089 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.016517065 = queryNorm
              0.2877769 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.078125 = fieldNorm(doc=1070)
          0.10772589 = weight(abstract_txt:label in 1070) [ClassicSimilarity], result of:
            0.10772589 = score(doc=1070,freq=1.0), product of:
              0.1906922 = queryWeight, product of:
                1.5966251 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.016517065 = queryNorm
              0.56492025 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.078125 = fieldNorm(doc=1070)
          0.18931638 = weight(abstract_txt:labels in 1070) [ClassicSimilarity], result of:
            0.18931638 = score(doc=1070,freq=3.0), product of:
              0.19254752 = queryWeight, product of:
                1.6043733 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.016517065 = queryNorm
              0.983219 = fieldWeight in 1070, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.078125 = fieldNorm(doc=1070)
          0.16975974 = weight(abstract_txt:recognition in 1070) [ClassicSimilarity], result of:
            0.16975974 = score(doc=1070,freq=3.0), product of:
              0.20495854 = queryWeight, product of:
                2.0272865 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.016517065 = queryNorm
              0.82826376 = fieldWeight in 1070, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.078125 = fieldNorm(doc=1070)
        0.2 = coord(5/25)
    
  4. Pereira, D.A.; Ribeiro-Neto, B.; Ziviani, N.; Laender, A.H.F.; Gonçalves, M.A.: A generic Web-based entity resolution framework (2011) 0.10
    0.101760045 = sum of:
      0.101760045 = product of:
        0.5088002 = sum of:
          0.04207946 = weight(abstract_txt:same in 4450) [ClassicSimilarity], result of:
            0.04207946 = score(doc=4450,freq=3.0), product of:
              0.081984654 = queryWeight, product of:
                1.0468941 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.016517065 = queryNorm
              0.5132602 = fieldWeight in 4450, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0625 = fieldNorm(doc=4450)
          0.050335906 = weight(abstract_txt:share in 4450) [ClassicSimilarity], result of:
            0.050335906 = score(doc=4450,freq=1.0), product of:
              0.13324313 = queryWeight, product of:
                1.3346238 = boost
                6.044398 = idf(docFreq=284, maxDocs=44218)
                0.016517065 = queryNorm
              0.37777486 = fieldWeight in 4450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.044398 = idf(docFreq=284, maxDocs=44218)
                0.0625 = fieldNorm(doc=4450)
          0.12187792 = weight(abstract_txt:label in 4450) [ClassicSimilarity], result of:
            0.12187792 = score(doc=4450,freq=2.0), product of:
              0.1906922 = queryWeight, product of:
                1.5966251 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.016517065 = queryNorm
              0.6391343 = fieldWeight in 4450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.0625 = fieldNorm(doc=4450)
          0.08744149 = weight(abstract_txt:labels in 4450) [ClassicSimilarity], result of:
            0.08744149 = score(doc=4450,freq=1.0), product of:
              0.19254752 = queryWeight, product of:
                1.6043733 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.016517065 = queryNorm
              0.4541294 = fieldWeight in 4450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=4450)
          0.20706543 = weight(abstract_txt:entity in 4450) [ClassicSimilarity], result of:
            0.20706543 = score(doc=4450,freq=6.0), product of:
              0.21549869 = queryWeight, product of:
                2.0787604 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016517065 = queryNorm
              0.96086633 = fieldWeight in 4450, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=4450)
        0.2 = coord(5/25)
    
  5. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.10
    0.096672826 = sum of:
      0.096672826 = product of:
        0.4833641 = sum of:
          0.09245732 = weight(abstract_txt:hashtag in 4095) [ClassicSimilarity], result of:
            0.09245732 = score(doc=4095,freq=1.0), product of:
              0.17338242 = queryWeight, product of:
                1.0765247 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.016517065 = queryNorm
              0.5332566 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.15081623 = weight(abstract_txt:label in 4095) [ClassicSimilarity], result of:
            0.15081623 = score(doc=4095,freq=4.0), product of:
              0.1906922 = queryWeight, product of:
                1.5966251 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.016517065 = queryNorm
              0.7908883 = fieldWeight in 4095, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.0765113 = weight(abstract_txt:labels in 4095) [ClassicSimilarity], result of:
            0.0765113 = score(doc=4095,freq=1.0), product of:
              0.19254752 = queryWeight, product of:
                1.6043733 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.016517065 = queryNorm
              0.39736322 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.06860758 = weight(abstract_txt:recognition in 4095) [ClassicSimilarity], result of:
            0.06860758 = score(doc=4095,freq=1.0), product of:
              0.20495854 = queryWeight, product of:
                2.0272865 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.016517065 = queryNorm
              0.33473882 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.09497166 = weight(abstract_txt:named in 4095) [ClassicSimilarity], result of:
            0.09497166 = score(doc=4095,freq=1.0), product of:
              0.25457394 = queryWeight, product of:
                2.259379 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.016517065 = queryNorm
              0.37306118 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
        0.2 = coord(5/25)