Document (#40829)

Author
Gao, N.
Dredze, M.
Oard, D.W.
Title
Person entity linking in email with NIL detection
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, pp.2412-2424
Year
2017
Abstract
For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base, and if so to determine which KB entity is the correct referent. Entity linking has been well explored for dissemination-oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on "conversational" sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as "NIL detection"). This article presents a supervised machine learning system for linking named mentions of people in email messages to a collection-specific knowledge base, and that is also capable of NIL detection. This system learns from manually annotated training examples to leverage a rich set of features. The entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, comparable accuracy is reported for the challenging NIL detection task, and these results are for the first time replicated on a second email collection from a different source with comparable results.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23888/full.
Theme
Internet

Similar documents (author)

  1. Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 5.49
    5.4946446 = sum of:
      5.4946446 = weight(author_txt:oard in 3259) [ClassicSimilarity], result of:
        5.4946446 = fieldWeight in 3259, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.791431 = idf(docFreq=17, maxDocs=43556)
          0.625 = fieldNorm(doc=3259)
    
  2. Oard, D.W.: Multilingual information access (2009) 5.49
    5.4946446 = sum of:
      5.4946446 = weight(author_txt:oard in 848) [ClassicSimilarity], result of:
        5.4946446 = fieldWeight in 848, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.791431 = idf(docFreq=17, maxDocs=43556)
          0.625 = fieldNorm(doc=848)
    
  3. Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 5.49
    5.4946446 = sum of:
      5.4946446 = weight(author_txt:oard in 3162) [ClassicSimilarity], result of:
        5.4946446 = fieldWeight in 3162, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.791431 = idf(docFreq=17, maxDocs=43556)
          0.625 = fieldNorm(doc=3162)
    
  4. Wang, J.; Oard, D.W.: Matching meaning for cross-language information retrieval (2012) 4.40
    4.3957157 = sum of:
      4.3957157 = weight(author_txt:oard in 7427) [ClassicSimilarity], result of:
        4.3957157 = fieldWeight in 7427, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.791431 = idf(docFreq=17, maxDocs=43556)
          0.5 = fieldNorm(doc=7427)
    
  5. Oard, D.W.; Resnik, P.: Support for interactive document selection in cross-language information retrieval (1999) 4.40
    4.3957157 = sum of:
      4.3957157 = weight(author_txt:oard in 6004) [ClassicSimilarity], result of:
        4.3957157 = fieldWeight in 6004, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.791431 = idf(docFreq=17, maxDocs=43556)
          0.5 = fieldNorm(doc=6004)
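
The author-match scores listed above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function and variable names are illustrative only and are not part of any Lucene API; the input numbers are copied from the explain output.

```python
import math


def classic_tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: author_txt:oard, docFreq=17, maxDocs=43556.
idf_oard = classic_idf(17, 43556)
score_single_author = field_weight(1.0, 17, 43556, 0.625)  # entries 1 to 3
score_two_authors = field_weight(1.0, 17, 43556, 0.5)      # entries 4 and 5

print(round(idf_oard, 4), round(score_single_author, 4), round(score_two_authors, 4))
```

The only factor that differs between the 5.49 and 4.40 entries is fieldNorm (0.625 versus 0.5), which in this scoring model shrinks as the author field gets longer, so records with two authors rank below records with one.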
    

Similar documents (content)

  1. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.25
    0.24519463 = sum of:
      0.24519463 = product of:
        0.8756951 = sum of:
          0.057169862 = weight(abstract_txt:mention in 264) [ClassicSimilarity], result of:
            0.057169862 = score(doc=264,freq=1.0), product of:
              0.095124334 = queryWeight, product of:
                7.6928186 = idf(docFreq=53, maxDocs=43556)
                0.012365342 = queryNorm
              0.60100144 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6928186 = idf(docFreq=53, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.028885104 = weight(abstract_txt:text in 264) [ClassicSimilarity], result of:
            0.028885104 = score(doc=264,freq=3.0), product of:
              0.05271478 = queryWeight, product of:
                1.0527745 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.012365342 = queryNorm
              0.54795074 = fieldWeight in 264, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.029915541 = weight(abstract_txt:task in 264) [ClassicSimilarity], result of:
            0.029915541 = score(doc=264,freq=1.0), product of:
              0.07782541 = queryWeight, product of:
                1.2791748 = boost
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.012365342 = queryNorm
              0.38439298 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.022979662 = weight(abstract_txt:knowledge in 264) [ClassicSimilarity], result of:
            0.022979662 = score(doc=264,freq=1.0), product of:
              0.08224243 = queryWeight, product of:
                1.859654 = boost
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.012365342 = queryNorm
              0.2794137 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.089850545 = weight(abstract_txt:base in 264) [ClassicSimilarity], result of:
            0.089850545 = score(doc=264,freq=1.0), product of:
              0.2041177 = queryWeight, product of:
                2.9297092 = boost
                5.6344304 = idf(docFreq=422, maxDocs=43556)
                0.012365342 = queryNorm
              0.44018987 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6344304 = idf(docFreq=422, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.14292884 = weight(abstract_txt:linking in 264) [ClassicSimilarity], result of:
            0.14292884 = score(doc=264,freq=1.0), product of:
              0.29962873 = queryWeight, product of:
                3.9685414 = boost
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.012365342 = queryNorm
              0.47701982 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
          0.50396556 = weight(abstract_txt:entity in 264) [ClassicSimilarity], result of:
            0.50396556 = score(doc=264,freq=4.0), product of:
              0.5114354 = queryWeight, product of:
                6.5583496 = boost
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.012365342 = queryNorm
              0.9853944 = fieldWeight in 264, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.078125 = fieldNorm(doc=264)
        0.28 = coord(7/25)
    
  2. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.24
    0.24332926 = sum of:
      0.24332926 = product of:
        1.0138719 = sum of:
          0.016095597 = weight(abstract_txt:results in 3846) [ClassicSimilarity], result of:
            0.016095597 = score(doc=3846,freq=1.0), product of:
              0.058933016 = queryWeight, product of:
                1.3633085 = boost
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.012365342 = queryNorm
              0.2731168 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
          0.049683888 = weight(abstract_txt:reported in 3846) [ClassicSimilarity], result of:
            0.049683888 = score(doc=3846,freq=1.0), product of:
              0.10914418 = queryWeight, product of:
                1.5148494 = boost
                5.8267307 = idf(docFreq=348, maxDocs=43556)
                0.012365342 = queryNorm
              0.45521334 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8267307 = idf(docFreq=348, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
          0.076263435 = weight(abstract_txt:accuracy in 3846) [ClassicSimilarity], result of:
            0.076263435 = score(doc=3846,freq=2.0), product of:
              0.115272164 = queryWeight, product of:
                1.5567949 = boost
                5.9880705 = idf(docFreq=296, maxDocs=43556)
                0.012365342 = queryNorm
              0.66159457 = fieldWeight in 3846, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9880705 = idf(docFreq=296, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
          0.082005695 = weight(abstract_txt:comparable in 3846) [ClassicSimilarity], result of:
            0.082005695 = score(doc=3846,freq=1.0), product of:
              0.15243553 = queryWeight, product of:
                1.7902442 = boost
                6.886012 = idf(docFreq=120, maxDocs=43556)
                0.012365342 = queryNorm
              0.5379697 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.886012 = idf(docFreq=120, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
          0.28585768 = weight(abstract_txt:linking in 3846) [ClassicSimilarity], result of:
            0.28585768 = score(doc=3846,freq=4.0), product of:
              0.29962873 = queryWeight, product of:
                3.9685414 = boost
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.012365342 = queryNorm
              0.95403963 = fieldWeight in 3846, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
          0.50396556 = weight(abstract_txt:entity in 3846) [ClassicSimilarity], result of:
            0.50396556 = score(doc=3846,freq=4.0), product of:
              0.5114354 = queryWeight, product of:
                6.5583496 = boost
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.012365342 = queryNorm
              0.9853944 = fieldWeight in 3846, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.078125 = fieldNorm(doc=3846)
        0.24 = coord(6/25)
    
  3. Lee, D.J.L.; Stvilia, B.: Developing a data identifier taxonomy (2014) 0.14
    0.1377028 = sum of:
      0.1377028 = product of:
        0.68851393 = sum of:
          0.079223745 = weight(abstract_txt:referenced in 3974) [ClassicSimilarity], result of:
            0.079223745 = score(doc=3974,freq=1.0), product of:
              0.10470392 = queryWeight, product of:
                1.0491453 = boost
                8.070885 = idf(docFreq=36, maxDocs=43556)
                0.012365342 = queryNorm
              0.75664544 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.070885 = idf(docFreq=36, maxDocs=43556)
                0.09375 = fieldNorm(doc=3974)
          0.027575592 = weight(abstract_txt:knowledge in 3974) [ClassicSimilarity], result of:
            0.027575592 = score(doc=3974,freq=1.0), product of:
              0.08224243 = queryWeight, product of:
                1.859654 = boost
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.012365342 = queryNorm
              0.33529642 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.09375 = fieldNorm(doc=3974)
          0.10782066 = weight(abstract_txt:base in 3974) [ClassicSimilarity], result of:
            0.10782066 = score(doc=3974,freq=1.0), product of:
              0.2041177 = queryWeight, product of:
                2.9297092 = boost
                5.6344304 = idf(docFreq=422, maxDocs=43556)
                0.012365342 = queryNorm
              0.52822787 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6344304 = idf(docFreq=422, maxDocs=43556)
                0.09375 = fieldNorm(doc=3974)
          0.1715146 = weight(abstract_txt:linking in 3974) [ClassicSimilarity], result of:
            0.1715146 = score(doc=3974,freq=1.0), product of:
              0.29962873 = queryWeight, product of:
                3.9685414 = boost
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.012365342 = queryNorm
              0.57242376 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1058536 = idf(docFreq=263, maxDocs=43556)
                0.09375 = fieldNorm(doc=3974)
          0.30237934 = weight(abstract_txt:entity in 3974) [ClassicSimilarity], result of:
            0.30237934 = score(doc=3974,freq=1.0), product of:
              0.5114354 = queryWeight, product of:
                6.5583496 = boost
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.012365342 = queryNorm
              0.59123665 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.09375 = fieldNorm(doc=3974)
        0.2 = coord(5/25)
    
  4. Tang, X.; Chen, L.; Cui, J.; Wei, B.: Knowledge representation learning with entity descriptions, hierarchical types, and textual relations (2019) 0.12
    0.123250656 = sum of:
      0.123250656 = product of:
        0.61625326 = sum of:
          0.020317214 = weight(abstract_txt:specific in 1387) [ClassicSimilarity], result of:
            0.020317214 = score(doc=1387,freq=1.0), product of:
              0.060131166 = queryWeight, product of:
                1.1243953 = boost
                4.3248844 = idf(docFreq=1566, maxDocs=43556)
                0.012365342 = queryNorm
              0.3378816 = fieldWeight in 1387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3248844 = idf(docFreq=1566, maxDocs=43556)
                0.078125 = fieldNorm(doc=1387)
          0.029915541 = weight(abstract_txt:task in 1387) [ClassicSimilarity], result of:
            0.029915541 = score(doc=1387,freq=1.0), product of:
              0.07782541 = queryWeight, product of:
                1.2791748 = boost
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.012365342 = queryNorm
              0.38439298 = fieldWeight in 1387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.078125 = fieldNorm(doc=1387)
          0.016095597 = weight(abstract_txt:results in 1387) [ClassicSimilarity], result of:
            0.016095597 = score(doc=1387,freq=1.0), product of:
              0.058933016 = queryWeight, product of:
                1.3633085 = boost
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.012365342 = queryNorm
              0.2731168 = fieldWeight in 1387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.078125 = fieldNorm(doc=1387)
          0.045959324 = weight(abstract_txt:knowledge in 1387) [ClassicSimilarity], result of:
            0.045959324 = score(doc=1387,freq=4.0), product of:
              0.08224243 = queryWeight, product of:
                1.859654 = boost
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.012365342 = queryNorm
              0.5588274 = fieldWeight in 1387, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5764952 = idf(docFreq=3311, maxDocs=43556)
                0.078125 = fieldNorm(doc=1387)
          0.50396556 = weight(abstract_txt:entity in 1387) [ClassicSimilarity], result of:
            0.50396556 = score(doc=1387,freq=4.0), product of:
              0.5114354 = queryWeight, product of:
                6.5583496 = boost
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.012365342 = queryNorm
              0.9853944 = fieldWeight in 1387, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.078125 = fieldNorm(doc=1387)
        0.2 = coord(5/25)
    
  5. Ku, C.-H.; Leroy, G.: A crime reports analysis system to identify related crimes (2011) 0.12
    0.12166437 = sum of:
      0.12166437 = product of:
        0.60832185 = sum of:
          0.013341458 = weight(abstract_txt:text in 1627) [ClassicSimilarity], result of:
            0.013341458 = score(doc=1627,freq=1.0), product of:
              0.05271478 = queryWeight, product of:
                1.0527745 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.012365342 = queryNorm
              0.2530876 = fieldWeight in 1627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.0625 = fieldNorm(doc=1627)
          0.016253771 = weight(abstract_txt:specific in 1627) [ClassicSimilarity], result of:
            0.016253771 = score(doc=1627,freq=1.0), product of:
              0.060131166 = queryWeight, product of:
                1.1243953 = boost
                4.3248844 = idf(docFreq=1566, maxDocs=43556)
                0.012365342 = queryNorm
              0.27030528 = fieldWeight in 1627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3248844 = idf(docFreq=1566, maxDocs=43556)
                0.0625 = fieldNorm(doc=1627)
          0.023932433 = weight(abstract_txt:task in 1627) [ClassicSimilarity], result of:
            0.023932433 = score(doc=1627,freq=1.0), product of:
              0.07782541 = queryWeight, product of:
                1.2791748 = boost
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.012365342 = queryNorm
              0.30751437 = fieldWeight in 1627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.92023 = idf(docFreq=863, maxDocs=43556)
                0.0625 = fieldNorm(doc=1627)
          0.061010752 = weight(abstract_txt:accuracy in 1627) [ClassicSimilarity], result of:
            0.061010752 = score(doc=1627,freq=2.0), product of:
              0.115272164 = queryWeight, product of:
                1.5567949 = boost
                5.9880705 = idf(docFreq=296, maxDocs=43556)
                0.012365342 = queryNorm
              0.52927566 = fieldWeight in 1627, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9880705 = idf(docFreq=296, maxDocs=43556)
                0.0625 = fieldNorm(doc=1627)
          0.4937834 = weight(abstract_txt:entity in 1627) [ClassicSimilarity], result of:
            0.4937834 = score(doc=1627,freq=6.0), product of:
              0.5114354 = queryWeight, product of:
                6.5583496 = boost
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.012365342 = queryNorm
              0.96548545 = fieldWeight in 1627, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.3065243 = idf(docFreq=215, maxDocs=43556)
                0.0625 = fieldNorm(doc=1627)
        0.2 = coord(5/25)
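
The content-match scores combine several terms. As a rough sketch, the final score of the first content result (Zhao et al., doc 264) can be rebuilt from the numbers in its explain tree, assuming each term contributes queryWeight * fieldWeight with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is scaled by coord (matched terms / query terms). The names below are illustrative. The boost of 1.0 for "mention" is an assumption, since the listing shows no boost line for that term.

```python
import math

QUERY_NORM = 0.012365342   # queryNorm from the explain output
FIELD_NORM = 0.078125      # fieldNorm(doc=264)

# (term, boost, idf, freq) copied from the explain tree for doc 264.
# The boost for "mention" is not shown in the listing; 1.0 is assumed.
TERMS = [
    ("mention",   1.0,       7.6928186, 1.0),
    ("text",      1.0527745, 4.0494018, 3.0),
    ("task",      1.2791748, 4.92023,   1.0),
    ("knowledge", 1.859654,  3.5764952, 1.0),
    ("base",      2.9297092, 5.6344304, 1.0),
    ("linking",   3.9685414, 6.1058536, 1.0),
    ("entity",    6.5583496, 6.3065243, 4.0),
]


def term_score(boost, idf, freq):
    # Query side: how much the query cares about this term.
    query_weight = boost * idf * QUERY_NORM
    # Document side: how strongly the term occurs in this field.
    fld_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * fld_weight


term_sum = sum(term_score(b, i, f) for _, b, i, f in TERMS)
coord = len(TERMS) / 25          # 7 of the 25 query terms matched
final_score = term_sum * coord

print(round(term_sum, 5), round(final_score, 5))
```

In this result the single term "entity" accounts for more than half of the summed weight, which is why all five content matches are dominated by that term.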