Document (#40831)

Author
Gao, N.
Dredze, M.
Oard, D.W.
Title
Person entity linking in email with NIL detection
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, pp.2412-2424
Year
2017
Abstract
For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base (KB) and, if so, which KB entity is the correct referent. Entity linking has been well explored for dissemination-oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on "conversational" sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as "NIL detection"). This article presents a supervised machine learning system that links named mentions of people in email messages to a collection-specific knowledge base and is also capable of NIL detection. The system learns from manually annotated training examples to leverage a rich set of features. Its entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, and comparable accuracy is reported for the challenging NIL detection task. For the first time, comparable results are also obtained on a second email collection from a different source.
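The abstract frames linking with NIL detection as a single decision: pick the correct KB referent for a mention, or declare that none exists. One common way to express that decision is to score every candidate and return NIL when even the best score falls below a threshold. The sketch below illustrates only that framing under stated assumptions: the `token_overlap` scorer, the `nil_threshold` value, and the toy KB names are hypothetical, and this is not the authors' supervised, feature-rich model.

```python
from typing import Callable, Optional, Sequence


def token_overlap(mention: str, entity: str) -> float:
    """Toy scorer (hypothetical): Jaccard overlap of lowercased name tokens."""
    a, b = set(mention.lower().split()), set(entity.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_mention(
    mention: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    nil_threshold: float,
) -> Optional[str]:
    """Return the best-scoring KB entity, or None (NIL) when no candidate
    reaches nil_threshold. Threshold-based NIL detection is an assumption
    made for illustration."""
    if not candidates:
        return None  # empty candidate set: the referent cannot be in the KB
    best_score, best_entity = max(
        ((score(mention, e), e) for e in candidates), key=lambda pair: pair[0]
    )
    return best_entity if best_score >= nil_threshold else None


if __name__ == "__main__":
    kb = ["Alice Chen", "Bob Ortiz", "Carol Diaz"]  # toy, hypothetical KB
    print(link_mention("Alice", kb, token_overlap, 0.4))    # Alice Chen
    print(link_mention("Dana Wu", kb, token_overlap, 0.4))  # None
```

In a supervised system the hand-written scorer would typically be replaced by a learned model over many features, and the NIL threshold (or an explicit NIL candidate) would be tuned on annotated training data.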
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23888/full.
Theme
Internet

Similar documents (author)

  1. Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 5.50
  2. Oard, D.W.: Multilingual information access (2009) 5.50
  3. Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 5.50
  4. Wang, J.; Oard, D.W.: Matching meaning for cross-language information retrieval (2012) 4.40
  5. Oard, D.W.; Resnik, P.: Support for interactive document selection in cross-language information retrieval (1999) 4.40

Similar documents (content)

  1. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.24
  2. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.24
  3. Lee, D.J.L.; Stvilia, B.: Developing a data identifier taxonomy (2014) 0.14
  4. Tang, X.; Chen, L.; Cui, J.; Wei, B.: Knowledge representation learning with entity descriptions, hierarchical types, and textual relations (2019) 0.12
  5. Ku, C.-H.; Leroy, G.: A crime reports analysis system to identify related crimes (2011) 0.12