Document (#43670)

Author
Collard, J.
Paiva, V. de
Fong, B.
Subrahmanian, E.
Title
Extracting mathematical concepts from text
Source
arXiv, [https://doi.org/10.48550/arXiv.2208.13830]
Year
2022
Abstract
We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
Theme
Computational linguistics
Knowledge representation
Field
Mathematics

Similar documents (author)

  1. Fong, W.W.: Searching the World Wide Web (1996) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:fong in 6597) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 6597, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=6597)
    
  2. Fong, K.Y.: Interpretive object-oriented facility which can access precompiled classes (1995) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:fong in 6834) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 6834, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=6834)
    
  3. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:fong in 3940) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 3940, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=3940)
    
  4. Tho, Q.T.; Hui, S.C.; Fong, A.C.M.: A citation-based document retrieval system for finding research expertise (2007) 3.71
    3.7144227 = sum of:
      3.7144227 = weight(author_txt:fong in 956) [ClassicSimilarity], result of:
        3.7144227 = fieldWeight in 956, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.375 = fieldNorm(doc=956)
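
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are not stated in the listing itself, but they are consistent with the numbers it shows. The function names are illustrative only and are not part of any library.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor (assumed form: square root of the raw frequency)."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as broken down in the listing."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: author_txt:fong, docFreq=5, maxDocs=44218.
top_hits = field_weight(1.0, 5, 44218, 0.625)    # entries 1 to 3
fourth_hit = field_weight(1.0, 5, 44218, 0.375)  # entry 4 (three authors)

print(f"{top_hits:.4f} {fourth_hit:.4f}")
```

The only factor that differs between the first three entries and the fourth is fieldNorm (0.625 versus 0.375), which is why the three-author record ranks lower even though the term frequency and idf are identical.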
    

Similar documents (content)

  1. Yang, Y.; Lu, Q.; Zhao, T.: A delimiter-based general approach for Chinese term extraction (2009) 0.17
    0.17211664 = sum of:
      0.17211664 = product of:
        0.5378645 = sum of:
          0.07478679 = weight(abstract_txt:step in 3315) [ClassicSimilarity], result of:
            0.07478679 = score(doc=3315,freq=3.0), product of:
              0.11348836 = queryWeight, product of:
                1.076608 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.017316528 = queryNorm
              0.65898204 = fieldWeight in 3315, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.04473146 = weight(abstract_txt:extracted in 3315) [ClassicSimilarity], result of:
            0.04473146 = score(doc=3315,freq=1.0), product of:
              0.11619404 = queryWeight, product of:
                1.0893661 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.017316528 = queryNorm
              0.38497207 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.018853666 = weight(abstract_txt:different in 3315) [ClassicSimilarity], result of:
            0.018853666 = score(doc=3315,freq=1.0), product of:
              0.082296684 = queryWeight, product of:
                1.2965465 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.017316528 = queryNorm
              0.22909386 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.025315467 = weight(abstract_txt:text in 3315) [ClassicSimilarity], result of:
            0.025315467 = score(doc=3315,freq=1.0), product of:
              0.10016341 = queryWeight, product of:
                1.4303801 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017316528 = queryNorm
              0.25274166 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.08682748 = weight(abstract_txt:corpus in 3315) [ClassicSimilarity], result of:
            0.08682748 = score(doc=3315,freq=1.0), product of:
              0.22780152 = queryWeight, product of:
                2.1571245 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.017316528 = queryNorm
              0.3811541 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.028576516 = weight(abstract_txt:from in 3315) [ClassicSimilarity], result of:
            0.028576516 = score(doc=3315,freq=2.0), product of:
              0.116975434 = queryWeight, product of:
                2.4440734 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017316528 = queryNorm
              0.24429502 = fieldWeight in 3315, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.1276677 = weight(abstract_txt:extracting in 3315) [ClassicSimilarity], result of:
            0.1276677 = score(doc=3315,freq=1.0), product of:
              0.29455912 = queryWeight, product of:
                2.4529188 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.017316528 = queryNorm
              0.4334196 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
          0.13110542 = weight(abstract_txt:sentences in 3315) [ClassicSimilarity], result of:
            0.13110542 = score(doc=3315,freq=1.0), product of:
              0.29982343 = queryWeight, product of:
                2.4747407 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.017316528 = queryNorm
              0.43727544 = fieldWeight in 3315, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=3315)
        0.32 = coord(8/25)
    
  2. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.14
    0.14488453 = sum of:
      0.14488453 = product of:
        0.51744473 = sum of:
          0.034732677 = weight(abstract_txt:experiment in 946) [ClassicSimilarity], result of:
            0.034732677 = score(doc=946,freq=1.0), product of:
              0.098159865 = queryWeight, product of:
                1.0012647 = boost
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.017316528 = queryNorm
              0.35383785 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.04473146 = weight(abstract_txt:extracted in 946) [ClassicSimilarity], result of:
            0.04473146 = score(doc=946,freq=1.0), product of:
              0.11619404 = queryWeight, product of:
                1.0893661 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.017316528 = queryNorm
              0.38497207 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.018853666 = weight(abstract_txt:different in 946) [ClassicSimilarity], result of:
            0.018853666 = score(doc=946,freq=1.0), product of:
              0.082296684 = queryWeight, product of:
                1.2965465 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.017316528 = queryNorm
              0.22909386 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.035801478 = weight(abstract_txt:text in 946) [ClassicSimilarity], result of:
            0.035801478 = score(doc=946,freq=2.0), product of:
              0.10016341 = queryWeight, product of:
                1.4303801 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017316528 = queryNorm
              0.3574307 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.028576516 = weight(abstract_txt:from in 946) [ClassicSimilarity], result of:
            0.028576516 = score(doc=946,freq=2.0), product of:
              0.116975434 = queryWeight, product of:
                2.4440734 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017316528 = queryNorm
              0.24429502 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.1276677 = weight(abstract_txt:extracting in 946) [ClassicSimilarity], result of:
            0.1276677 = score(doc=946,freq=1.0), product of:
              0.29455912 = queryWeight, product of:
                2.4529188 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.017316528 = queryNorm
              0.4334196 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.22708125 = weight(abstract_txt:sentences in 946) [ClassicSimilarity], result of:
            0.22708125 = score(doc=946,freq=3.0), product of:
              0.29982343 = queryWeight, product of:
                2.4747407 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.017316528 = queryNorm
              0.7573833 = fieldWeight in 946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
        0.28 = coord(7/25)
    
  3. Stathopoulos, Y.; Baker, S.; Rei, M.; Teufel, S.: Variable typing : assigning meaning to variables in mathematical text (2018) 0.14
    0.14156955 = sum of:
      0.14156955 = product of:
        0.7078478 = sum of:
          0.055914324 = weight(abstract_txt:extracted in 4432) [ClassicSimilarity], result of:
            0.055914324 = score(doc=4432,freq=1.0), product of:
              0.11619404 = queryWeight, product of:
                1.0893661 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.017316528 = queryNorm
              0.4812151 = fieldWeight in 4432, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.078125 = fieldNorm(doc=4432)
          0.023567082 = weight(abstract_txt:different in 4432) [ClassicSimilarity], result of:
            0.023567082 = score(doc=4432,freq=1.0), product of:
              0.082296684 = queryWeight, product of:
                1.2965465 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.017316528 = queryNorm
              0.28636733 = fieldWeight in 4432, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=4432)
          0.044751845 = weight(abstract_txt:text in 4432) [ClassicSimilarity], result of:
            0.044751845 = score(doc=4432,freq=2.0), product of:
              0.10016341 = queryWeight, product of:
                1.4303801 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017316528 = queryNorm
              0.44678837 = fieldWeight in 4432, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4432)
          0.035720646 = weight(abstract_txt:from in 4432) [ClassicSimilarity], result of:
            0.035720646 = score(doc=4432,freq=2.0), product of:
              0.116975434 = queryWeight, product of:
                2.4440734 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017316528 = queryNorm
              0.30536878 = fieldWeight in 4432, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4432)
          0.5478939 = weight(abstract_txt:mathematical in 4432) [ClassicSimilarity], result of:
            0.5478939 = score(doc=4432,freq=5.0), product of:
              0.493927 = queryWeight, product of:
                4.492037 = boost
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.017316528 = queryNorm
              1.1092608 = fieldWeight in 4432, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.078125 = fieldNorm(doc=4432)
        0.2 = coord(5/25)
    
  4. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.14
    0.14071836 = sum of:
      0.14071836 = product of:
        0.50256556 = sum of:
          0.047490712 = weight(abstract_txt:entities in 1611) [ClassicSimilarity], result of:
            0.047490712 = score(doc=1611,freq=1.0), product of:
              0.10420957 = queryWeight, product of:
                1.0316579 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.017316528 = queryNorm
              0.45572314 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.08291628 = weight(abstract_txt:corpora in 1611) [ClassicSimilarity], result of:
            0.08291628 = score(doc=1611,freq=1.0), product of:
              0.15109903 = queryWeight, product of:
                1.2422607 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.017316528 = queryNorm
              0.5487546 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.023567082 = weight(abstract_txt:different in 1611) [ClassicSimilarity], result of:
            0.023567082 = score(doc=1611,freq=1.0), product of:
              0.082296684 = queryWeight, product of:
                1.2965465 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.017316528 = queryNorm
              0.28636733 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.044751845 = weight(abstract_txt:text in 1611) [ClassicSimilarity], result of:
            0.044751845 = score(doc=1611,freq=2.0), product of:
              0.10016341 = queryWeight, product of:
                1.4303801 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017316528 = queryNorm
              0.44678837 = fieldWeight in 1611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.10853435 = weight(abstract_txt:corpus in 1611) [ClassicSimilarity], result of:
            0.10853435 = score(doc=1611,freq=1.0), product of:
              0.22780152 = queryWeight, product of:
                2.1571245 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.017316528 = queryNorm
              0.4764426 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.035720646 = weight(abstract_txt:from in 1611) [ClassicSimilarity], result of:
            0.035720646 = score(doc=1611,freq=2.0), product of:
              0.116975434 = queryWeight, product of:
                2.4440734 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017316528 = queryNorm
              0.30536878 = fieldWeight in 1611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.15958463 = weight(abstract_txt:extracting in 1611) [ClassicSimilarity], result of:
            0.15958463 = score(doc=1611,freq=1.0), product of:
              0.29455912 = queryWeight, product of:
                2.4529188 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.017316528 = queryNorm
              0.5417745 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
        0.28 = coord(7/25)
    
  5. Fraser, C.: Mathematics in library and review classification systems : an historical overview (2020) 0.14
    0.13945904 = sum of:
      0.13945904 = product of:
        0.6972952 = sum of:
          0.053208113 = weight(abstract_txt:larger in 5900) [ClassicSimilarity], result of:
            0.053208113 = score(doc=5900,freq=1.0), product of:
              0.112413995 = queryWeight, product of:
                1.0714998 = boost
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.017316528 = queryNorm
              0.47332287 = fieldWeight in 5900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.078125 = fieldNorm(doc=5900)
          0.14261274 = weight(abstract_txt:mathematics in 5900) [ClassicSimilarity], result of:
            0.14261274 = score(doc=5900,freq=4.0), product of:
              0.13664299 = queryWeight, product of:
                1.1813419 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.017316528 = queryNorm
              1.0436887 = fieldWeight in 5900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.078125 = fieldNorm(doc=5900)
          0.033328883 = weight(abstract_txt:different in 5900) [ClassicSimilarity], result of:
            0.033328883 = score(doc=5900,freq=2.0), product of:
              0.082296684 = queryWeight, product of:
                1.2965465 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.017316528 = queryNorm
              0.40498453 = fieldWeight in 5900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=5900)
          0.043748677 = weight(abstract_txt:from in 5900) [ClassicSimilarity], result of:
            0.043748677 = score(doc=5900,freq=3.0), product of:
              0.116975434 = queryWeight, product of:
                2.4440734 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017316528 = queryNorm
              0.37399885 = fieldWeight in 5900, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=5900)
          0.42439675 = weight(abstract_txt:mathematical in 5900) [ClassicSimilarity], result of:
            0.42439675 = score(doc=5900,freq=3.0), product of:
              0.493927 = queryWeight, product of:
                4.492037 = boost
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.017316528 = queryNorm
              0.8592297 = fieldWeight in 5900, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.078125 = fieldNorm(doc=5900)
        0.2 = coord(5/25)
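
The content-similarity scores follow the same pattern, with a query-side weight and a coordination factor added. The sketch below recomputes the score of entry 3 (doc=4432) from the boost, idf, and frequency values printed in its breakdown. It assumes the structure visible in the listing: per-term score = queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, summed and multiplied by coord. All names in the code are illustrative.

```python
import math

QUERY_NORM = 0.017316528  # queryNorm shown in every term breakdown
FIELD_NORM = 0.078125     # fieldNorm(doc=4432)

# (term, boost, idf, freq) copied from the breakdown of entry 3 (doc=4432).
TERMS = [
    ("extracted", 1.0893661, 6.159553, 1.0),
    ("different", 1.2965465, 3.6655018, 1.0),
    ("text", 1.4303801, 4.0438666, 2.0),
    ("from", 2.4440734, 2.7638826, 2.0),
    ("mathematical", 4.492037, 6.3497796, 5.0),
]


def term_score(boost: float, idf: float, freq: float,
               field_norm: float = FIELD_NORM,
               query_norm: float = QUERY_NORM) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    raw = sum(term_score(boost, idf, freq) for _, boost, idf, freq in terms)
    return raw * (matched / total)


score = document_score(TERMS, matched=5, total=25)
print(f"{score:.5f}")
```

Because coord scales the sum by the share of query terms matched, entry 3 matches only 5 of 25 terms yet stays close to entries that match 7, since the single term "mathematical" (boost 4.492037, freq 5) contributes most of its raw sum.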