Document (#39833)

Author
Martinez-Romo, J.
Araujo, L.
Fernandez, A.D.
Title
SemGraph : extracting keyphrases following a novel semantic graph-based approach
Source
Journal of the Association for Information Science and Technology. 67(2016) no.1, pp. 71-82
Year
2016
Abstract
Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent standardized benchmark to evaluate the system's ability to detect the keyphrases that are part of the text. The result is a method that achieves improvements of 5.3% and 7.28% in F-measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
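The abstract says only that words sharing a document are linked when that co-occurrence is statistically significant against a null model; it does not specify the model. A minimal Python sketch, assuming a hypergeometric null model and a hypothetical significance threshold `alpha`, could build such a graph as follows. The function names are hypothetical, this is not the authors' implementation, and the WordNet enrichment and keyphrase-ranking steps are omitted.

```python
from collections import Counter
from itertools import combinations
from math import comb

def cooccurrence_pvalue(n_docs: int, df_a: int, df_b: int, k: int) -> float:
    """P(X >= k) under an assumed hypergeometric null model: the df_b documents
    containing word B are drawn at random from n_docs, and X counts how many
    of them also contain word A (which occurs in df_a documents)."""
    upper = min(df_a, df_b)
    tail = sum(comb(df_a, j) * comb(n_docs - df_a, df_b - j)
               for j in range(k, upper + 1))
    return tail / comb(n_docs, df_b)

def build_significant_graph(docs, alpha=0.05):
    """Link two words if they share more documents than the null model
    predicts (p-value < alpha). Each edge stores its p-value."""
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    # Document frequency of each word and number of documents shared by each pair.
    df = Counter(w for words in doc_sets for w in words)
    pair_counts = Counter(pair for words in doc_sets
                          for pair in combinations(sorted(words), 2))
    graph = {w: {} for w in df}
    for (a, b), k in pair_counts.items():
        p = cooccurrence_pvalue(n_docs, df[a], df[b], k)
        if p < alpha:
            graph[a][b] = p
            graph[b][a] = p
    return graph

# Toy collection: "graph" and "keyphrase" share 3 of 10 documents,
# while "the" occurs in every document and so carries no co-occurrence signal.
docs = [["the", "graph", "keyphrase"]] * 3 + [["the", "text"]] * 7
g = build_significant_graph(docs)
print(g["graph"])
```

In this toy collection only the "graph"/"keyphrase" pair survives the test; a word present in every document co-occurs with everything by necessity, so its p-value is 1 and it stays isolated.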
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23365/abstract.
Theme
Automatic abstracting
Object
SemGraph

Similar documents (author)

  1. Fernandez, C.W.: Semantic relationships between title phrases and LCSH (1991) 1.18
    1.1827863 = sum of:
      1.1827863 = product of:
        3.548359 = sum of:
          3.548359 = weight(author_txt:fernandez in 509) [ClassicSimilarity], result of:
            3.548359 = score(doc=509,freq=1.0), product of:
              0.6043423 = queryWeight, product of:
                1.1182295 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.057529088 = queryNorm
              5.871439 = fieldWeight in 509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.625 = fieldNorm(doc=509)
        0.33333334 = coord(1/3)
    
  2. Fernandez, F.S.; Moreno, A.G.: History of information science in Spain : a selected bibliography (1997) 0.95
    0.9462291 = sum of:
      0.9462291 = product of:
        2.8386872 = sum of:
          2.8386872 = weight(author_txt:fernandez in 52) [ClassicSimilarity], result of:
            2.8386872 = score(doc=52,freq=1.0), product of:
              0.6043423 = queryWeight, product of:
                1.1182295 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.057529088 = queryNorm
              4.697151 = fieldWeight in 52, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=52)
        0.33333334 = coord(1/3)
    
  3. Novaes, M. de Araujo => Araujo Novaes, M. de: 0.90
    0.89738107 = sum of:
      0.89738107 = product of:
        2.6921432 = sum of:
          2.6921432 = weight(author_txt:araujo in 4819) [ClassicSimilarity], result of:
            2.6921432 = score(doc=4819,freq=2.0), product of:
              0.6333932 = queryWeight, product of:
                1.1447909 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.057529088 = queryNorm
              4.2503505 = fieldWeight in 4819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.3125 = fieldNorm(doc=4819)
        0.33333334 = coord(1/3)
    
  4. Araujo, A. de Freitas => Freitas Araujo, A. de: 0.90
    0.89738107 = sum of:
      0.89738107 = product of:
        2.6921432 = sum of:
          2.6921432 = weight(author_txt:araujo in 4885) [ClassicSimilarity], result of:
            2.6921432 = score(doc=4885,freq=2.0), product of:
              0.6333932 = queryWeight, product of:
                1.1447909 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.057529088 = queryNorm
              4.2503505 = fieldWeight in 4885, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.3125 = fieldNorm(doc=4885)
        0.33333334 = coord(1/3)
    
  5. Araujo, P.C. de; Guimaraes, J.A.: Epistemology of knowledge organization : a study of epistemic communities (2016) 0.89
    0.88836205 = sum of:
      0.88836205 = product of:
        2.665086 = sum of:
          2.665086 = weight(author_txt:araujo in 4883) [ClassicSimilarity], result of:
            2.665086 = score(doc=4883,freq=1.0), product of:
              0.6333932 = queryWeight, product of:
                1.1447909 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.057529088 = queryNorm
              4.2076325 = fieldWeight in 4883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.4375 = fieldNorm(doc=4883)
        0.33333334 = coord(1/3)
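The numeric trees under each entry are explain output from Lucene's ClassicSimilarity (TF-IDF) scorer. As a rough check on how the printed factors combine, the Python sketch below recomputes two of the author-match scores from their inputs, using the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The helper names are illustrative only, and small last-digit differences are expected because Lucene scores in single precision while this sketch uses Python floats.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)

def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight, product of"
    return query_weight * field_weight                   # "score(doc=..., freq=...)"

def document_score(term_weights, matched, total_clauses):
    # coord(matched/total) scales the sum of the matching clause weights
    return (matched / total_clauses) * sum(term_weights)

# Entry 1 (doc 509): author_txt:fernandez, freq=1, docFreq=9, fieldNorm=0.625
w509 = term_weight(1.0, 9, 44218, 1.1182295, 0.057529088, 0.625)
print(round(w509, 6), round(document_score([w509], 1, 3), 7))

# Entry 3 (doc 4819): author_txt:araujo, freq=2, docFreq=7, fieldNorm=0.3125
w4819 = term_weight(2.0, 7, 44218, 1.1447909, 0.057529088, 0.3125)
print(round(w4819, 6), round(document_score([w4819], 1, 3), 7))
```

The same arithmetic explains the ranking above: "araujo" is rarer than "fernandez" (higher idf), but the Araujo entries carry smaller fieldNorm values, so their final scores come out lower.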
    

Similar documents (content)

  1. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.27
    0.27332112 = sum of:
      0.27332112 = product of:
        1.3666056 = sum of:
          0.021011282 = weight(abstract_txt:text in 5290) [ClassicSimilarity], result of:
            0.021011282 = score(doc=5290,freq=2.0), product of:
              0.058784213 = queryWeight, product of:
                1.1420503 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012728542 = queryNorm
              0.3574307 = fieldWeight in 5290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.024343539 = weight(abstract_txt:significant in 5290) [ClassicSimilarity], result of:
            0.024343539 = score(doc=5290,freq=1.0), product of:
              0.08170052 = queryWeight, product of:
                1.3463788 = boost
                4.76737 = idf(docFreq=1021, maxDocs=44218)
                0.012728542 = queryNorm
              0.29796064 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.76737 = idf(docFreq=1021, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.059009995 = weight(abstract_txt:algorithm in 5290) [ClassicSimilarity], result of:
            0.059009995 = score(doc=5290,freq=2.0), product of:
              0.11701532 = queryWeight, product of:
                1.6112993 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012728542 = queryNorm
              0.5042929 = fieldWeight in 5290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.030186992 = weight(abstract_txt:semantic in 5290) [ClassicSimilarity], result of:
            0.030186992 = score(doc=5290,freq=1.0), product of:
              0.10794751 = queryWeight, product of:
                1.8954259 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.012728542 = queryNorm
              0.2796451 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          1.2320539 = weight(abstract_txt:keyphrases in 5290) [ClassicSimilarity], result of:
            1.2320539 = score(doc=5290,freq=7.0), product of:
              0.79311496 = queryWeight, product of:
                6.6327395 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012728542 = queryNorm
              1.5534366 = fieldWeight in 5290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
        0.2 = coord(5/25)
    
  2. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.22
    0.21645299 = sum of:
      0.21645299 = product of:
        1.0822649 = sum of:
          0.0148572195 = weight(abstract_txt:text in 1012) [ClassicSimilarity], result of:
            0.0148572195 = score(doc=1012,freq=1.0), product of:
              0.058784213 = queryWeight, product of:
                1.1420503 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012728542 = queryNorm
              0.25274166 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.022049647 = weight(abstract_txt:main in 1012) [ClassicSimilarity], result of:
            0.022049647 = score(doc=1012,freq=1.0), product of:
              0.07648391 = queryWeight, product of:
                1.3026865 = boost
                4.612661 = idf(docFreq=1192, maxDocs=44218)
                0.012728542 = queryNorm
              0.2882913 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.612661 = idf(docFreq=1192, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.04269085 = weight(abstract_txt:semantic in 1012) [ClassicSimilarity], result of:
            0.04269085 = score(doc=1012,freq=2.0), product of:
              0.10794751 = queryWeight, product of:
                1.8954259 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.012728542 = queryNorm
              0.39547786 = fieldWeight in 1012, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.07132191 = weight(abstract_txt:statistically in 1012) [ClassicSimilarity], result of:
            0.07132191 = score(doc=1012,freq=1.0), product of:
              0.16728269 = queryWeight, product of:
                1.9265504 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.012728542 = queryNorm
              0.42635563 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.9313452 = weight(abstract_txt:keyphrases in 1012) [ClassicSimilarity], result of:
            0.9313452 = score(doc=1012,freq=4.0), product of:
              0.79311496 = queryWeight, product of:
                6.6327395 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012728542 = queryNorm
              1.1742878 = fieldWeight in 1012, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
        0.2 = coord(5/25)
    
  3. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.16
    0.15752919 = sum of:
      0.15752919 = product of:
        1.3127433 = sum of:
          0.03896307 = weight(abstract_txt:ability in 601) [ClassicSimilarity], result of:
            0.03896307 = score(doc=601,freq=1.0), product of:
              0.11179038 = queryWeight, product of:
                1.5749148 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.012728542 = queryNorm
              0.34853685 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.041726366 = weight(abstract_txt:algorithm in 601) [ClassicSimilarity], result of:
            0.041726366 = score(doc=601,freq=1.0), product of:
              0.11701532 = queryWeight, product of:
                1.6112993 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012728542 = queryNorm
              0.35658893 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          1.2320539 = weight(abstract_txt:keyphrases in 601) [ClassicSimilarity], result of:
            1.2320539 = score(doc=601,freq=7.0), product of:
              0.79311496 = queryWeight, product of:
                6.6327395 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012728542 = queryNorm
              1.5534366 = fieldWeight in 601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
        0.12 = coord(3/25)
    
  4. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 0.13
    0.12906322 = sum of:
      0.12906322 = product of:
        0.46094006 = sum of:
          0.04969021 = weight(abstract_txt:unsupervised in 2380) [ClassicSimilarity], result of:
            0.04969021 = score(doc=2380,freq=1.0), product of:
              0.1043453 = queryWeight, product of:
                1.0759109 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.012728542 = queryNorm
              0.47620937 = fieldWeight in 2380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.034426965 = weight(abstract_txt:significant in 2380) [ClassicSimilarity], result of:
            0.034426965 = score(doc=2380,freq=2.0), product of:
              0.08170052 = queryWeight, product of:
                1.3463788 = boost
                4.76737 = idf(docFreq=1021, maxDocs=44218)
                0.012728542 = queryNorm
              0.42137998 = fieldWeight in 2380, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.76737 = idf(docFreq=1021, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.060283512 = weight(abstract_txt:presence in 2380) [ClassicSimilarity], result of:
            0.060283512 = score(doc=2380,freq=1.0), product of:
              0.14954367 = queryWeight, product of:
                1.8215407 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.012728542 = queryNorm
              0.40311643 = fieldWeight in 2380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.04269085 = weight(abstract_txt:semantic in 2380) [ClassicSimilarity], result of:
            0.04269085 = score(doc=2380,freq=2.0), product of:
              0.10794751 = queryWeight, product of:
                1.8954259 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.012728542 = queryNorm
              0.39547786 = fieldWeight in 2380, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.07132191 = weight(abstract_txt:statistically in 2380) [ClassicSimilarity], result of:
            0.07132191 = score(doc=2380,freq=1.0), product of:
              0.16728269 = queryWeight, product of:
                1.9265504 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.012728542 = queryNorm
              0.42635563 = fieldWeight in 2380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.07492601 = weight(abstract_txt:extracting in 2380) [ClassicSimilarity], result of:
            0.07492601 = score(doc=2380,freq=1.0), product of:
              0.17287177 = queryWeight, product of:
                1.9584699 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.012728542 = queryNorm
              0.4334196 = fieldWeight in 2380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
          0.1276006 = weight(abstract_txt:graph in 2380) [ClassicSimilarity], result of:
            0.1276006 = score(doc=2380,freq=1.0), product of:
              0.31060907 = queryWeight, product of:
                3.7125895 = boost
                6.572923 = idf(docFreq=167, maxDocs=44218)
                0.012728542 = queryNorm
              0.4108077 = fieldWeight in 2380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.572923 = idf(docFreq=167, maxDocs=44218)
                0.0625 = fieldNorm(doc=2380)
        0.28 = coord(7/25)
    
  5. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 0.13
    0.12848273 = sum of:
      0.12848273 = product of:
        1.0706894 = sum of:
          0.026264103 = weight(abstract_txt:text in 4665) [ClassicSimilarity], result of:
            0.026264103 = score(doc=4665,freq=2.0), product of:
              0.058784213 = queryWeight, product of:
                1.1420503 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012728542 = queryNorm
              0.44678837 = fieldWeight in 4665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
          0.036214672 = weight(abstract_txt:method in 4665) [ClassicSimilarity], result of:
            0.036214672 = score(doc=4665,freq=2.0), product of:
              0.07282414 = queryWeight, product of:
                1.2711376 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012728542 = queryNorm
              0.49728936 = fieldWeight in 4665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
          1.0082107 = weight(abstract_txt:keyphrases in 4665) [ClassicSimilarity], result of:
            1.0082107 = score(doc=4665,freq=3.0), product of:
              0.79311496 = queryWeight, product of:
                6.6327395 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012728542 = queryNorm
              1.2712038 = fieldWeight in 4665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
        0.12 = coord(3/25)
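In the content-based list, each document matches only some of the query's clauses, and Lucene multiplies the summed clause weights by coord(matched/25). The Python sketch below takes the per-term weights exactly as printed in the trees above and recomputes each total, which shows why Urbain et al. (7 matching terms) still rank below Jones and Paynter (3 matching terms): the rare, heavily boosted term "keyphrases" outweighs the coord bonus. The dictionary keys and the function name are illustrative labels, not part of the record.

```python
# Per-clause weights copied from the explain trees above; 25 is the
# denominator printed in every coord(m/25) line.
TOTAL_CLAUSES = 25

clause_weights = {
    "Wu et al. (2006)": [0.021011282, 0.024343539, 0.059009995, 0.030186992, 1.2320539],
    "Jiang et al. (2023)": [0.0148572195, 0.022049647, 0.04269085, 0.07132191, 0.9313452],
    "Jones & Paynter (2002)": [0.03896307, 0.041726366, 1.2320539],
    "Urbain et al. (2008)": [0.04969021, 0.034426965, 0.060283512, 0.04269085,
                             0.07132191, 0.07492601, 0.1276006],
    "Pirkola (2010)": [0.026264103, 0.036214672, 1.0082107],
}

def coord_score(weights, total_clauses=TOTAL_CLAUSES):
    # coord: fraction of query clauses this document matched
    coord = len(weights) / total_clauses
    return coord * sum(weights)

ranking = sorted(clause_weights, key=lambda d: coord_score(clause_weights[d]), reverse=True)
for doc in ranking:
    print(f"{coord_score(clause_weights[doc]):.8f}  {doc}")
```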