Document (#38058)

Author
Sojka, P.
Lee, M.
Rehurek, R.
Hatlapatka, R.
Kucbel, M.
Bouche, T.
Goutorbe, C.
Anghelache, R.
Wojciechowski, K.
Title
Toolset for entity and semantic associations : Final Release
Issue
Revision: 1.0 as of 8th February 2013.
Source
https://wiki.eudml.eu/eudml-w/images/D8.4-v1.0.pdf
Year
2013
Abstract
In this document we describe the final release of the toolset for entity and semantic associations, which integrates two versions (language-dependent and language-independent) of Unsupervised Document Similarity, implemented by MU using the gensim tool, with Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind the decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, where they deliver the functionality needed for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
Content
See also: https://is.muni.cz/repo/1076213/en/Lee-Sojka-Rehurek-Bolikowski/Toolset-for-Entity-and-Semantic-Associations-Initial-Release-Deliverable-82-of-project-EuDML?lang=en.
Theme
Automatic classification
Field
Mathematics
Object
GENSIM
Latent Semantic Indexing
Zentralblatt für Mathematik

Similar documents (author)

  1. Sojka, P.: Exploiting semantic annotations in math information retrieval (2012) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:sojka in 32) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 32, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=32)
    
  2. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 4.95
    4.952564 = sum of:
      4.952564 = weight(author_txt:sojka in 1058) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 1058, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=1058)
    
  3. Líska, M.; Sojka, P.: MIaS 1.5 (2014) 4.95
    4.952564 = sum of:
      4.952564 = weight(author_txt:sojka in 1652) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 1652, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=1652)
    
  4. Sojka, P.; Liska, M.: The art of mathematics retrieval (2011) 4.95
    4.952564 = sum of:
      4.952564 = weight(author_txt:sojka in 3450) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 3450, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=3450)
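
The author scores listed above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). These formulas are not stated in the record; they are assumed here because they reproduce the listed values. The function names are illustrative only.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Assumed tf formula: square root of the raw term frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the record: author term "sojka", docFreq=5, maxDocs=44218.
score_1 = field_weight(freq=1.0, doc_freq=5, max_docs=44218, field_norm=0.625)
score_2 = field_weight(freq=1.0, doc_freq=5, max_docs=44218, field_norm=0.5)

# Compare with 6.190705 (entry 1) and 4.952564 (entries 2 to 4) in the record.
print(f"{score_1:.6f} {score_2:.6f}")
```

Since all four entries share the same tf and idf, only fieldNorm separates them: entry 1 has fieldNorm 0.625 and the other three have 0.5, which is why it ranks first.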
    

Similar documents (content)

  1. Kim, J.-M.; Shin, H.; Kim, H.-J.: Schema and constraints-based matching and merging of Topic Maps (2007) 0.17
    0.17296277 = sum of:
      0.17296277 = product of:
        0.6177242 = sum of:
          0.07918803 = weight(abstract_txt:dependent in 922) [ClassicSimilarity], result of:
            0.07918803 = score(doc=922,freq=2.0), product of:
              0.13990766 = queryWeight, product of:
                1.0084732 = boost
                6.4035826 = idf(docFreq=198, maxDocs=44218)
                0.021664772 = queryNorm
              0.5660021 = fieldWeight in 922, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4035826 = idf(docFreq=198, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.017713629 = weight(abstract_txt:using in 922) [ClassicSimilarity], result of:
            0.017713629 = score(doc=922,freq=1.0), product of:
              0.08183897 = queryWeight, product of:
                1.0907838 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.021664772 = queryNorm
              0.21644491 = fieldWeight in 922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.07913018 = weight(abstract_txt:resolution in 922) [ClassicSimilarity], result of:
            0.07913018 = score(doc=922,freq=1.0), product of:
              0.17618676 = queryWeight, product of:
                1.1316972 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.021664772 = queryNorm
              0.44912672 = fieldWeight in 922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.031194985 = weight(abstract_txt:language in 922) [ClassicSimilarity], result of:
            0.031194985 = score(doc=922,freq=1.0), product of:
              0.11934704 = queryWeight, product of:
                1.3172386 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.021664772 = queryNorm
              0.26138046 = fieldWeight in 922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.03820208 = weight(abstract_txt:semantic in 922) [ClassicSimilarity], result of:
            0.03820208 = score(doc=922,freq=1.0), product of:
              0.13660917 = queryWeight, product of:
                1.409284 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.021664772 = queryNorm
              0.2796451 = fieldWeight in 922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.2668499 = weight(abstract_txt:matching in 922) [ClassicSimilarity], result of:
            0.2668499 = score(doc=922,freq=8.0), product of:
              0.24959537 = queryWeight, product of:
                1.9049206 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.021664772 = queryNorm
              1.0691301 = fieldWeight in 922, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
          0.105445355 = weight(abstract_txt:entity in 922) [ClassicSimilarity], result of:
            0.105445355 = score(doc=922,freq=1.0), product of:
              0.26880673 = queryWeight, product of:
                1.9768726 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.021664772 = queryNorm
              0.39227203 = fieldWeight in 922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=922)
        0.28 = coord(7/25)
    
  2. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.11
    0.11026329 = sum of:
      0.11026329 = product of:
        0.55131644 = sum of:
          0.03835113 = weight(abstract_txt:using in 949) [ClassicSimilarity], result of:
            0.03835113 = score(doc=949,freq=3.0), product of:
              0.08183897 = queryWeight, product of:
                1.0907838 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.021664772 = queryNorm
              0.46861696 = fieldWeight in 949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          0.17132185 = weight(abstract_txt:resolution in 949) [ClassicSimilarity], result of:
            0.17132185 = score(doc=949,freq=3.0), product of:
              0.17618676 = queryWeight, product of:
                1.1316972 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.021664772 = queryNorm
              0.97238785 = fieldWeight in 949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          0.059634045 = weight(abstract_txt:document in 949) [ClassicSimilarity], result of:
            0.059634045 = score(doc=949,freq=2.0), product of:
              0.12573841 = queryWeight, product of:
                1.3520495 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.021664772 = queryNorm
              0.4742707 = fieldWeight in 949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          0.047752604 = weight(abstract_txt:semantic in 949) [ClassicSimilarity], result of:
            0.047752604 = score(doc=949,freq=1.0), product of:
              0.13660917 = queryWeight, product of:
                1.409284 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.021664772 = queryNorm
              0.34955636 = fieldWeight in 949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          0.23425679 = weight(abstract_txt:release in 949) [ClassicSimilarity], result of:
            0.23425679 = score(doc=949,freq=1.0), product of:
              0.39440578 = queryWeight, product of:
                2.394585 = boost
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.021664772 = queryNorm
              0.59394866 = fieldWeight in 949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
        0.2 = coord(5/25)
    
  3. Shakir, H.S.; Nagao, M.: Context-sensitive processing of semantic queries in an image database system (1996) 0.11
    0.10615903 = sum of:
      0.10615903 = product of:
        0.53079516 = sum of:
          0.022142038 = weight(abstract_txt:using in 6626) [ClassicSimilarity], result of:
            0.022142038 = score(doc=6626,freq=1.0), product of:
              0.08183897 = queryWeight, product of:
                1.0907838 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.021664772 = queryNorm
              0.27055615 = fieldWeight in 6626, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=6626)
          0.067532375 = weight(abstract_txt:semantic in 6626) [ClassicSimilarity], result of:
            0.067532375 = score(doc=6626,freq=2.0), product of:
              0.13660917 = queryWeight, product of:
                1.409284 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.021664772 = queryNorm
              0.49434733 = fieldWeight in 6626, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=6626)
          0.105049625 = weight(abstract_txt:similarity in 6626) [ClassicSimilarity], result of:
            0.105049625 = score(doc=6626,freq=1.0), product of:
              0.23107067 = queryWeight, product of:
                1.8328673 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.021664772 = queryNorm
              0.4546212 = fieldWeight in 6626, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=6626)
          0.20426442 = weight(abstract_txt:matching in 6626) [ClassicSimilarity], result of:
            0.20426442 = score(doc=6626,freq=3.0), product of:
              0.24959537 = queryWeight, product of:
                1.9049206 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.021664772 = queryNorm
              0.8183822 = fieldWeight in 6626, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.078125 = fieldNorm(doc=6626)
          0.1318067 = weight(abstract_txt:entity in 6626) [ClassicSimilarity], result of:
            0.1318067 = score(doc=6626,freq=1.0), product of:
              0.26880673 = queryWeight, product of:
                1.9768726 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.021664772 = queryNorm
              0.49034002 = fieldWeight in 6626, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.078125 = fieldNorm(doc=6626)
        0.2 = coord(5/25)
    
  4. Vani, K.; Gupta, D.: Integrating syntax-semantic-based text analysis with structural and citation information for scientific plagiarism detection (2018) 0.10
    0.100523844 = sum of:
      0.100523844 = product of:
        0.41884935 = sum of:
          0.017713629 = weight(abstract_txt:using in 4543) [ClassicSimilarity], result of:
            0.017713629 = score(doc=4543,freq=1.0), product of:
              0.08183897 = queryWeight, product of:
                1.0907838 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.021664772 = queryNorm
              0.21644491 = fieldWeight in 4543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
          0.03373411 = weight(abstract_txt:document in 4543) [ClassicSimilarity], result of:
            0.03373411 = score(doc=4543,freq=1.0), product of:
              0.12573841 = queryWeight, product of:
                1.3520495 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.021664772 = queryNorm
              0.26828802 = fieldWeight in 4543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
          0.06616794 = weight(abstract_txt:semantic in 4543) [ClassicSimilarity], result of:
            0.06616794 = score(doc=4543,freq=3.0), product of:
              0.13660917 = queryWeight, product of:
                1.409284 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.021664772 = queryNorm
              0.48435947 = fieldWeight in 4543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
          0.11197098 = weight(abstract_txt:citation in 4543) [ClassicSimilarity], result of:
            0.11197098 = score(doc=4543,freq=5.0), product of:
              0.16361965 = queryWeight, product of:
                1.5423266 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.021664772 = queryNorm
              0.684337 = fieldWeight in 4543, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
          0.0840397 = weight(abstract_txt:similarity in 4543) [ClassicSimilarity], result of:
            0.0840397 = score(doc=4543,freq=1.0), product of:
              0.23107067 = queryWeight, product of:
                1.8328673 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.021664772 = queryNorm
              0.36369696 = fieldWeight in 4543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
          0.105222985 = weight(abstract_txt:final in 4543) [ClassicSimilarity], result of:
            0.105222985 = score(doc=4543,freq=1.0), product of:
              0.26842865 = queryWeight, product of:
                1.9754819 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.021664772 = queryNorm
              0.3919961 = fieldWeight in 4543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.0625 = fieldNorm(doc=4543)
        0.24 = coord(6/25)
    
  5. Gipp, B.; Meuschke, N.; Breitinger, C.: Citation-based plagiarism detection : practicability on a large-scale scientific corpus (2014) 0.10
    0.096731335 = sum of:
      0.096731335 = product of:
        0.40304723 = sum of:
          0.017713629 = weight(abstract_txt:using in 3332) [ClassicSimilarity], result of:
            0.017713629 = score(doc=3332,freq=1.0), product of:
              0.08183897 = queryWeight, product of:
                1.0907838 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.021664772 = queryNorm
              0.21644491 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.031194985 = weight(abstract_txt:language in 3332) [ClassicSimilarity], result of:
            0.031194985 = score(doc=3332,freq=1.0), product of:
              0.11934704 = queryWeight, product of:
                1.3172386 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.021664772 = queryNorm
              0.26138046 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.047707234 = weight(abstract_txt:document in 3332) [ClassicSimilarity], result of:
            0.047707234 = score(doc=3332,freq=2.0), product of:
              0.12573841 = queryWeight, product of:
                1.3520495 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.021664772 = queryNorm
              0.37941656 = fieldWeight in 3332, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.03820208 = weight(abstract_txt:semantic in 3332) [ClassicSimilarity], result of:
            0.03820208 = score(doc=3332,freq=1.0), product of:
              0.13660917 = queryWeight, product of:
                1.409284 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.021664772 = queryNorm
              0.2796451 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.10014989 = weight(abstract_txt:citation in 3332) [ClassicSimilarity], result of:
            0.10014989 = score(doc=3332,freq=4.0), product of:
              0.16361965 = queryWeight, product of:
                1.5423266 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.021664772 = queryNorm
              0.61208963 = fieldWeight in 3332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.1680794 = weight(abstract_txt:similarity in 3332) [ClassicSimilarity], result of:
            0.1680794 = score(doc=3332,freq=4.0), product of:
              0.23107067 = queryWeight, product of:
                1.8328673 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.021664772 = queryNorm
              0.7273939 = fieldWeight in 3332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
        0.24 = coord(6/25)
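
The content scores combine per-term weights with a coordination factor. The sketch below recomputes entry 2 (doc=949) under the same assumed classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). It takes queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm as shown in the trees, and treats coord as matched terms divided by the 25 query terms. The boosts, frequencies, and norms are copied from the record; the helper names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs from the record
QUERY_NORM = 0.021664772  # queryNorm from the record
QUERY_TERMS = 25          # denominator of coord(5/25)
FIELD_NORM = 0.078125     # fieldNorm(doc=949)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for the five matched terms of entry 2.
matched_terms = [
    ("using", 3.0, 3765, 1.0907838),
    ("resolution", 3.0, 90, 1.1316972),
    ("document", 2.0, 1642, 1.3520495),
    ("semantic", 1.0, 1369, 1.409284),
    ("release", 1.0, 59, 2.394585),
]

raw_sum = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in matched_terms)
coord = len(matched_terms) / QUERY_TERMS
content_score = coord * raw_sum

# Compare with 0.55131644 (sum) and 0.11026329 (final score) in the record.
print(f"sum={raw_sum:.6f} coord={coord:.2f} score={content_score:.6f}")
```

The coord factor explains why entry 1, which matches 7 of 25 query terms, outranks entries whose raw sums are comparable but which match only 5 or 6 terms.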