Document (#42033)

Author
Grün, S.
Poley, C.
Title
Statistische Analysen von Semantic Entities aus Metadaten- und Volltextbeständen von German Medical Science
Source
GMS Medizin-Bibliothek-Information. 17(2017) no.3, S.1-5
Year
2017
Abstract
This paper analyzes the information content of metadata and full texts of English-language articles in German Medical Science (GMS). The study compares the semantic entities used to enrich GMS metadata (titles and abstracts) with those used to enrich GMS full texts. The aim is to test whether using full texts increases the value-added information. The semantic entities were compared and evaluated statistically, using measures of descriptive statistics. In addition to the ratios of central tendencies and dispersions, we computed the overlaps and complements of the values. The results show a distinct increase in information when full texts are added. On average, metadata contain 25 different entities and full texts 215. 89% of the concepts in the metadata are also represented in the full texts; hence, 11% of the metadata concepts are found in the metadata only. In summary, the results show that adding full texts increases the informational value, e.g. for information retrieval processes.
Theme
Metadata
Field
Medicine

Similar documents (content)

  1. Chen, S.-J.: Semantic enrichment of linked personal authority data : a case study of elites in late imperial China (2019) 0.19
    0.18870129 = sum of:
      0.18870129 = product of:
        0.78625536 = sum of:
          0.01215579 = weight(abstract_txt:results in 5642) [ClassicSimilarity], result of:
            0.01215579 = score(doc=5642,freq=1.0), product of:
              0.055849817 = queryWeight, product of:
                1.0140162 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.015815957 = queryNorm
              0.21765138 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
          0.039154064 = weight(abstract_txt:addition in 5642) [ClassicSimilarity], result of:
            0.039154064 = score(doc=5642,freq=1.0), product of:
              0.12181035 = queryWeight, product of:
                1.4975319 = boost
                5.142954 = idf(docFreq=701, maxDocs=44218)
                0.015815957 = queryNorm
              0.32143462 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.142954 = idf(docFreq=701, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
          0.07734668 = weight(abstract_txt:semantic in 5642) [ClassicSimilarity], result of:
            0.07734668 = score(doc=5642,freq=4.0), product of:
              0.13829437 = queryWeight, product of:
                1.9542578 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.015815957 = queryNorm
              0.5592902 = fieldWeight in 5642, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
          0.27988377 = weight(abstract_txt:entities in 5642) [ClassicSimilarity], result of:
            0.27988377 = score(doc=5642,freq=6.0), product of:
              0.31340867 = queryWeight, product of:
                3.39707 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.015815957 = queryNorm
              0.89303136 = fieldWeight in 5642, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
          0.12017311 = weight(abstract_txt:full in 5642) [ClassicSimilarity], result of:
            0.12017311 = score(doc=5642,freq=1.0), product of:
              0.39059544 = queryWeight, product of:
                5.0168552 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.015815957 = queryNorm
              0.30766645 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
          0.25754195 = weight(abstract_txt:texts in 5642) [ClassicSimilarity], result of:
            0.25754195 = score(doc=5642,freq=2.0), product of:
              0.5153208 = queryWeight, product of:
                5.7624454 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.015815957 = queryNorm
              0.4997702 = fieldWeight in 5642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=5642)
        0.24 = coord(6/25)
    
  2. Kragelj, M.; Borstnar, M.K.: Automatic classification of older electronic texts into the Universal Decimal Classification-UDC (2021) 0.17
    0.16942412 = sum of:
      0.16942412 = product of:
        0.7059338 = sum of:
          0.010636317 = weight(abstract_txt:results in 175) [ClassicSimilarity], result of:
            0.010636317 = score(doc=175,freq=1.0), product of:
              0.055849817 = queryWeight, product of:
                1.0140162 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.015815957 = queryNorm
              0.19044496 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
          0.014494956 = weight(abstract_txt:science in 175) [ClassicSimilarity], result of:
            0.014494956 = score(doc=175,freq=1.0), product of:
              0.06864973 = queryWeight, product of:
                1.1242254 = boost
                3.8609126 = idf(docFreq=2529, maxDocs=44218)
                0.015815957 = queryNorm
              0.21114366 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8609126 = idf(docFreq=2529, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
          0.021052605 = weight(abstract_txt:value in 175) [ClassicSimilarity], result of:
            0.021052605 = score(doc=175,freq=1.0), product of:
              0.088043675 = queryWeight, product of:
                1.2731602 = boost
                4.3723974 = idf(docFreq=1516, maxDocs=44218)
                0.015815957 = queryNorm
              0.23911548 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3723974 = idf(docFreq=1516, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
          0.007147143 = weight(abstract_txt:information in 175) [ClassicSimilarity], result of:
            0.007147143 = score(doc=175,freq=1.0), product of:
              0.053983275 = queryWeight, product of:
                1.4098685 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.015815957 = queryNorm
              0.1323955 = fieldWeight in 175, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
          0.14870664 = weight(abstract_txt:full in 175) [ClassicSimilarity], result of:
            0.14870664 = score(doc=175,freq=2.0), product of:
              0.39059544 = queryWeight, product of:
                5.0168552 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.015815957 = queryNorm
              0.3807178 = fieldWeight in 175, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
          0.5038962 = weight(abstract_txt:texts in 175) [ClassicSimilarity], result of:
            0.5038962 = score(doc=175,freq=10.0), product of:
              0.5153208 = queryWeight, product of:
                5.7624454 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.015815957 = queryNorm
              0.9778302 = fieldWeight in 175, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0546875 = fieldNorm(doc=175)
        0.24 = coord(6/25)
    
  3. Chen, S.-J.: Semantic enrichment of linked archival materials (2019) 0.16
    0.1566393 = sum of:
      0.1566393 = product of:
        0.48949778 = sum of:
          0.01215579 = weight(abstract_txt:results in 5488) [ClassicSimilarity], result of:
            0.01215579 = score(doc=5488,freq=1.0), product of:
              0.055849817 = queryWeight, product of:
                1.0140162 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.015815957 = queryNorm
              0.21765138 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.057832498 = weight(abstract_txt:enrich in 5488) [ClassicSimilarity], result of:
            0.057832498 = score(doc=5488,freq=1.0), product of:
              0.12539232 = queryWeight, product of:
                1.0743715 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.015815957 = queryNorm
              0.46121246 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.02406012 = weight(abstract_txt:value in 5488) [ClassicSimilarity], result of:
            0.02406012 = score(doc=5488,freq=1.0), product of:
              0.088043675 = queryWeight, product of:
                1.2731602 = boost
                4.3723974 = idf(docFreq=1516, maxDocs=44218)
                0.015815957 = queryNorm
              0.27327484 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3723974 = idf(docFreq=1516, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.008168164 = weight(abstract_txt:information in 5488) [ClassicSimilarity], result of:
            0.008168164 = score(doc=5488,freq=1.0), product of:
              0.053983275 = queryWeight, product of:
                1.4098685 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.015815957 = queryNorm
              0.15130915 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.06400903 = weight(abstract_txt:added in 5488) [ClassicSimilarity], result of:
            0.06400903 = score(doc=5488,freq=1.0), product of:
              0.16904168 = queryWeight, product of:
                1.7641312 = boost
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.015815957 = queryNorm
              0.3786583 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.066984184 = weight(abstract_txt:semantic in 5488) [ClassicSimilarity], result of:
            0.066984184 = score(doc=5488,freq=3.0), product of:
              0.13829437 = queryWeight, product of:
                1.9542578 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.015815957 = queryNorm
              0.48435947 = fieldWeight in 5488, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.11426207 = weight(abstract_txt:entities in 5488) [ClassicSimilarity], result of:
            0.11426207 = score(doc=5488,freq=1.0), product of:
              0.31340867 = queryWeight, product of:
                3.39707 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.015815957 = queryNorm
              0.36457852 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.1420259 = weight(abstract_txt:metadata in 5488) [ClassicSimilarity], result of:
            0.1420259 = score(doc=5488,freq=2.0), product of:
              0.3291863 = queryWeight, product of:
                4.2639832 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.015815957 = queryNorm
              0.43144536 = fieldWeight in 5488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
        0.32 = coord(8/25)
    
  4. Mai, F.; Galke, L.; Scherp, A.: Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text (2018) 0.15
    0.14787485 = sum of:
      0.14787485 = product of:
        0.7393742 = sum of:
          0.010636317 = weight(abstract_txt:results in 4093) [ClassicSimilarity], result of:
            0.010636317 = score(doc=4093,freq=1.0), product of:
              0.055849817 = queryWeight, product of:
                1.0140162 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.015815957 = queryNorm
              0.19044496 = fieldWeight in 4093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4093)
          0.049414113 = weight(abstract_txt:medical in 4093) [ClassicSimilarity], result of:
            0.049414113 = score(doc=4093,freq=1.0), product of:
              0.15549923 = queryWeight, product of:
                1.6919913 = boost
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.015815957 = queryNorm
              0.31777722 = fieldWeight in 4093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4093)
          0.08787404 = weight(abstract_txt:metadata in 4093) [ClassicSimilarity], result of:
            0.08787404 = score(doc=4093,freq=1.0), product of:
              0.3291863 = queryWeight, product of:
                4.2639832 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.015815957 = queryNorm
              0.2669432 = fieldWeight in 4093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4093)
          0.31545442 = weight(abstract_txt:full in 4093) [ClassicSimilarity], result of:
            0.31545442 = score(doc=4093,freq=9.0), product of:
              0.39059544 = queryWeight, product of:
                5.0168552 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.015815957 = queryNorm
              0.80762446 = fieldWeight in 4093, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4093)
          0.2759953 = weight(abstract_txt:texts in 4093) [ClassicSimilarity], result of:
            0.2759953 = score(doc=4093,freq=3.0), product of:
              0.5153208 = queryWeight, product of:
                5.7624454 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.015815957 = queryNorm
              0.5355796 = fieldWeight in 4093, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4093)
        0.2 = coord(5/25)
    
  5. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.14
    0.13801326 = sum of:
      0.13801326 = product of:
        0.57505524 = sum of:
          0.07957427 = weight(abstract_txt:computed in 5400) [ClassicSimilarity], result of:
            0.07957427 = score(doc=5400,freq=1.0), product of:
              0.13367948 = queryWeight, product of:
                1.109306 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015815957 = queryNorm
              0.5952617 = fieldWeight in 5400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
          0.030772055 = weight(abstract_txt:show in 5400) [ClassicSimilarity], result of:
            0.030772055 = score(doc=5400,freq=1.0), product of:
              0.08939858 = queryWeight, product of:
                1.2829192 = boost
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.015815957 = queryNorm
              0.3442119 = fieldWeight in 5400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4059124 = idf(docFreq=1466, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
          0.048942577 = weight(abstract_txt:addition in 5400) [ClassicSimilarity], result of:
            0.048942577 = score(doc=5400,freq=1.0), product of:
              0.12181035 = queryWeight, product of:
                1.4975319 = boost
                5.142954 = idf(docFreq=701, maxDocs=44218)
                0.015815957 = queryNorm
              0.40179327 = fieldWeight in 5400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.142954 = idf(docFreq=701, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
          0.10001628 = weight(abstract_txt:increases in 5400) [ClassicSimilarity], result of:
            0.10001628 = score(doc=5400,freq=1.0), product of:
              0.19615832 = queryWeight, product of:
                1.9003664 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.015815957 = queryNorm
              0.5098753 = fieldWeight in 5400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
          0.068365455 = weight(abstract_txt:semantic in 5400) [ClassicSimilarity], result of:
            0.068365455 = score(doc=5400,freq=2.0), product of:
              0.13829437 = queryWeight, product of:
                1.9542578 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.015815957 = queryNorm
              0.49434733 = fieldWeight in 5400, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
          0.24738462 = weight(abstract_txt:entities in 5400) [ClassicSimilarity], result of:
            0.24738462 = score(doc=5400,freq=3.0), product of:
              0.31340867 = queryWeight, product of:
                3.39707 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.015815957 = queryNorm
              0.7893356 = fieldWeight in 5400, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.078125 = fieldNorm(doc=5400)
        0.24 = coord(6/25)
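
The score trees above all follow the same arithmetic, which can be sketched in a few lines. The sketch below is an illustration only, not the search engine's own code: the function names (`idf`, `tf`, `term_score`) are hypothetical, and the formulas are the ones that reproduce the values in the listing, namely tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The boost, queryNorm and fieldNorm values are copied from the first entry (doc 5642) rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the listing:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw term count
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, where
    #   queryWeight = boost * idf * queryNorm
    #   fieldWeight = tf * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Constants copied from the explanation of the first entry (doc 5642)
MAX_DOCS = 44218
QUERY_NORM = 0.015815957
FIELD_NORM_5642 = 0.0625

# (term, freq, docFreq, boost) for each matching query term in doc 5642
TERMS_5642 = [
    ("results", 1.0, 3693, 1.0140162),
    ("addition", 1.0, 701, 1.4975319),
    ("semantic", 4.0, 1369, 1.9542578),
    ("entities", 6.0, 351, 3.39707),
    ("full", 1.0, 874, 5.0168552),
    ("texts", 2.0, 420, 5.7624454),
]

# Sum of the per-term scores, then scaled by coord = matched terms / query terms
raw_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM_5642)
    for _, freq, doc_freq, boost in TERMS_5642
)
coord = 6 / 25
total_5642 = raw_sum * coord

print(f"sum of term scores: {raw_sum:.6f}")
print(f"final score:        {total_5642:.6f}")
```

The listing was produced with single-precision floats, so the values computed here agree with it only up to small rounding differences. The coord factor explains why entry 3 ranks below entry 2 despite matching more terms (8 of 25 instead of 6 of 25): its sum of term scores is considerably lower.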