Document (#42133)

Author
Hubain, R.
Wilde, M. De
Hooland, S. van
Title
Automated SKOS vocabulary design for the biopharmaceutical industry
Source
Cataloging and classification quarterly. 54(2016) no.7, pp.403-417
Year
2016
Abstract
Ensuring quick and consistent access to large collections of unstructured documents is one of the biggest challenges facing knowledge-intensive organizations. Designing specific vocabularies to index and retrieve documents is often deemed too expensive, full-text search being preferred despite its known limitations. However, the process of creating controlled vocabularies can be partly automated thanks to natural language processing and machine learning techniques. With a case study from the biopharmaceutical industry, we demonstrate how small organizations can use an automated workflow in order to create a controlled vocabulary to index unstructured documents in a semantically meaningful way.
Content
Cf.: https://doi.org/10.1080/01639374.2016.1201560.
Theme
Semantic interoperability
Field
Pharmacy
Object
SKOS
Area
Information industry

Similar documents (author)

  1. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Walle, R. Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 3.43
    3.430317 = sum of:
      3.430317 = sum of:
        1.6077683 = weight(author_txt:wilde in 662) [ClassicSimilarity], result of:
          1.6077683 = score(doc=662,freq=1.0), product of:
            0.6769791 = queryWeight, product of:
              9.499662 = idf(docFreq=8, maxDocs=44218)
              0.07126349 = queryNorm
            2.3749156 = fieldWeight in 662, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.499662 = idf(docFreq=8, maxDocs=44218)
              0.25 = fieldNorm(doc=662)
        1.8225487 = weight(author_txt:hooland in 662) [ClassicSimilarity], result of:
          1.8225487 = score(doc=662,freq=1.0), product of:
            0.73600215 = queryWeight, product of:
              1.042682 = boost
              9.905128 = idf(docFreq=5, maxDocs=44218)
              0.07126349 = queryNorm
            2.476282 = fieldWeight in 662, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.905128 = idf(docFreq=5, maxDocs=44218)
              0.25 = fieldNorm(doc=662)
    
  2. Wilde, D.U.: Generation and use of machine-readable data bases (1976) 2.01
    2.0097106 = sum of:
      2.0097106 = product of:
        4.019421 = sum of:
          4.019421 = weight(author_txt:wilde in 267) [ClassicSimilarity], result of:
            4.019421 = score(doc=267,freq=1.0), product of:
              0.6769791 = queryWeight, product of:
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07126349 = queryNorm
              5.937289 = fieldWeight in 267, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.625 = fieldNorm(doc=267)
        0.5 = coord(1/2)
    
  3. Wilde, E.: Semantische Interoperabilität von XML Schemas (2005) 2.01
    2.0097106 = sum of:
      2.0097106 = product of:
        4.019421 = sum of:
          4.019421 = weight(author_txt:wilde in 155) [ClassicSimilarity], result of:
            4.019421 = score(doc=155,freq=1.0), product of:
              0.6769791 = queryWeight, product of:
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07126349 = queryNorm
              5.937289 = fieldWeight in 155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.625 = fieldNorm(doc=155)
        0.5 = coord(1/2)
    
  4. Hooland, S. van; Verborgh, R.: Linked data for libraries, archives and museums : how to clean, link, and publish your metadata (2014) 1.59
    1.5947301 = sum of:
      1.5947301 = product of:
        3.1894603 = sum of:
          3.1894603 = weight(author_txt:hooland in 5153) [ClassicSimilarity], result of:
            3.1894603 = score(doc=5153,freq=1.0), product of:
              0.73600215 = queryWeight, product of:
                1.042682 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.07126349 = queryNorm
              4.333493 = fieldWeight in 5153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.4375 = fieldNorm(doc=5153)
        0.5 = coord(1/2)
    
  5. Hooland, S. van; Bontemps, Y.; Kaufman, S.: Answering the call for more accountability : applying data profiling to museum metadata (2008) 1.37
    1.3669115 = sum of:
      1.3669115 = product of:
        2.733823 = sum of:
          2.733823 = weight(author_txt:hooland in 2644) [ClassicSimilarity], result of:
            2.733823 = score(doc=2644,freq=1.0), product of:
              0.73600215 = queryWeight, product of:
                1.042682 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.07126349 = queryNorm
              3.7144227 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.375 = fieldNorm(doc=2644)
        0.5 = coord(1/2)
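
The score trees above can be recomputed by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the exact idf variant depends on the Lucene version, and the function and variable names are illustrative only. The values for docFreq, maxDocs, queryNorm, fieldNorm and boost are copied from the trees for entries 1 and 2.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # One "weight(...)" node: queryWeight * fieldWeight.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm             # "queryWeight" line
    field_weight = math.sqrt(freq) * term_idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.07126349

# Entry 1 (doc 662): both author terms match, so the two term scores are summed.
wilde_662 = term_score(1.0, 8, MAX_DOCS, QUERY_NORM, 0.25)
hooland_662 = term_score(1.0, 5, MAX_DOCS, QUERY_NORM, 0.25, boost=1.042682)
entry_1 = wilde_662 + hooland_662

# Entry 2 (doc 267): one of two query terms matches, so coord(1/2) halves the sum.
wilde_267 = term_score(1.0, 8, MAX_DOCS, QUERY_NORM, 0.625)
entry_2 = wilde_267 * (1 / 2)

print(entry_1, entry_2)
```

Up to floating-point rounding, the two printed values should agree with the listed totals 3.430317 and 2.0097106. The shorter author field of doc 267 gives it a larger fieldNorm (0.625 against 0.25), which is why its single match scores higher per term than either match in doc 662.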
    

Similar documents (content)

  1. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.12
    0.11547507 = sum of:
      0.11547507 = product of:
        0.3608596 = sum of:
          0.0433955 = weight(abstract_txt:intensive in 1211) [ClassicSimilarity], result of:
            0.0433955 = score(doc=1211,freq=2.0), product of:
              0.1477243 = queryWeight, product of:
                1.0825679 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.02052906 = queryNorm
              0.29376006 = fieldWeight in 1211, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.044206504 = weight(abstract_txt:partly in 1211) [ClassicSimilarity], result of:
            0.044206504 = score(doc=1211,freq=1.0), product of:
              0.1884327 = queryWeight, product of:
                1.2226646 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.02052906 = queryNorm
              0.23460102 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.022380475 = weight(abstract_txt:index in 1211) [ClassicSimilarity], result of:
            0.022380475 = score(doc=1211,freq=1.0), product of:
              0.15080707 = queryWeight, product of:
                1.5468744 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.02052906 = queryNorm
              0.14840469 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.032148145 = weight(abstract_txt:vocabulary in 1211) [ClassicSimilarity], result of:
            0.032148145 = score(doc=1211,freq=1.0), product of:
              0.19199038 = queryWeight, product of:
                1.7453556 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.02052906 = queryNorm
              0.16744666 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.043202996 = weight(abstract_txt:vocabularies in 1211) [ClassicSimilarity], result of:
            0.043202996 = score(doc=1211,freq=1.0), product of:
              0.2338037 = queryWeight, product of:
                1.9260603 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.02052906 = queryNorm
              0.18478319 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.043883502 = weight(abstract_txt:documents in 1211) [ClassicSimilarity], result of:
            0.043883502 = score(doc=1211,freq=4.0), product of:
              0.17036751 = queryWeight, product of:
                2.0136464 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.02052906 = queryNorm
              0.2575814 = fieldWeight in 1211, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.07649783 = weight(abstract_txt:unstructured in 1211) [ClassicSimilarity], result of:
            0.07649783 = score(doc=1211,freq=1.0), product of:
              0.34219596 = queryWeight, product of:
                2.330138 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.02052906 = queryNorm
              0.22354977 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
          0.055144656 = weight(abstract_txt:automated in 1211) [ClassicSimilarity], result of:
            0.055144656 = score(doc=1211,freq=1.0), product of:
              0.31492576 = queryWeight, product of:
                2.7377508 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.02052906 = queryNorm
              0.17510366 = fieldWeight in 1211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.03125 = fieldNorm(doc=1211)
        0.32 = coord(8/25)
    
  2. Angjeli, A.; Isaac, A.: Semantic web and vocabularies interoperability : an experiment with illuminations collections (2008) 0.11
    0.10861552 = sum of:
      0.10861552 = product of:
        0.45256466 = sum of:
          0.059457533 = weight(abstract_txt:semantically in 2324) [ClassicSimilarity], result of:
            0.059457533 = score(doc=2324,freq=1.0), product of:
              0.15810467 = queryWeight, product of:
                1.1199576 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.02052906 = queryNorm
              0.37606436 = fieldWeight in 2324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
          0.0694636 = weight(abstract_txt:skos in 2324) [ClassicSimilarity], result of:
            0.0694636 = score(doc=2324,freq=1.0), product of:
              0.17537929 = queryWeight, product of:
                1.1795554 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.02052906 = queryNorm
              0.3960764 = fieldWeight in 2324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
          0.039165832 = weight(abstract_txt:index in 2324) [ClassicSimilarity], result of:
            0.039165832 = score(doc=2324,freq=1.0), product of:
              0.15080707 = queryWeight, product of:
                1.5468744 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.02052906 = queryNorm
              0.2597082 = fieldWeight in 2324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
          0.05625926 = weight(abstract_txt:vocabulary in 2324) [ClassicSimilarity], result of:
            0.05625926 = score(doc=2324,freq=1.0), product of:
              0.19199038 = queryWeight, product of:
                1.7453556 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.02052906 = queryNorm
              0.29303166 = fieldWeight in 2324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
          0.05915995 = weight(abstract_txt:controlled in 2324) [ClassicSimilarity], result of:
            0.05915995 = score(doc=2324,freq=1.0), product of:
              0.19853419 = queryWeight, product of:
                1.7748508 = boost
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.02052906 = queryNorm
              0.29798368 = fieldWeight in 2324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
          0.16905846 = weight(abstract_txt:vocabularies in 2324) [ClassicSimilarity], result of:
            0.16905846 = score(doc=2324,freq=5.0), product of:
              0.2338037 = queryWeight, product of:
                1.9260603 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.02052906 = queryNorm
              0.7230786 = fieldWeight in 2324, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2324)
        0.24 = coord(6/25)
    
  3. Harpring, P.: Introduction to controlled vocabularies : terminology for art, architecture, and other cultural works (2010) 0.10
    0.10253049 = sum of:
      0.10253049 = product of:
        0.64081556 = sum of:
          0.08037037 = weight(abstract_txt:vocabulary in 4164) [ClassicSimilarity], result of:
            0.08037037 = score(doc=4164,freq=1.0), product of:
              0.19199038 = queryWeight, product of:
                1.7453556 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.02052906 = queryNorm
              0.41861665 = fieldWeight in 4164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.078125 = fieldNorm(doc=4164)
          0.18897952 = weight(abstract_txt:controlled in 4164) [ClassicSimilarity], result of:
            0.18897952 = score(doc=4164,freq=5.0), product of:
              0.19853419 = queryWeight, product of:
                1.7748508 = boost
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.02052906 = queryNorm
              0.95187396 = fieldWeight in 4164, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.078125 = fieldNorm(doc=4164)
          0.08570476 = weight(abstract_txt:organizations in 4164) [ClassicSimilarity], result of:
            0.08570476 = score(doc=4164,freq=1.0), product of:
              0.20039433 = queryWeight, product of:
                1.783146 = boost
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.02052906 = queryNorm
              0.42768055 = fieldWeight in 4164, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.078125 = fieldNorm(doc=4164)
          0.28576094 = weight(abstract_txt:vocabularies in 4164) [ClassicSimilarity], result of:
            0.28576094 = score(doc=4164,freq=7.0), product of:
              0.2338037 = queryWeight, product of:
                1.9260603 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.02052906 = queryNorm
              1.2222259 = fieldWeight in 4164, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.078125 = fieldNorm(doc=4164)
        0.16 = coord(4/25)
    
  4. Vatant, B.; Dunsire, G.: Use case vocabulary merging (2010) 0.10
    0.10028128 = sum of:
      0.10028128 = product of:
        0.4178387 = sum of:
          0.059540227 = weight(abstract_txt:skos in 4336) [ClassicSimilarity], result of:
            0.059540227 = score(doc=4336,freq=1.0), product of:
              0.17537929 = queryWeight, product of:
                1.1795554 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.02052906 = queryNorm
              0.33949405 = fieldWeight in 4336, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
          0.062308308 = weight(abstract_txt:expensive in 4336) [ClassicSimilarity], result of:
            0.062308308 = score(doc=4336,freq=1.0), product of:
              0.1807737 = queryWeight, product of:
                1.1975588 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.02052906 = queryNorm
              0.34467572 = fieldWeight in 4336, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
          0.047476154 = weight(abstract_txt:index in 4336) [ClassicSimilarity], result of:
            0.047476154 = score(doc=4336,freq=2.0), product of:
              0.15080707 = queryWeight, product of:
                1.5468744 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.02052906 = queryNorm
              0.31481385 = fieldWeight in 4336, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
          0.06819652 = weight(abstract_txt:vocabulary in 4336) [ClassicSimilarity], result of:
            0.06819652 = score(doc=4336,freq=2.0), product of:
              0.19199038 = queryWeight, product of:
                1.7453556 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.02052906 = queryNorm
              0.355208 = fieldWeight in 4336, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
          0.050708525 = weight(abstract_txt:controlled in 4336) [ClassicSimilarity], result of:
            0.050708525 = score(doc=4336,freq=1.0), product of:
              0.19853419 = queryWeight, product of:
                1.7748508 = boost
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.02052906 = queryNorm
              0.25541458 = fieldWeight in 4336, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
          0.12960897 = weight(abstract_txt:vocabularies in 4336) [ClassicSimilarity], result of:
            0.12960897 = score(doc=4336,freq=4.0), product of:
              0.2338037 = queryWeight, product of:
                1.9260603 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.02052906 = queryNorm
              0.55434954 = fieldWeight in 4336, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.046875 = fieldNorm(doc=4336)
        0.24 = coord(6/25)
    
  5. Wang, J.: Automatic thesaurus development : term extraction from title metadata (2006) 0.10
    0.09683736 = sum of:
      0.09683736 = product of:
        0.48418677 = sum of:
          0.059020683 = weight(abstract_txt:meaningful in 5063) [ClassicSimilarity], result of:
            0.059020683 = score(doc=5063,freq=1.0), product of:
              0.143929 = queryWeight, product of:
                1.068571 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.02052906 = queryNorm
              0.41006804 = fieldWeight in 5063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.0625 = fieldNorm(doc=5063)
          0.111364454 = weight(abstract_txt:vocabulary in 5063) [ClassicSimilarity], result of:
            0.111364454 = score(doc=5063,freq=3.0), product of:
              0.19199038 = queryWeight, product of:
                1.7453556 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.02052906 = queryNorm
              0.58005226 = fieldWeight in 5063, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.0625 = fieldNorm(doc=5063)
          0.117106326 = weight(abstract_txt:controlled in 5063) [ClassicSimilarity], result of:
            0.117106326 = score(doc=5063,freq=3.0), product of:
              0.19853419 = queryWeight, product of:
                1.7748508 = boost
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.02052906 = queryNorm
              0.5898547 = fieldWeight in 5063, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4488444 = idf(docFreq=516, maxDocs=44218)
                0.0625 = fieldNorm(doc=5063)
          0.08640599 = weight(abstract_txt:vocabularies in 5063) [ClassicSimilarity], result of:
            0.08640599 = score(doc=5063,freq=1.0), product of:
              0.2338037 = queryWeight, product of:
                1.9260603 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.02052906 = queryNorm
              0.36956638 = fieldWeight in 5063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.0625 = fieldNorm(doc=5063)
          0.11028931 = weight(abstract_txt:automated in 5063) [ClassicSimilarity], result of:
            0.11028931 = score(doc=5063,freq=1.0), product of:
              0.31492576 = queryWeight, product of:
                2.7377508 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.02052906 = queryNorm
              0.35020733 = fieldWeight in 5063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=5063)
        0.2 = coord(5/25)
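
For the content-based list, each final score is the sum of the matching term scores scaled by the coordination factor coord(matched/25), where 25 is the number of query terms taken from the abstract. A small sketch of that last step, using the per-term scores listed for entries 3 and 5 as input data (the helper name `combine` is illustrative, not part of Lucene):

```python
def combine(term_scores, total_query_terms):
    # Sum the per-term scores, then scale by the fraction of query terms matched.
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


TOTAL_QUERY_TERMS = 25

# Entry 3 (doc 4164): vocabulary, controlled, organizations, vocabularies.
harpring = combine(
    [0.08037037, 0.18897952, 0.08570476, 0.28576094], TOTAL_QUERY_TERMS
)

# Entry 5 (doc 5063): meaningful, vocabulary, controlled, vocabularies, automated.
wang = combine(
    [0.059020683, 0.111364454, 0.117106326, 0.08640599, 0.11028931],
    TOTAL_QUERY_TERMS,
)

print(harpring, wang)
```

Up to rounding, the results should agree with the listed 0.10253049 and 0.09683736. The coordination factor explains the ordering at the top of the list: entry 1 has a smaller raw sum (0.3608596) than entry 3 (0.64081556) but matches 8 of the 25 terms instead of 4, so it ends up ranked first.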