Document (#38041)

Author
Pabón, G.
Gutiérrez, C.
Fernández, J.D.
Martínez-Prieto, M.A.
Title
Linked Open Data technologies for publication of census microdata
Source
Journal of the American Society for Information Science and Technology. 64(2013) no. 9, pp. 1802-1814
Year
2013
Abstract
Censuses are one of the most relevant types of statistical data, allowing analyses of the population in terms of demography, economy, sociology, and culture. For fine-grained analysis, census agencies publish census microdata: a sample of individual census records containing detailed, anonymized information about individuals. Working with microdata from different censuses and carrying out comparative studies is currently difficult because of the diversity of formats and granularities. In this article, we show that novel data processing techniques can be applied to make census microdata interoperable and easy to access and combine. In fact, we demonstrate how Linked Open Data principles, a set of techniques for publishing and interlinking (semi-)structured data on the web, can be fruitfully applied to census microdata. We present a step-by-step process to achieve this goal and examine, in theory and practice, two real case studies: the 2001 Spanish census and a general framework for Integrated Public Use Microdata Series (IPUMS-I).
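Under the Linked Data principles, things are named with HTTP URIs, described with RDF statements, and linked to URIs in other datasets. The following is a minimal sketch of what that can look like for a single anonymized microdata record, serialized as N-Triples. It is not the authors' pipeline: the example.org namespaces, the property names (`age`, `sex`, `region`), the helper `record_to_ntriples` and the place-link table are all hypothetical and chosen only for illustration.

```python
"""Minimal sketch: publish one census microdata record as Linked Data (N-Triples).

Assumption: every namespace, property name and helper below is hypothetical and
chosen for illustration only; none of them is taken from the article.
"""

BASE = "http://example.org/census/2001/"    # hypothetical dataset namespace
VOCAB = "http://example.org/census/vocab#"  # hypothetical property namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value) -> str:
    """Render a Python value as an N-Triples literal (booleans and ints are typed)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^{iri(XSD + "boolean")}'
    if isinstance(value, int):
        return f'"{value}"^^{iri(XSD + "integer")}'
    # Escape characters that may not appear raw inside an N-Triples string literal.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def record_to_ntriples(record: dict, place_links: dict) -> list[str]:
    """Turn one anonymized record into N-Triples statements.

    The record's 'id' becomes part of the subject IRI; every other field becomes
    one triple. A 'region' value found in place_links is emitted as a link to an
    external IRI instead of a plain string, which is what makes the data "linked".
    """
    subject = iri(f"{BASE}person/{record['id']}")
    triples = [f"{subject} {iri(RDF_TYPE)} {iri(VOCAB + 'Person')} ."]
    for field, value in record.items():
        if field == "id":
            continue
        if field == "region" and value in place_links:
            obj = iri(place_links[value])  # link out instead of a string literal
        else:
            obj = literal(value)
        triples.append(f"{subject} {iri(VOCAB + field)} {obj} .")
    return triples


if __name__ == "__main__":
    # Hypothetical record and link table, for illustration only.
    record = {"id": "000123", "age": 34, "sex": "F", "region": "Madrid"}
    links = {"Madrid": "http://example.org/places/madrid"}
    for line in record_to_ntriples(record, links):
        print(line)
```

A real publication process would also need a shared vocabulary so that variables coded differently in different censuses map to the same properties; that harmonization step is not shown here, and field names are used as IRI local names without validation.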

Similar documents (author)

  1. Prieto-Díaz, R.: Applying faceted classification to domain analysis (1992) 1.20
  2. Prieto-Díaz, R.: Implementing faceted classification for software reuse (1991) 1.20
  3. Prieto-Díaz, R.: A faceted approach to building ontologies (2002) 1.20
  4. Moreno Fernández, L.M. -> Fernández, L.M.M.: 0.90
  5. Hernández, S. Fernández- -> Fernández-Hernández, S.: 0.77

Similar documents (content)

  1. Phenix, K.: Software for libraries : reviews of products for librarians and patrons (1993) 0.16
  2. Lamb, I.; Larson, C.: Shining a light on scientific data : building a data catalog to foster data sharing and reuse (2016) 0.15
  3. Mixter, J.; Childress, E.R.: FAST (Faceted Application of Subject Terminology) users : summary and case studies (2013) 0.09
  4. Leoncini, C.; Servello, R.M.: The activities for authority control in EDIT16: authors, publishers/printers, devices, and places (2004) 0.08
  5. Hernon, P.; Dugan, R.E.: GIS and privacy (1997) 0.08