Document (#42750)

Author
Morris, V.
Title
Automated language identification of bibliographic resources
Source
Cataloging and classification quarterly. 58(2020) no.1, p.1-27
Year
2020
Abstract
This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
Content
Cf.: https://doi.org/10.1080/01639374.2019.1700201.
Theme
Descriptive cataloging
Computational linguistics
Location
GB

Similar documents (author)

  1. Morris, L.R.: ¬The frequency of use of Library of Congress Classification numbers and Dewey Decimal Classification numbers in the MARC file in the field of library science (1991) 4.96
    4.9598045 = sum of:
      4.9598045 = weight(author_txt:morris in 2308) [ClassicSimilarity], result of:
        4.9598045 = fieldWeight in 2308, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.935687 = idf(docFreq=42, maxDocs=44218)
          0.625 = fieldNorm(doc=2308)
    
  2. Morris, S.: Metadata and rights (2000) 4.96
    4.9598045 = sum of:
      4.9598045 = weight(author_txt:morris in 2628) [ClassicSimilarity], result of:
        4.9598045 = fieldWeight in 2628, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.935687 = idf(docFreq=42, maxDocs=44218)
          0.625 = fieldNorm(doc=2628)
    
  3. Morris, K.: Software reviews: RediReference Plus (1990) 4.96
    4.9598045 = sum of:
      4.9598045 = weight(author_txt:morris in 3226) [ClassicSimilarity], result of:
        4.9598045 = fieldWeight in 3226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.935687 = idf(docFreq=42, maxDocs=44218)
          0.625 = fieldNorm(doc=3226)
    
  4. Morris, L.R.: Choosing a bibliographic utility (1989) 4.96
    4.9598045 = sum of:
      4.9598045 = weight(author_txt:morris in 3727) [ClassicSimilarity], result of:
        4.9598045 = fieldWeight in 3727, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.935687 = idf(docFreq=42, maxDocs=44218)
          0.625 = fieldNorm(doc=3727)
    
  5. Morris, S.A.: Mapping research specialties (2008) 4.96
    4.9598045 = sum of:
      4.9598045 = weight(author_txt:morris in 3962) [ClassicSimilarity], result of:
        4.9598045 = fieldWeight in 3962, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.935687 = idf(docFreq=42, maxDocs=44218)
          0.625 = fieldNorm(doc=3962)
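
The author scores above can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not part of any Lucene API. The fieldNorm value is taken as given from the listing, since it is stored in the index in a lossy encoding.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author_txt:morris, freq=1.0,
# docFreq=42, maxDocs=44218, fieldNorm=0.625.
idf = classic_idf(42, 44218)
score = field_weight(1.0, 42, 44218, 0.625)
print(f"idf={idf:.6f} score={score:.6f}")
```

All five author matches share the same term, term frequency and fieldNorm, which is why they receive identical scores. Small differences in the last digits are expected because Lucene computes in single precision.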
    

Similar documents (content)

  1. Bardenheier, P.; Wilkinson, E.H.; Dale, H.: Ki te Tika te Hanga, Ka Pakari te Kete : with the right structure we weave a strong basket (2015) 0.17
    0.16599183 = sum of:
      0.16599183 = product of:
        0.6916326 = sum of:
          0.033414584 = weight(abstract_txt:project in 2176) [ClassicSimilarity], result of:
            0.033414584 = score(doc=2176,freq=1.0), product of:
              0.08140564 = queryWeight, product of:
                1.0315381 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.018024322 = queryNorm
              0.41047013 = fieldWeight in 2176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
          0.048928317 = weight(abstract_txt:described in 2176) [ClassicSimilarity], result of:
            0.048928317 = score(doc=2176,freq=1.0), product of:
              0.10497131 = queryWeight, product of:
                1.171368 = boost
                4.9718537 = idf(docFreq=832, maxDocs=44218)
                0.018024322 = queryNorm
              0.4661113 = fieldWeight in 2176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9718537 = idf(docFreq=832, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
          0.14445668 = weight(abstract_txt:enhancement in 2176) [ClassicSimilarity], result of:
            0.14445668 = score(doc=2176,freq=1.0), product of:
              0.21603405 = queryWeight, product of:
                1.680425 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.018024322 = queryNorm
              0.66867554 = fieldWeight in 2176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
          0.15582006 = weight(abstract_txt:assign in 2176) [ClassicSimilarity], result of:
            0.15582006 = score(doc=2176,freq=1.0), product of:
              0.22721973 = queryWeight, product of:
                1.7233801 = boost
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.018024322 = queryNorm
              0.6857682 = fieldWeight in 2176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
          0.10310674 = weight(abstract_txt:records in 2176) [ClassicSimilarity], result of:
            0.10310674 = score(doc=2176,freq=1.0), product of:
              0.248845 = queryWeight, product of:
                3.1237993 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.018024322 = queryNorm
              0.4143412 = fieldWeight in 2176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
          0.20590626 = weight(abstract_txt:language in 2176) [ClassicSimilarity], result of:
            0.20590626 = score(doc=2176,freq=2.0), product of:
              0.37135574 = queryWeight, product of:
                4.9264956 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.018024322 = queryNorm
              0.55447173 = fieldWeight in 2176, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=2176)
        0.24 = coord(6/25)
    
  2. Park, A.L.; Panchyshyn, R.S.: ¬The path to an RDA hybridized catalog : lessons from the Kent State University Libraries' RDA enrichment project (2016) 0.16
    0.15819396 = sum of:
      0.15819396 = product of:
        0.65914154 = sum of:
          0.030442491 = weight(abstract_txt:over in 2632) [ClassicSimilarity], result of:
            0.030442491 = score(doc=2632,freq=1.0), product of:
              0.07650396 = queryWeight, product of:
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.018024322 = queryNorm
              0.39792046 = fieldWeight in 2632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
          0.057875752 = weight(abstract_txt:project in 2632) [ClassicSimilarity], result of:
            0.057875752 = score(doc=2632,freq=3.0), product of:
              0.08140564 = queryWeight, product of:
                1.0315381 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.018024322 = queryNorm
              0.7109551 = fieldWeight in 2632, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
          0.1737985 = weight(abstract_txt:legacy in 2632) [ClassicSimilarity], result of:
            0.1737985 = score(doc=2632,freq=1.0), product of:
              0.24437746 = queryWeight, product of:
                1.7872636 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.018024322 = queryNorm
              0.71118873 = fieldWeight in 2632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
          0.05995156 = weight(abstract_txt:resources in 2632) [ClassicSimilarity], result of:
            0.05995156 = score(doc=2632,freq=1.0), product of:
              0.15144007 = queryWeight, product of:
                1.9897267 = boost
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.018024322 = queryNorm
              0.39587647 = fieldWeight in 2632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
          0.19125831 = weight(abstract_txt:million in 2632) [ClassicSimilarity], result of:
            0.19125831 = score(doc=2632,freq=1.0), product of:
              0.32818645 = queryWeight, product of:
                2.9290943 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.018024322 = queryNorm
              0.5827733 = fieldWeight in 2632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
          0.14581494 = weight(abstract_txt:records in 2632) [ClassicSimilarity], result of:
            0.14581494 = score(doc=2632,freq=2.0), product of:
              0.248845 = queryWeight, product of:
                3.1237993 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.018024322 = queryNorm
              0.58596694 = fieldWeight in 2632, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.09375 = fieldNorm(doc=2632)
        0.24 = coord(6/25)
    
  3. Fischer, T.; Neuroth, H.: SSG-FI - special subject gateways to high quality Internet resources for scientific users (2000) 0.13
    0.1343674 = sum of:
      0.1343674 = product of:
        0.47988358 = sum of:
          0.025368743 = weight(abstract_txt:over in 4873) [ClassicSimilarity], result of:
            0.025368743 = score(doc=4873,freq=1.0), product of:
              0.07650396 = queryWeight, product of:
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.018024322 = queryNorm
              0.33160037 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.04822979 = weight(abstract_txt:project in 4873) [ClassicSimilarity], result of:
            0.04822979 = score(doc=4873,freq=3.0), product of:
              0.08140564 = queryWeight, product of:
                1.0315381 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.018024322 = queryNorm
              0.59246254 = fieldWeight in 4873, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.040773593 = weight(abstract_txt:described in 4873) [ClassicSimilarity], result of:
            0.040773593 = score(doc=4873,freq=1.0), product of:
              0.10497131 = queryWeight, product of:
                1.171368 = boost
                4.9718537 = idf(docFreq=832, maxDocs=44218)
                0.018024322 = queryNorm
              0.38842607 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9718537 = idf(docFreq=832, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.07024761 = weight(abstract_txt:contribute in 4873) [ClassicSimilarity], result of:
            0.07024761 = score(doc=4873,freq=1.0), product of:
              0.15085939 = queryWeight, product of:
                1.4042493 = boost
                5.9603148 = idf(docFreq=309, maxDocs=44218)
                0.018024322 = queryNorm
              0.4656496 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9603148 = idf(docFreq=309, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.049959637 = weight(abstract_txt:resources in 4873) [ClassicSimilarity], result of:
            0.049959637 = score(doc=4873,freq=1.0), product of:
              0.15144007 = queryWeight, product of:
                1.9897267 = boost
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.018024322 = queryNorm
              0.32989708 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.15938191 = weight(abstract_txt:million in 4873) [ClassicSimilarity], result of:
            0.15938191 = score(doc=4873,freq=1.0), product of:
              0.32818645 = queryWeight, product of:
                2.9290943 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.018024322 = queryNorm
              0.4856444 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
          0.08592228 = weight(abstract_txt:records in 4873) [ClassicSimilarity], result of:
            0.08592228 = score(doc=4873,freq=1.0), product of:
              0.248845 = queryWeight, product of:
                3.1237993 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.018024322 = queryNorm
              0.34528434 = fieldWeight in 4873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.078125 = fieldNorm(doc=4873)
        0.28 = coord(7/25)
    
  4. Kent, C.; Deliot, C.; Martyn, C.: Management information from classification : methods of collection analysis using DDC (2008) 0.13
    0.12862672 = sum of:
      0.12862672 = product of:
        0.45938116 = sum of:
          0.025368743 = weight(abstract_txt:over in 2165) [ClassicSimilarity], result of:
            0.025368743 = score(doc=2165,freq=1.0), product of:
              0.07650396 = queryWeight, product of:
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.018024322 = queryNorm
              0.33160037 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.027845485 = weight(abstract_txt:project in 2165) [ClassicSimilarity], result of:
            0.027845485 = score(doc=2165,freq=1.0), product of:
              0.08140564 = queryWeight, product of:
                1.0315381 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.018024322 = queryNorm
              0.34205842 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.0487934 = weight(abstract_txt:machine in 2165) [ClassicSimilarity], result of:
            0.0487934 = score(doc=2165,freq=1.0), product of:
              0.118320145 = queryWeight, product of:
                1.2436191 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.018024322 = queryNorm
              0.41238457 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.06935058 = weight(abstract_txt:british in 2165) [ClassicSimilarity], result of:
            0.06935058 = score(doc=2165,freq=1.0), product of:
              0.14957236 = queryWeight, product of:
                1.3982464 = boost
                5.934836 = idf(docFreq=317, maxDocs=44218)
                0.018024322 = queryNorm
              0.46365905 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.934836 = idf(docFreq=317, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.049959637 = weight(abstract_txt:resources in 2165) [ClassicSimilarity], result of:
            0.049959637 = score(doc=2165,freq=1.0), product of:
              0.15144007 = queryWeight, product of:
                1.9897267 = boost
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.018024322 = queryNorm
              0.32989708 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2226825 = idf(docFreq=1761, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.116731904 = weight(abstract_txt:automated in 2165) [ClassicSimilarity], result of:
            0.116731904 = score(doc=2165,freq=1.0), product of:
              0.26665783 = queryWeight, product of:
                2.6402814 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.018024322 = queryNorm
              0.43775916 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
          0.121331416 = weight(abstract_txt:language in 2165) [ClassicSimilarity], result of:
            0.121331416 = score(doc=2165,freq=1.0), product of:
              0.37135574 = queryWeight, product of:
                4.9264956 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.018024322 = queryNorm
              0.32672557 = fieldWeight in 2165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=2165)
        0.28 = coord(7/25)
    
  5. Hodges, D.W.; Schlottmann, K.: Reporting from the archives : better archival migration outcomes with Python and the Google Sheets API (2019) 0.13
    0.12664399 = sum of:
      0.12664399 = product of:
        0.45229992 = sum of:
          0.020294994 = weight(abstract_txt:over in 5444) [ClassicSimilarity], result of:
            0.020294994 = score(doc=5444,freq=1.0), product of:
              0.07650396 = queryWeight, product of:
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.018024322 = queryNorm
              0.2652803 = fieldWeight in 5444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.038583834 = weight(abstract_txt:project in 5444) [ClassicSimilarity], result of:
            0.038583834 = score(doc=5444,freq=3.0), product of:
              0.08140564 = queryWeight, product of:
                1.0315381 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.018024322 = queryNorm
              0.47397006 = fieldWeight in 5444, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.02340748 = weight(abstract_txt:tools in 5444) [ClassicSimilarity], result of:
            0.02340748 = score(doc=5444,freq=1.0), product of:
              0.08413843 = queryWeight, product of:
                1.0487096 = boost
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.018024322 = queryNorm
              0.278202 = fieldWeight in 5444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.09202459 = weight(abstract_txt:phase in 5444) [ClassicSimilarity], result of:
            0.09202459 = score(doc=5444,freq=2.0), product of:
              0.16634786 = queryWeight, product of:
                1.4745742 = boost
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.018024322 = queryNorm
              0.5532057 = fieldWeight in 5444, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.11586567 = weight(abstract_txt:legacy in 5444) [ClassicSimilarity], result of:
            0.11586567 = score(doc=5444,freq=1.0), product of:
              0.24437746 = queryWeight, product of:
                1.7872636 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.018024322 = queryNorm
              0.47412583 = fieldWeight in 5444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.093385525 = weight(abstract_txt:automated in 5444) [ClassicSimilarity], result of:
            0.093385525 = score(doc=5444,freq=1.0), product of:
              0.26665783 = queryWeight, product of:
                2.6402814 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.018024322 = queryNorm
              0.35020733 = fieldWeight in 5444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
          0.06873783 = weight(abstract_txt:records in 5444) [ClassicSimilarity], result of:
            0.06873783 = score(doc=5444,freq=1.0), product of:
              0.248845 = queryWeight, product of:
                3.1237993 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.018024322 = queryNorm
              0.27622747 = fieldWeight in 5444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.0625 = fieldNorm(doc=5444)
        0.28 = coord(7/25)
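
The content scores above combine several matching terms. The following is a minimal sketch of that combination, assuming the ClassicSimilarity structure visible in the listing: each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord = matched terms / query terms. The function names are illustrative only. Terms shown without a boost line (for example "over") are assumed to have a boost of 1.0.

```python
def term_score(boost: float, idf: float, query_norm: float,
               tf: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum of the term scores scaled by the coordination factor."""
    return sum(term_scores) * (matched / total)


# "project" in doc 2176, values copied from the first content entry.
project = term_score(boost=1.0315381, idf=4.378348,
                     query_norm=0.018024322, tf=1.0, field_norm=0.09375)

# The six per-term scores listed for doc 2176, with coord(6/25).
doc_2176_terms = [0.033414584, 0.048928317, 0.14445668,
                  0.15582006, 0.10310674, 0.20590626]
total = document_score(doc_2176_terms, matched=6, total=25)
print(f"project={project:.6f} total={total:.6f}")
```

The coord factor explains why entries 3 to 5 match more query terms (7 of 25) than entries 1 and 2 (6 of 25) yet still rank lower: their individual term scores are smaller, in part because of lower fieldNorm values.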