Search (2 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Formalerschließung"
  1. Morris, V.: Automated language identification of bibliographic resources (2020) 0.03
    0.027334882 = product of:
      0.0683372 = sum of:
        0.005448922 = weight(_text_:a in 5749) [ClassicSimilarity], result of:
          0.005448922 = score(doc=5749,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.10191591 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.06288828 = sum of:
          0.012630116 = weight(_text_:information in 5749) [ClassicSimilarity], result of:
            0.012630116 = score(doc=5749,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.1551638 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
          0.050258167 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
            0.050258167 = score(doc=5749,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.30952093 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
      0.4 = coord(2/5)
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
    Type
    a
  2. Corbara, S.; Moreo, A.; Sebastiani, F.: Syllabic quantity patterns as rhythmic features for Latin authorship attribution (2023) 0.01
    0.0051638708 = product of:
      0.012909677 = sum of:
        0.008173384 = weight(_text_:a in 846) [ClassicSimilarity], result of:
          0.008173384 = score(doc=846,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 846, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=846)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 846) [ClassicSimilarity], result of:
              0.009472587 = score(doc=846,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 846, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=846)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility to employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs) show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.1, S.128-141
    Type
    a