Search (4 results, page 1 of 1)

  • author_ss:"Arsenault, C."
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Arsenault, C.: Word division in the transcription of Chinese script in the title fields of bibliographic records (2001) 0.02
    0.023040833 = product of:
      0.046081666 = sum of:
        0.046081666 = product of:
          0.09216333 = sum of:
            0.09216333 = weight(_text_:bibliographic in 5434) [ClassicSimilarity], result of:
              0.09216333 = score(doc=5434,freq=6.0), product of:
                0.17672792 = queryWeight, product of:
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.045395818 = queryNorm
                0.52149844 = fieldWeight in 5434, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5434)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
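The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown. The minimal Python sketch below recomputes it. It assumes the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and a query boost of 1. It takes queryNorm and fieldNorm as given from the output rather than deriving them. The two coord(1/2) factors reflect that only one of two query clauses matched at each of two nesting levels. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def explain_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float,
                  coords: tuple = (0.5, 0.5)) -> float:
    """Recompute a single-term ClassicSimilarity score (query boost assumed 1)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    score = query_weight * field_weight                 # weight(_text_:term in doc)
    for coord in coords:                                # one coord(1/2) per nesting level
        score *= coord
    return score


if __name__ == "__main__":
    # (freq, docFreq, fieldNorm) copied from the four explain trees on this page
    rows = {
        5434: (6.0, 2449, 0.0546875),
        609: (8.0, 2449, 0.0390625),
        87: (6.0, 2449, 0.0390625),
        2264: (2.0, 3622, 0.046875),
    }
    for doc, (freq, df, norm) in rows.items():
        print(doc, round(explain_score(freq, df, 44218, 0.045395818, norm), 9))
```

For doc 5434, `explain_score(6.0, 2449, 44218, 0.045395818, 0.0546875)` reproduces the 0.023040833 shown above to within float rounding. The same call with the other documents' freq, docFreq and fieldNorm values reproduces their totals.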
    
    Abstract
    Recently, the Library of Congress adopted the pinyin Romanization system for transcribing Chinese data in its bibliographic records. In its canonical form, pinyin aggregates Chinese "words" into single linguistic units, but pinyin entries can be constructed following either a monosyllabic or a polysyllabic pattern. Although the former is easier and less costly to implement, the latter is potentially more beneficial for end-users, as it reduces ambiguity and generates a much larger variety of indexable terms. The current study investigates whether following the polysyllabic method improves retrieval efficiency and effectiveness in item-specific searching within online bibliographic databases. Analysis of the results revealed that aggregating monosyllables significantly improves efficiency (p < .05), especially in keyword searches, while effectiveness remains largely unaffected.
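The minimal Python sketch below makes the monosyllabic/polysyllabic contrast concrete. It assumes a title already divided into words, each given as a list of pinyin syllables without tone marks. It also assumes a plain whitespace tokenizer with Boolean AND keyword matching; real OPAC indexing is more elaborate. The two titles ("zhongguo wenxue shi" and "xuesheng wenji") are hypothetical examples, and `render`, `index_terms` and `keyword_match` are illustrative names.

```python
def render(words: list[list[str]], aggregated: bool) -> str:
    """Join syllables into polysyllabic words (aggregated) or keep them apart."""
    if aggregated:
        return " ".join("".join(word) for word in words)
    return " ".join(syllable for word in words for syllable in word)


def index_terms(title: str) -> set[str]:
    # Whitespace tokenization: the "visual words" a keyword index can see
    return set(title.lower().split())


def keyword_match(query: str, title: str) -> bool:
    # Boolean AND: every query token must be an index term of the title
    return index_terms(query) <= index_terms(title)


if __name__ == "__main__":
    title_1 = [["zhong", "guo"], ["wen", "xue"], ["shi"]]  # hypothetical title
    title_2 = [["xue", "sheng"], ["wen", "ji"]]            # hypothetical title
    for title in (title_1, title_2):
        for aggregated in (True, False):
            text = render(title, aggregated)
            print(repr(text), sorted(index_terms(text)))
    print(keyword_match("wenxue", render(title_1, True)))
    print(keyword_match("wen xue", render(title_2, False)))
```

In the unaggregated form, the keyword query "wen xue" also retrieves "xue sheng wen ji", because both syllables occur there separately. The aggregated form avoids that false drop, which is one way the polysyllabic method reduces ambiguity.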
  2. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.02
    0.01900376 = product of:
      0.03800752 = sum of:
        0.03800752 = product of:
          0.07601504 = sum of:
            0.07601504 = weight(_text_:bibliographic in 609) [ClassicSimilarity], result of:
              0.07601504 = score(doc=609,freq=8.0), product of:
                0.17672792 = queryWeight, product of:
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.045395818 = queryNorm
                0.43012467 = fieldWeight in 609, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=609)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
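The fieldNorm values on this page (0.0546875, 0.0390625, and 0.046875 for item 4) are 7/128, 5/128 and 6/128. These are numbers with only three significant binary digits. That is consistent with the norm encoding used by Lucene releases before 7.0. As we understand that encoding, ClassicSimilarity stores boost / sqrt(numTerms) in a single byte (`SmallFloat.floatToByte315`) and truncates the rest of the mantissa. The Python sketch below assumes that encoding and an index-time boost of 1, and it ignores exponent clamping. Under those assumptions, it inverts a fieldNorm into the range of field lengths that would produce it. The helper names are hypothetical.

```python
import math


def quantize_norm(value: float) -> float:
    """Truncate a positive float to 3 significant bits (1 implicit + 2 explicit).

    Assumed to mimic Lucene's pre-7.0 one-byte norm encoding; exponent
    clamping for very large or very small values is ignored here.
    """
    if value <= 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    kept = math.floor(mantissa * 8) / 8     # keep the 3 most significant bits, drop the rest
    return math.ldexp(kept, exponent)


def length_norm(num_terms: int, boost: float = 1.0) -> float:
    # ClassicSimilarity length norm boost / sqrt(numTerms), then the lossy encoding
    return quantize_norm(boost / math.sqrt(num_terms))


def term_count_range(field_norm: float, max_terms: int = 10_000):
    """Hypothetical helper: field lengths in 1..max_terms that encode to field_norm."""
    counts = [n for n in range(1, max_terms + 1) if length_norm(n) == field_norm]
    return (min(counts), max(counts)) if counts else None


if __name__ == "__main__":
    for norm in (0.0546875, 0.0390625, 0.046875):
        print(norm, term_count_range(norm))
```

Under these assumptions, the three values correspond to different field lengths in indexed terms:

- 0.0546875 (doc 5434): 257 to 334 terms
- 0.0390625 (docs 609 and 87): 456 to 655 terms
- 0.046875 (doc 2264): 335 to 455 terms

Only a range can be recovered, because the encoding keeps so few bits.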
    
    Abstract
    Purpose - Aims to measure the syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records, and to verify whether the term frequency distributions satisfy conventional bibliometric laws. Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test. Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not. Research limitations/implications - The findings are limited to two sets of bibliographic data (for the aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high; therefore, the main argument against syllable aggregation does not appear to hold. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution, but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters shows the worst fit to either Lotka's or Zipf's law, which indicates that aggregated Romanized data appear to be the preferable option. Originality/value - Provides empirical data on the consistency and distribution of Romanized Chinese titles in bibliographic records.
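The abstract above compares term frequency distributions against Lotka's law and tests the fit with the Kolmogorov-Smirnov test. A minimal Python sketch of that kind of test follows. It assumes the classic inverse-square form, in which the expected share of terms occurring x times is C / x^n with n = 2 by default; in practice n is often estimated from the data. C is normalized over the observed range 1..x_max, and the test uses the large-sample critical value 1.36 / sqrt(N) at alpha = 0.05. This is not the author's procedure, the generalised Zipf fit is not shown, and `lotka_ks` is a hypothetical name.

```python
import math
from collections import Counter


def lotka_ks(term_freqs: list[int], n: float = 2.0):
    """K-S fit of Lotka's law.

    term_freqs holds one occurrence count per distinct term (hypothetical input).
    Returns (D, critical value, fits) where fits is True if D < critical value.
    """
    size_freq = Counter(term_freqs)  # x -> number of terms occurring exactly x times
    total = len(term_freqs)
    x_max = max(size_freq)
    # Normalizing constant so the expected shares over 1..x_max sum to 1
    c = 1.0 / sum(x ** -n for x in range(1, x_max + 1))
    d = cum_obs = cum_exp = 0.0
    for x in range(1, x_max + 1):
        cum_obs += size_freq.get(x, 0) / total  # observed cumulative proportion
        cum_exp += c / x ** n                   # Lotka expected cumulative proportion
        d = max(d, abs(cum_obs - cum_exp))      # K-S statistic: largest gap
    critical = 1.36 / math.sqrt(total)          # large-sample critical value, alpha = 0.05
    return d, critical, d < critical


if __name__ == "__main__":
    print(lotka_ks([1] * 8 + [2] * 2))
    print(lotka_ks([1] * 2 + [2] * 8))
```

Take ten terms of which eight occur once and two occur twice: they match the inverse-square expectation (0.8 and 0.2) exactly, so D is 0. Reversing those counts gives D = 0.6, well above the critical value of about 0.43.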
  3. Arsenault, C.: Testing the impact of syllable aggregation in romanized fields of Chinese language bibliographic records (2000) 0.02
    0.016457738 = product of:
      0.032915477 = sum of:
        0.032915477 = product of:
          0.06583095 = sum of:
            0.06583095 = weight(_text_:bibliographic in 87) [ClassicSimilarity], result of:
              0.06583095 = score(doc=87,freq=6.0), product of:
                0.17672792 = queryWeight, product of:
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.045395818 = queryNorm
                0.3724989 = fieldWeight in 87, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.893044 = idf(docFreq=2449, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=87)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Today, two Romanization systems for Chinese data are in use in most libraries in the Western world: 1) Wade-Giles, and 2) Hanyu pinyin (simply referred to as pinyin). In 1997, the Library of Congress finally announced its official adoption of pinyin for Romanizing Chinese data in its bibliographic records. One of the main problems in implementing the pinyin standard for library use is that pinyin, as opposed to Wade-Giles, aggregates Chinese "words" into single linguistic units. Chinese characters represent monosyllabic morphemes rather than words and are equally spaced from one another, so Chinese text in its original form provides no visual cues as to where a word starts or ends. When the script is romanized, however, syllables or words must be separated from one another, since most information retrieval techniques require the identification of "visual words". The Romanized strings can therefore be divided either into monosyllables or into polysyllabic words. This study explores the impact of using either unaggregated (monosyllabic) or aggregated (polysyllabic) pinyin Romanization in Chinese-language bibliographic records. An experiment using transaction log analysis was carried out to observe variations in the retrieval performance of title searches (both phrase and keyword) in a large OPAC of Chinese-language records. General results are presented, along with a summary of the pros and cons of each method.
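Chinese text carries no visual word boundaries, so producing aggregated pinyin means deciding which syllables belong to one word. One common, generic way to automate that decision is dictionary-based forward maximum matching, sketched below in Python. The abstract does not say how aggregation was done in the records studied. The lexicon contents and the four-syllable limit are assumptions for illustration, and `forward_max_match` is a hypothetical name.

```python
def forward_max_match(syllables: list[str], lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedily aggregate monosyllables into the longest word found in lexicon."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable is always accepted
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + size])
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    syllables = "zhong guo wen xue shi".split()
    print(forward_max_match(syllables, {"zhongguo", "wenxue"}))
    print(forward_max_match(syllables, {"zhongguo", "wenxue", "wenxueshi"}))
```

With the lexicon {"zhongguo", "wenxue"}, the syllables zhong guo wen xue shi become zhongguo wenxue shi. Adding "wenxueshi" to the lexicon changes the output to zhongguo wenxueshi. The same title can therefore be aggregated differently depending on the word list, which is the kind of aggregation inconsistency examined in item 2 above.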
  4. Arsenault, C.; Ménard, E.: Searching titles with initial articles in library catalogs : a case study and search behavior analysis (2007) 0.01
    0.00922576 = product of:
      0.01845152 = sum of:
        0.01845152 = product of:
          0.03690304 = sum of:
            0.03690304 = weight(_text_:22 in 2264) [ClassicSimilarity], result of:
              0.03690304 = score(doc=2264,freq=2.0), product of:
                0.15896842 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045395818 = queryNorm
                0.23214069 = fieldWeight in 2264, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2264)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    10. 9.2000 17:38:22