Search (3 results, page 1 of 1)

Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.05
```
0.047185194 = sum of:
  0.021194125 = product of:
    0.0847765 = sum of:
      0.0847765 = weight(_text_:authors in 5061) [ClassicSimilarity], result of:
        0.0847765 = score(doc=5061,freq=4.0), product of:
          0.23803101 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.052213363 = queryNorm
          0.35615736 = fieldWeight in 5061, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5061)
    0.25 = coord(1/4)
  0.02599107 = product of:
    0.05198214 = sum of:
      0.05198214 = weight(_text_:k in 5061) [ClassicSimilarity], result of:
        0.05198214 = score(doc=5061,freq=4.0), product of:
          0.18639012 = queryWeight, product of:
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.052213363 = queryNorm
          0.2788889 = fieldWeight in 5061, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5061)
    0.5 = coord(1/2)
```
Abstract

The authors propose a method for automatically generating Japanese-English bilingual thesauri based on bilingual corpora. The term bilingual thesaurus refers to a set of bilingual equivalent words and their synonyms. Most of the methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and are not effective for dealing with low-frequency clusters. These low-frequency bilingual clusters are worth extracting because they contain many newly coined terms that are in demand but are not listed in existing bilingual thesauri. Assuming that single language-pair-independent methods such as frequency-based ones have reached their limitations and that a language-pair-dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) Extract translation pairs based on transliteration patterns; (b) remove the pairs from among the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters based on the extracted pairs using a graph-theoretic method. The proposed method has been found to be significantly more effective than other methods.
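
Step (d) is described only as "a graph-theoretic method". The Python below is a minimal sketch of steps (b) and (d). It assumes that a bilingual cluster is a connected component of a bipartite graph whose edges are the extracted Japanese-English translation pairs. The function names `remove_paired` and `bilingual_clusters` and the toy word lists are hypothetical. Steps (a) and (c), the transliteration-based and frequency-based extraction, are not implemented; hard-coded pair lists stand in for their output.

```python
from collections import defaultdict

def remove_paired(candidates, pairs):
    """Step (b): drop candidate words that already occur in an extracted pair."""
    paired = {word for pair in pairs for word in pair}
    return [word for word in candidates if word not in paired]

def bilingual_clusters(pairs):
    """Step (d), ASSUMED here to be the connected components of the bipartite
    Japanese-English translation graph, found with union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes by language so identical strings in the two languages stay distinct.
    for ja_word, en_word in pairs:
        union(("ja", ja_word), ("en", en_word))

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return [
        (sorted(w for lang, w in members if lang == "ja"),
         sorted(w for lang, w in members if lang == "en"))
        for members in groups.values()
    ]

# Hypothetical toy data standing in for the outputs of steps (a) and (c).
translit_pairs = [("コンピュータ", "computer"), ("コンピューター", "computer")]
candidates = ["コンピュータ", "コンピューター", "計算機", "computer", "calculator"]
remaining = remove_paired(candidates, translit_pairs)   # step (b)
freq_pairs = [("計算機", "calculator")]                  # pretend step (c) found this
clusters = bilingual_clusters(translit_pairs + freq_pairs)
for ja_words, en_words in clusters:
    print(ja_words, "<->", en_words)
```

Both transliteration variants attach to "computer" and therefore end up in one cluster. Tagging each node with its language keeps a word spelled the same way on both sides from merging the two languages by accident. Connected components are the simplest graph-theoretic grouping. A stricter criterion, such as requiring dense connectivity inside a cluster, would avoid chaining unrelated words together through a single ambiguous translation.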
Yoshikane, F.; Kageura, K.; Tsuji, K.: A method for the comparative analysis of concentration of author productivity, giving consideration to the effect of sample size dependency of statistical measures (2003) 0.05
```
0.047185194 = sum of:
  0.021194125 = product of:
    0.0847765 = sum of:
      0.0847765 = weight(_text_:authors in 5123) [ClassicSimilarity], result of:
        0.0847765 = score(doc=5123,freq=4.0), product of:
          0.23803101 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.052213363 = queryNorm
          0.35615736 = fieldWeight in 5123, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5123)
    0.25 = coord(1/4)
  0.02599107 = product of:
    0.05198214 = sum of:
      0.05198214 = weight(_text_:k in 5123) [ClassicSimilarity], result of:
        0.05198214 = score(doc=5123,freq=4.0), product of:
          0.18639012 = queryWeight, product of:
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.052213363 = queryNorm
          0.2788889 = fieldWeight in 5123, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5123)
    0.5 = coord(1/2)
```
Abstract

Studies of the concentration of author productivity that are based on counts of papers by individual authors produce measures that change systematically with sample size. Yoshikane, Kageura, and Tsuji seek a statistical framework that avoids this scale effect. They use the number of authors in a field as an absolute concentration measure and Gini's index as a relative concentration measure. With these two measures, which are insensitive to one another, they describe four literatures from both viewpoints. Both measures increase with sample size. They then plot profiles of the two measures based on a Monte Carlo simulation of 1000 trials at 20 equally spaced intervals and compare the characteristics of the literatures. Using data from conferences hosted by four academic societies between 1992 and 1997, they find a coefficient of loss exceeding 0.15, indicating that the measures depend heavily on sample size. The simulation shows that a larger sample size leads to lower absolute concentration and higher relative concentration. Comparisons made at the same sample size yield results quite different from those of the original data and allow direct comparison of population characteristics.
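
The abstract does not say how samples are drawn or which form of Gini's index is used. The Python below is a minimal sketch of a size-dependence profile in the spirit of the simulation described, with repeated random trials at equally spaced sample sizes. It rests on two assumptions. First, the sampling unit is an author-paper link (one entry per authorship), drawn without replacement. Second, Gini's index is the common sample formula G = 2 Σ i·x_(i) / (n Σ x) − (n + 1)/n, computed over ascending per-author counts x_(1) <= ... <= x_(n). The names `gini` and `profile` and the toy author list are hypothetical.

```python
import random
from collections import Counter

def gini(counts):
    """Sample Gini index of per-author paper counts: 0 when all authors are
    equally productive, larger as output concentrates on fewer authors."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def profile(authorships, steps=20, trials=1000, seed=0):
    """Mean number of distinct authors and mean Gini index at `steps` equally
    spaced sample sizes, each averaged over `trials` random subsamples."""
    rng = random.Random(seed)
    population = len(authorships)
    rows = []
    for step in range(1, steps + 1):
        size = round(population * step / steps)
        author_total = gini_total = 0.0
        for _ in range(trials):
            counts = Counter(rng.sample(authorships, size))  # without replacement
            author_total += len(counts)
            gini_total += gini(counts.values())
        rows.append((size, author_total / trials, gini_total / trials))
    return rows

# Hypothetical toy literature: one list entry per author-paper link.
authorships = ["A"] * 6 + ["B"] * 3 + ["C", "D", "E"]
for size, mean_authors, mean_gini in profile(authorships, steps=4, trials=200):
    print(f"n={size:2d}  authors={mean_authors:.2f}  gini={mean_gini:.3f}")
```

At full size, every trial sees the exact counts (A: 6, B: 3, and C, D, E: 1 each), which gives 5 authors and G = 0.4. The smaller rows show both measures shifting purely because of sample size. That shift is the scale effect the study sets out to control for.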

Kageura, K.: The dynamics of terminology : a descriptive theory of term formation and terminological growth (2002) 0.02

```
0.018031968 = product of:
  0.036063936 = sum of:
    0.036063936 = sum of:
      0.01837846 = weight(_text_:k in 1787) [ClassicSimilarity], result of:
        0.01837846 = score(doc=1787,freq=2.0), product of:
          0.18639012 = queryWeight, product of:
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.052213363 = queryNorm
          0.098602116 = fieldWeight in 1787, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.01953125 = fieldNorm(doc=1787)
      0.017685475 = weight(_text_:22 in 1787) [ClassicSimilarity], result of:
        0.017685475 = score(doc=1787,freq=2.0), product of:
          0.1828423 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052213363 = queryNorm
          0.09672529 = fieldWeight in 1787, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.01953125 = fieldNorm(doc=1787)
  0.5 = coord(1/2)
```

Date: 22. 3.2008 18:18:53
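
All three explain trees above use the same Lucene ClassicSimilarity (TF-IDF) arithmetic. Each matching term scores queryWeight × fieldWeight, where:

- queryWeight = idf × queryNorm
- fieldWeight = tf × idf × fieldNorm
- tf = √termFreq
- idf = 1 + ln(maxDocs / (docFreq + 1))

The term scores inside a boolean query are summed and then multiplied by coord(matched/total). The Python below is a minimal sketch that recomputes the scores of doc 5061 and doc 1787 from the values printed in the trees. The helper names (`idf`, `tf`, `term_score`, `clause_score`) are hypothetical and are not part of Lucene's API. `queryNorm` and `fieldNorm` are copied from the trees, not derived.

```python
import math

# Minimal sketch of Lucene ClassicSimilarity (TF-IDF) scoring arithmetic.
# Helper names are hypothetical; they are not Lucene API.

MAX_DOCS = 44218            # maxDocs from the explain trees
QUERY_NORM = 0.052213363    # queryNorm, copied from the trees (not derived)

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)

def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

def clause_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum the matching term scores, then scale by coord(matched/total)."""
    return sum(term_scores) * matched / total

# Doc 5061: "authors" and "k" each match in a sub-query with coord 1/4 and 1/2.
authors_5061 = term_score(4.0, 1258, 0.0390625)
k_5061 = term_score(4.0, 3384, 0.0390625)
score_5061 = clause_score([authors_5061], 1, 4) + clause_score([k_5061], 1, 2)

# Doc 1787: "k" and "22" match inside one clause; the outer coord is 1/2.
k_1787 = term_score(2.0, 3384, 0.01953125)
t22_1787 = term_score(2.0, 3622, 0.01953125)
score_1787 = clause_score([k_1787, t22_1787], 1, 2)

print(f"doc 5061: {score_5061:.9f}")
print(f"doc 1787: {score_1787:.9f}")
```

The printed values should match the trees to about seven significant digits. Lucene performs this arithmetic in single-precision floats, so the trailing digits can differ slightly. Doc 5123 has exactly the same inputs as doc 5061, which is why those two entries share the score 0.047185194.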
