Search (2 results, page 1 of 1)

Muneer, I.; Sharjeel, M.; Iqbal, M.; Adeel Nawab, R.M.; Rayson, P.: CLEU - A Cross-language English-Urdu corpus and benchmark for text reuse experiments (2019) 0.00
```
0.0016833913 = product of:
  0.016833913 = sum of:
    0.016833913 = weight(_text_:web in 5299) [ClassicSimilarity], result of:
      0.016833913 = score(doc=5299,freq=2.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.18028519 = fieldWeight in 5299, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5299)
  0.1 = coord(1/10)
```
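The arithmetic in the score explanation above can be reproduced with a short sketch. It assumes the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the figures shown for doc 5299. The function names are illustrative and are not part of any library API.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency (assumed formula).
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency (assumed formula that reproduces 3.2635105).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                    # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explanation for the term "web" in doc 5299.
score = term_score(freq=2.0, doc_freq=4597, max_docs=44218,
                   query_norm=0.028611459, field_norm=0.0390625)

# coord(1/10): only one of the ten query clauses matched.
final_score = score * (1 / 10)
print(f"{score:.9f} {final_score:.10f}")
```

The low final score comes mostly from the coord(1/10) factor: the single matching clause is scaled down because nine other query clauses did not match.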
Abstract

Text reuse is becoming a serious issue in many fields, and research shows that it is much harder to detect when it occurs across languages. The recent rise in multi-lingual content on the Web has increased cross-language text reuse to an unprecedented scale. Although researchers have proposed methods to detect it, one major drawback is the unavailability of large-scale gold standard evaluation resources built on real cases. To overcome this problem, we propose a cross-language sentence/passage level text reuse corpus for the English-Urdu language pair. The Cross-Language English-Urdu Corpus (CLEU) has source text in English, whereas the derived text is in Urdu. It contains a total of 3,235 sentence/passage pairs manually tagged into three categories: near copy, paraphrased copy, and independently written. Further, as a second contribution, we evaluate the Translation plus Mono-lingual Analysis method using three sets of experiments on the proposed dataset to highlight its usefulness. Evaluation results (F1 = 0.732 for binary and F1 = 0.552 for ternary classification) indicate that it is harder to detect real cases of cross-language text reuse, especially when the language pairs have unrelated scripts. The corpus is a useful benchmark resource for the future development and assessment of cross-language text reuse detection systems for the English-Urdu language pair.

Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.00

9.1271737E-4 = product of:
  0.009127174 = sum of:
    0.009127174 = product of:
      0.027381519 = sum of:
        0.027381519 = weight(_text_:29 in 2918) [ClassicSimilarity], result of:
          0.027381519 = score(doc=2918,freq=2.0), product of:
            0.10064617 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.028611459 = queryNorm
            0.27205724 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2918)
      0.33333334 = coord(1/3)
  0.1 = coord(1/10)
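
The nesting in the explanation above (a product inside a sum inside a product, with two coordination factors) can be checked by evaluating it as a tree. The following is a minimal sketch: the tuple-based tree encoding and the `evaluate` helper are hypothetical conveniences, and only the leaf values are taken from the explanation for doc 2918.

```python
import math

def evaluate(node):
    # A node is ("leaf", value), ("sum", [children]) or ("product", [children]).
    kind, payload = node
    if kind == "leaf":
        return payload
    values = [evaluate(child) for child in payload]
    if kind == "sum":
        return sum(values)
    if kind == "product":
        return math.prod(values)
    raise ValueError(f"unknown node kind: {kind}")

leaf = lambda value: ("leaf", value)

# Leaf values copied from the explanation for the term "29" in doc 2918.
query_weight = ("product", [leaf(3.5176873), leaf(0.028611459)])  # idf * queryNorm
field_weight = ("product", [leaf(1.4142135), leaf(3.5176873), leaf(0.0546875)])  # tf * idf * fieldNorm
term = ("product", [query_weight, field_weight])

inner = ("product", [("sum", [term]), leaf(0.33333334)])  # coord(1/3)
outer = ("product", [("sum", [inner]), leaf(0.1)])        # coord(1/10)

total = evaluate(outer)
print(f"{total:.10e}")
```

Both coordination factors multiply the same term score, so this match is scaled by roughly 1/30 overall.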

Date: 29. 4.2016 12:05:56
