Search (7 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × year_i:[2010 TO 2020}
  1. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.08
    0.0838783 = product of:
      0.1677566 = sum of:
        0.1677566 = sum of:
          0.12689878 = weight(_text_:plus in 1848) [ClassicSimilarity], result of:
            0.12689878 = score(doc=1848,freq=2.0), product of:
              0.3101809 = queryWeight, product of:
                6.1714344 = idf(docFreq=250, maxDocs=44218)
                0.05026075 = queryNorm
              0.40911216 = fieldWeight in 1848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1714344 = idf(docFreq=250, maxDocs=44218)
                0.046875 = fieldNorm(doc=1848)
          0.04085782 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
            0.04085782 = score(doc=1848,freq=2.0), product of:
              0.17600457 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05026075 = queryNorm
              0.23214069 = fieldWeight in 1848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=1848)
      0.5 = coord(1/2)
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
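The ClassicSimilarity explain tree for this record can be checked numerically. A minimal sketch, using only the constants printed in the explain output above; the tf and idf formulas are Lucene ClassicSimilarity's standard definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))):

```python
import math

# Constants copied from the explain output for term "plus" in doc 1848.
max_docs = 44218
doc_freq = 250
freq = 2.0
field_norm = 0.046875
query_norm = 0.05026075

idf = 1 + math.log(max_docs / (doc_freq + 1))   # ≈ 6.1714344
tf = math.sqrt(freq)                            # ≈ 1.4142135
query_weight = idf * query_norm                 # ≈ 0.3101809
field_weight = tf * idf * field_norm            # ≈ 0.40911216
term_score = query_weight * field_weight        # ≈ 0.12689878

# The two term weights are summed, then scaled by coord(1/2) = 0.5
# because only half of the query's optional clauses matched:
score_22 = 0.04085782                           # weight(_text_:22 in 1848)
final = (term_score + score_22) * 0.5
print(round(final, 7))                          # ≈ 0.0838783
```

The same arithmetic applies to every record below; only the per-field constants (freq, fieldNorm) and the depth of the coord(1/2) nesting differ.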
  2. Muneer, I.; Sharjeel, M.; Iqbal, M.; Adeel Nawab, R.M.; Rayson, P.: CLEU - A cross-language English-Urdu corpus and benchmark for text reuse experiments (2019) 0.03
    0.026437245 = product of:
      0.05287449 = sum of:
        0.05287449 = product of:
          0.10574898 = sum of:
            0.10574898 = weight(_text_:plus in 5299) [ClassicSimilarity], result of:
              0.10574898 = score(doc=5299,freq=2.0), product of:
                0.3101809 = queryWeight, product of:
                  6.1714344 = idf(docFreq=250, maxDocs=44218)
                  0.05026075 = queryNorm
                0.3409268 = fieldWeight in 5299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.1714344 = idf(docFreq=250, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5299)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Text reuse is becoming a serious issue in many fields and research shows that it is much harder to detect when it occurs across languages. The recent rise in multi-lingual content on the Web has increased cross-language text reuse to an unprecedented scale. Although researchers have proposed methods to detect it, one major drawback is the unavailability of large-scale gold standard evaluation resources built on real cases. To overcome this problem, we propose a cross-language sentence/passage level text reuse corpus for the English-Urdu language pair. The Cross-Language English-Urdu Corpus (CLEU) has source text in English whereas the derived text is in Urdu. It contains in total 3,235 sentence/passage pairs manually tagged into three categories: near copy, paraphrased copy, and independently written. Further, as a second contribution, we evaluate the Translation plus Mono-lingual Analysis method using three sets of experiments on the proposed dataset to highlight its usefulness. Evaluation results (f1=0.732 binary, f1=0.552 ternary classification) indicate that it is harder to detect cross-language real cases of text reuse, especially when the language pairs have unrelated scripts. The corpus is a useful benchmark resource for the future development and assessment of cross-language text reuse detection systems for the English-Urdu language pair.
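The Translation plus Mono-lingual Analysis approach named in the abstract first machine-translates the Urdu side into English and then applies an ordinary monolingual similarity measure. A toy sketch of the monolingual step, assuming the translation has already been produced (illustrative only, not the authors' implementation; `word_overlap` is a hypothetical helper):

```python
# Mono-lingual analysis step of Translation plus Mono-lingual Analysis:
# compare the machine-translated text against the candidate source text
# with a simple word-set Jaccard overlap. (Illustrative sketch only.)
def word_overlap(source: str, translated: str) -> float:
    """Jaccard overlap of lowercase word sets."""
    a = set(source.lower().split())
    b = set(translated.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# A pair scoring above a tuned threshold would be flagged as potential reuse.
print(word_overlap("the cat sat on the mat", "a cat sat on a mat"))  # ≈ 0.667
```

Real systems use richer features (word n-grams, stemming, similarity over multiple translations), but the pipeline shape is the same: translate, then score monolingually.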
  3. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche [Morphy - morphology and tagging for German] (2013) 0.01
    0.013619275 = product of:
      0.02723855 = sum of:
        0.02723855 = product of:
          0.0544771 = sum of:
            0.0544771 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.0544771 = score(doc=1490,freq=2.0), product of:
                0.17600457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05026075 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22.3.2015 9:30:24
  4. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.01
    0.010214455 = product of:
      0.02042891 = sum of:
        0.02042891 = product of:
          0.04085782 = sum of:
            0.04085782 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.04085782 = score(doc=563,freq=2.0), product of:
                0.17600457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05026075 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    10.1.2013 19:22:47
  5. Fóris, A.: Network theory and terminology (2013) 0.01
    0.008512047 = product of:
      0.017024094 = sum of:
        0.017024094 = product of:
          0.03404819 = sum of:
            0.03404819 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.03404819 = score(doc=1365,freq=2.0), product of:
                0.17600457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05026075 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    2.9.2014 21:22:48
  6. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache [AI program better than humans at understanding natural language] (2018) 0.01
    0.0068096374 = product of:
      0.013619275 = sum of:
        0.013619275 = product of:
          0.02723855 = sum of:
            0.02723855 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
              0.02723855 = score(doc=4217,freq=2.0), product of:
                0.17600457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05026075 = queryNorm
                0.15476047 = fieldWeight in 4217, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4217)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22.1.2018 11:32:44
  7. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.01
    0.005958433 = product of:
      0.011916866 = sum of:
        0.011916866 = product of:
          0.023833731 = sum of:
            0.023833731 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
              0.023833731 = score(doc=3807,freq=2.0), product of:
                0.17600457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05026075 = queryNorm
                0.1354154 = fieldWeight in 3807, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3807)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20.1.2015 18:30:22