Search (201 results, page 2 of 11)

  • Filter: theme_ss:"Computerlinguistik"
  1. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.02
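    The relevance figure at the end of each entry is a Lucene ClassicSimilarity (tf-idf) score. As a worked example, the following Python sketch reproduces entry 1's 0.02 from the statistics the engine reports for it (term frequency, idf, fieldNorm, queryNorm, and two coord factors); the constants are the engine's, the function names are ours.

        import math

        # ClassicSimilarity per-term weight: (idf * queryNorm) for the query side
        # times (sqrt(freq) * idf * fieldNorm) for the field side.
        def term_weight(freq, idf, field_norm, query_norm):
            query_weight = idf * query_norm
            field_weight = math.sqrt(freq) * idf * field_norm
            return query_weight * field_weight

        query_norm = 0.044775832                      # shared by all query terms
        total = (term_weight(6.0, 3.1774964, 0.03125, query_norm)    # term "j"
                 + term_weight(2.0, 4.3116565, 0.03125, query_norm)) # term "n"
        score = total * (2 / 3) * (1 / 2)             # coord(2/3) * coord(1/2)
        print(f"{score:.8f}")                         # 0.02379742, displayed as 0.02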
  2. Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P.: Good debt or bad debt : detecting semantic orientations in economic texts (2014) 0.02
    Abstract
    The use of robo-readers to analyze news texts is an emerging technology trend in computational finance. Recent research has developed sophisticated financial polarity lexicons for investigating how financial sentiments relate to future company performance. However, based on experience from fields that commonly analyze sentiment, it is well known that the overall semantic orientation of a sentence may differ from that of individual words. This article investigates how semantic orientations can be better detected in financial and economic news by accommodating the overall phrase-structure information and domain-specific use of language. Our three main contributions are the following: (a) a human-annotated finance phrase bank that can be used for training and evaluating alternative models; (b) a technique to enhance financial lexicons with attributes that help to identify expected direction of events that affect sentiment; and (c) a linearized phrase-structure model for detecting contextual semantic orientations in economic texts. The relevance of the newly added lexicon features and the benefit of using the proposed learning algorithm are demonstrated in a comparative study against general sentiment models as well as the popular word frequency models used in recent financial studies. The proposed framework is parsimonious and avoids the explosion in feature space caused by the use of conventional n-gram features.
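    A minimal sketch of the direction-attribute idea from the abstract above: each lexicon entry carries a polarity, and nearby direction words flip it, so "debt decreased" scores positive although "debt" is negative. The mini-lexicon, window size, and examples are hypothetical, not taken from the authors' phrase bank or model.

        # Hypothetical mini-lexicon: +1 if an increase in the concept is good news,
        # -1 if it is bad news. Direction words signal the movement itself.
        LEXICON = {"profit": +1, "debt": -1, "loss": -1}
        DIRECTION = {"increased": +1, "rose": +1, "decreased": -1, "fell": -1}

        def orientation(sentence):
            """Crude phrase-level orientation: combine each lexicon hit with any
            direction word found in a small window around it."""
            tokens = sentence.lower().split()
            score = 0
            for i, tok in enumerate(tokens):
                if tok in LEXICON:
                    window = tokens[max(0, i - 2): i + 3]
                    mod = next((DIRECTION[w] for w in window if w in DIRECTION), +1)
                    score += LEXICON[tok] * mod
            return "positive" if score > 0 else "negative" if score < 0 else "neutral"

        print(orientation("net debt decreased sharply"))     # positive
        print(orientation("quarterly profit fell by half"))  # negative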
  3. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.02
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
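    To make the linking task concrete, here is a toy sketch: score a person reference against a small inventory of canonical English names and return None (a NIL link) when nothing is close enough, mirroring the collections' split into resolvable and unresolvable references. The inventory, threshold, and plain string similarity are assumptions; real cross-language linking also needs transliteration and context.

        import difflib

        # Hypothetical inventory of canonical (English Wikipedia-style) names.
        INVENTORY = ["Angela Merkel", "Ion Iliescu", "Nicolae Ceausescu"]

        def link(reference, threshold=0.6):
            """Best-scoring inventory entry, or None when no candidate clears
            the threshold (i.e. the reference is judged unresolvable)."""
            ratio, best = max(
                (difflib.SequenceMatcher(None, reference.lower(), name.lower()).ratio(), name)
                for name in INVENTORY)
            return best if ratio >= threshold else None

        print(link("Angela Merkl"))  # 'Angela Merkel' despite the misspelling
        print(link("Vasile Pop"))    # None: no plausible entry, i.e. NIL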
  4. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.02
    Abstract
    In this paper we present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words by various string-similarity measures, while restricting the search to specific locations in the target word by taking the order of n-grams into account. We show that the method is effective in achieving high similarity scores for all word-form variations and reduces ambiguity, i.e., obtains higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited to conflation in Arabic, since Arabic is a highly inflectional language. We therefore also present an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system can extend a query using the proposed conflation approach so that additional results for relevant subwords are found automatically.
    Object
    n-grams
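    A toy illustration of the positional restriction described in the abstract above: a character n-gram counts as shared only when it occurs in both words at roughly the same offset. The Dice-style scoring and the transliterated examples are our assumptions, not the authors' exact measures.

        def positional_ngrams(word, n=2):
            """Character n-grams tagged with their offset in the word."""
            return [(i, word[i:i + n]) for i in range(len(word) - n + 1)]

        def conflation_score(a, b, n=2, window=1):
            """Dice-style similarity over n-grams whose offsets differ by at
            most `window` positions."""
            ga, gb = positional_ngrams(a, n), positional_ngrams(b, n)
            shared = sum(1 for i, g in ga
                         if any(g == h and abs(i - j) <= window for j, h in gb))
            return 2 * shared / (len(ga) + len(gb))

        # Transliterated Arabic-style word forms (hypothetical examples):
        print(conflation_score("kitab", "kitabuhum"))  # same stem: 0.67
        print(conflation_score("kitab", "maktab"))     # shifted grams: 0.44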
  5. Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.02
    Date
    26.12.2000 13:22:17
  6. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.02
    Date
    23.11.2023 19:07:22
  7. Alonge, A.; Calzolari, N.; Vossen, P.; Bloksma, L.; Castellon, I.; Marti, M.A.; Peters, W.: ¬The linguistic design of the EuroWordNet database (1998) 0.02
  8. Figuerola, C.G.; Gomez, R.; Lopez de San Roman, E.: Stemming and n-grams in Spanish : an evaluation of their impact in information retrieval (2000) 0.02
  9. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.02
    Abstract
    The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in several respects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
    Object
    n-grams
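    A rough sketch of the comparison the abstract reports, assuming two consecutive queries from one search session: both character n-gram overlap and Levenshtein edit distance recognize a misspelling as the same topic. Thresholds and the neural-network stage of the hybrid algorithm are omitted.

        def char_ngrams(text, n=3):
            text = f" {text} "                  # pad so word boundaries contribute
            return {text[i:i + n] for i in range(len(text) - n + 1)}

        def ngram_similarity(q1, q2, n=3):
            a, b = char_ngrams(q1, n), char_ngrams(q2, n)
            return len(a & b) / len(a | b)      # Jaccard over character n-grams

        def levenshtein(a, b):
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,               # deletion
                                   cur[j - 1] + 1,            # insertion
                                   prev[j - 1] + (ca != cb))) # substitution
                prev = cur
            return prev[-1]

        q1, q2 = "google maps", "gogle maps"    # a typo, not a topic shift
        print(ngram_similarity(q1, q2))         # 0.75: high overlap, same topic
        print(levenshtein(q1, q2))              # 1: one edit apart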
  10. Patrick, J.; Zhang, J.; Artola-Zubillaga, X.: ¬An architecture and query language for a federation of heterogeneous dictionary databases (2000) 0.02
  11. Warner, A.J.: Natural language processing (1987) 0.02
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  12. Chen, L.; Fang, H.: ¬An automatic method for extracting innovative ideas based on the Scopus® database (2019) 0.02
    Abstract
    The novelty of knowledge claims in a research paper can be considered an evaluation criterion for papers to supplement citations. To provide a foundation for research evaluation from the perspective of innovativeness, we propose an automatic approach for extracting innovative ideas from the abstracts of technology and engineering papers. The approach extracts N-grams as candidates based on part-of-speech tagging and determines whether they are novel by checking in the Scopus® database whether they have ever been presented previously. Moreover, we discuss the distribution of innovative ideas across different abstract structures. To improve performance by excluding noisy N-grams, a list of stopwords and a list of research-description characteristics were developed. We selected abstracts of articles published from 2011 to 2017 on the topic of semantic analysis as the experimental texts. Excluding noisy N-grams, considering the distribution of innovative ideas in abstracts, and suitably combining N-grams can effectively improve the performance of automatic innovative-idea extraction. Unlike co-word and co-citation analysis, innovative-idea extraction aims to identify how a paper differs from all previously published papers.
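    A self-contained sketch of the extraction pipeline the abstract describes, with a toy part-of-speech lexicon standing in for a real tagger and a small set standing in for the Scopus® novelty check; all data here are hypothetical.

        # Toy POS lexicon; candidates are contiguous ADJ/NOUN sequences.
        POS = {"semantic": "ADJ", "role": "NOUN", "labeling": "NOUN",
               "graph": "NOUN", "attention": "NOUN", "uses": "VERB",
               "novel": "ADJ", "this": "DET", "paper": "NOUN"}
        STOPWORDS = {"paper", "novel"}           # research-description noise
        SEEN_BEFORE = {"semantic role labeling"} # mock of the database check

        def candidate_ngrams(tokens, max_n=3):
            out = []
            for i in range(len(tokens)):
                for n in range(2, max_n + 1):
                    gram = tokens[i:i + n]
                    if len(gram) == n and all(POS.get(t) in ("ADJ", "NOUN")
                                              for t in gram):
                        out.append(" ".join(gram))
            return out

        def innovative(tokens):
            return [g for g in candidate_ngrams(tokens)
                    if g not in SEEN_BEFORE
                    and not any(w in STOPWORDS for w in g.split())]

        abstract = "this paper uses semantic role labeling graph attention".split()
        print(innovative(abstract))   # N-grams never seen before, minus noise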
  13. Dampz, N.: ChatGPT interpretiert jetzt auch Bilder : Neue Version (2023) 0.02
  14. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.02
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats, and disciplines has been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.). However, research on crossing language boundaries, especially between European and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. For the problem of searching across language boundaries, a cross-lingual thesaurus, generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus using the Hopfield network, based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases.
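    A minimal sketch of the co-occurrence step, assuming a tiny sentence-aligned English/Chinese corpus; the authors' Hopfield network would iteratively spread this kind of weighted activation until convergence, rather than just rank raw association ratios as done here.

        from collections import Counter
        from itertools import product

        # Hypothetical sentence-aligned pairs (English tokens, Chinese tokens).
        PAIRS = [
            (["court", "order"], ["法院", "命令"]),
            (["court", "judge"], ["法院", "法官"]),
            (["order", "appeal"], ["命令", "上诉"]),
        ]

        cooc, en_freq, zh_freq = Counter(), Counter(), Counter()
        for en, zh in PAIRS:
            en_freq.update(set(en))
            zh_freq.update(set(zh))
            cooc.update(product(set(en), set(zh)))

        def related_terms(en_term):
            """Chinese terms ranked by a simple association ratio."""
            scores = {z: cooc[(e, z)] / (en_freq[e] + zh_freq[z])
                      for (e, z) in cooc if e == en_term}
            return sorted(scores.items(), key=lambda kv: -kv[1])

        print(related_terms("court"))   # 法院 (court) ranks first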
  15. Rorvig, M.; Smith, M.M.; Uemura, A.: ¬The N-gram hypothesis applied to matched sets of visualized Japanese-English technical documents (1999) 0.02
    Abstract
    Shape Recovery Analysis (SHERA), a new visual analytical technique, is applied to the N-Gram hypothesis on matched Japanese-English technical documents supplied by the National Center for Science Information Systems (NACSIS) in Japan. The results of the SHERA study reveal compaction in the translation of Japanese subject terms to English subject terms. Surprisingly, the bigram approach to the Japanese data yields a remarkable similarity to the matching visualized English texts
  16. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatically generated word hierarchies (1996) 0.01
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  17. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.01
    Date
    8.10.2000 11:52:22
  18. Somers, H.: Example-based machine translation : Review article (1999) 0.01
    Date
    31. 7.1996 9:22:19
  19. New tools for human translators (1997) 0.01
    Date
    31. 7.1996 9:22:19
  20. Baayen, R.H.; Lieber, R.: Word frequency distributions and lexical semantics (1997) 0.01
    Date
    28. 2.1999 10:48:22

Languages

  • e 141
  • d 50
  • m 4
  • f 3
  • slv 1

Types

  • a 155
  • el 22
  • m 22
  • s 15
  • x 5
  • d 2
  • n 2
  • p 2
  • b 1
