Search (735 results, page 1 of 37)

  • theme_ss:"Computerlinguistik"
  1. Bordogna, G.; Pasi, G.: ¬A fuzzy linguistic approach generalizing Boolean information retrieval : a model and its evaluation (1993) 0.12
    0.115062416 = product of:
      0.15341656 = sum of:
        0.0132912 = weight(_text_:a in 2569) [ClassicSimilarity], result of:
          0.0132912 = score(doc=2569,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.22789092 = fieldWeight in 2569, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2569)
        0.12819354 = weight(_text_:70 in 2569) [ClassicSimilarity], result of:
          0.12819354 = score(doc=2569,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.4732989 = fieldWeight in 2569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0625 = fieldNorm(doc=2569)
        0.011931818 = product of:
          0.023863636 = sum of:
            0.023863636 = weight(_text_:information in 2569) [ClassicSimilarity], result of:
              0.023863636 = score(doc=2569,freq=6.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.2687516 = fieldWeight in 2569, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2569)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Describes an approach to generalising Boolean information retrieval using a model derived from an existing weighted Boolean retrieval model with a linguistic extension, formalised within fuzzy set theory, in which numeric query weights are replaced by linguistic descriptors specifying the degree of importance of the terms.
    Source
    Journal of the American Society for Information Science. 44(1993) no.2, S.70-82
    Type
    a
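  The relevance figures attached to each record are Lucene ClassicSimilarity explain trees: per-clause weights (tf, idf, queryNorm, fieldNorm) are multiplied, summed, and scaled by a coordination factor. As a reading aid only, the sketch below reproduces the arithmetic of the tree for no. 1; the function name is invented for this example, and the tf and idf formulas (tf = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1) are the standard ClassicSimilarity definitions rather than anything stated in the record itself.

    import math

    def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        # One weight(...) clause of the explain tree: queryWeight * fieldWeight.
        idf = math.log(max_docs / (doc_freq + 1)) + 1        # 5.354766 for _text_:70
        query_weight = idf * query_norm                      # 0.27085114
        field_weight = math.sqrt(freq) * idf * field_norm    # 1.4142135 * idf * 0.0625 = 0.4732989
        return query_weight * field_weight                   # 0.12819354

    # Clause scores for doc 2569: _text_:a and _text_:information copied from the tree above,
    # _text_:70 recomputed; three of four query clauses matched, hence the coord(3/4) factor.
    clauses = [0.0132912, classic_term_score(2.0, 567, 44218, 0.05058132, 0.0625), 0.011931818]
    print(sum(clauses) * 3 / 4)   # ~0.11506, the 0.12 shown for no. 1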
  2. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.11
    0.109917834 = product of:
      0.14655711 = sum of:
        0.010402009 = weight(_text_:a in 1361) [ClassicSimilarity], result of:
          0.010402009 = score(doc=1361,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.17835285 = fieldWeight in 1361, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1361)
        0.11216935 = weight(_text_:70 in 1361) [ClassicSimilarity], result of:
          0.11216935 = score(doc=1361,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.41413653 = fieldWeight in 1361, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1361)
        0.023985747 = product of:
          0.047971494 = sum of:
            0.047971494 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
              0.047971494 = score(doc=1361,freq=2.0), product of:
                0.17712717 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05058132 = queryNorm
                0.2708308 = fieldWeight in 1361, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1361)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real-world conditions. It is part of a text processing project at Siemens called TINA (Text-Inhalts-Analyse). Software from TINA is currently being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics).
    Date
    6. 1.1999 10:22:07
    Pages
    S.63-70
    Type
    a
  3. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.11
    0.10887262 = product of:
      0.14516349 = sum of:
        0.009008404 = weight(_text_:a in 5483) [ClassicSimilarity], result of:
          0.009008404 = score(doc=5483,freq=6.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1544581 = fieldWeight in 5483, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5483)
        0.11216935 = weight(_text_:70 in 5483) [ClassicSimilarity], result of:
          0.11216935 = score(doc=5483,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.41413653 = fieldWeight in 5483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5483)
        0.023985747 = product of:
          0.047971494 = sum of:
            0.047971494 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.047971494 = score(doc=5483,freq=2.0), product of:
                0.17712717 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05058132 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    This paper outlines the final results of the TransRouter project, within whose scope a decision support system for translation managers has been developed to support the selection of appropriate routes for translation projects. Emphasis is placed on the decision model, which is based on a stepwise refined assessment of translation routes. The workflow of using the system is considered as well.
    Date
    10.12.2000 18:22:35
    Pages
    S.49-70
    Type
    a
  4. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.10
    0.0954041 = product of:
      0.12720548 = sum of:
        0.009008404 = weight(_text_:a in 1060) [ClassicSimilarity], result of:
          0.009008404 = score(doc=1060,freq=6.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1544581 = fieldWeight in 1060, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
        0.11216935 = weight(_text_:70 in 1060) [ClassicSimilarity], result of:
          0.11216935 = score(doc=1060,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.41413653 = fieldWeight in 1060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
        0.006027733 = product of:
          0.012055466 = sum of:
            0.012055466 = weight(_text_:information in 1060) [ClassicSimilarity], result of:
              0.012055466 = score(doc=1060,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.13576832 = fieldWeight in 1060, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1060)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
    Pages
    S.63-70
    Type
    a
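  The improvements described in no. 4 come from richer predicate features (capitalization handling for unknown words, cues for verb forms, and for particles versus prepositions) fed to a maximum entropy model. The fragment below only illustrates what such word-and-context features can look like; the feature names and the helper function are assumptions of this sketch, not taken from the paper.

    def token_features(tokens, i):
        # Illustrative predicate features for a maximum-entropy POS tagger.
        w = tokens[i]
        return {
            "word=" + w.lower(): 1,
            "suffix3=" + w[-3:].lower(): 1,                           # crude morphology, e.g. "-ing", "-ed"
            "init_cap": int(w[:1].isupper()),                         # capitalization cue for unknown words
            "all_caps": int(w.isupper()),
            "has_digit": int(any(c.isdigit() for c in w)),
            "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1, # left context
        }

    print(token_features(["Flights", "to", "Boston"], 2))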
  5. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.08
    0.08346014 = product of:
      0.11128019 = sum of:
        0.0099684 = weight(_text_:a in 5219) [ClassicSimilarity], result of:
          0.0099684 = score(doc=5219,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1709182 = fieldWeight in 5219, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
        0.09614516 = weight(_text_:70 in 5219) [ClassicSimilarity], result of:
          0.09614516 = score(doc=5219,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.35497418 = fieldWeight in 5219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
        0.0051666284 = product of:
          0.010333257 = sum of:
            0.010333257 = weight(_text_:information in 5219) [ClassicSimilarity], result of:
              0.010333257 = score(doc=5219,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.116372846 = fieldWeight in 5219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5219)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.5, S.462-475
    Type
    a
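  Two of the lexical measures used in no. 5 have simple operational definitions: lexical diversity is commonly approximated by a type-token ratio, and lexical density by the share of content words among all tokens. The sketch below computes crude versions of both; the tiny stop-word list merely stands in for a proper content-word classification and is an assumption of this example, not the paper's method.

    import re

    STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "between", "we"}

    def lexical_measures(text):
        # Crude lexical diversity (type-token ratio) and lexical density.
        tokens = re.findall(r"[a-z]+", text.lower())
        content = [t for t in tokens if t not in STOPWORDS]   # stand-in for POS-based content words
        return {
            "tokens": len(tokens),
            "diversity_ttr": len(set(tokens)) / len(tokens) if tokens else 0.0,
            "density": len(content) / len(tokens) if tokens else 0.0,
        }

    print(lexical_measures("The observations suggest marginal differences between groups."))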
  6. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.08
    0.08235882 = product of:
      0.10981177 = sum of:
        0.08033655 = product of:
          0.24100964 = sum of:
            0.24100964 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.24100964 = score(doc=562,freq=2.0), product of:
                0.428829 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05058132 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.008916007 = weight(_text_:a in 562) [ClassicSimilarity], result of:
          0.008916007 = score(doc=562,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 562, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.020559212 = product of:
          0.041118424 = sum of:
            0.041118424 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.041118424 = score(doc=562,freq=2.0), product of:
                0.17712717 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05058132 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
    Content
    Vgl.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
    Type
    a
  7. Geißler, S.: Maschinelles Lernen und NLP : Reif für die industrielle Anwendung! (2019) 0.08
    0.079327345 = product of:
      0.10576979 = sum of:
        0.0044580037 = weight(_text_:a in 3547) [ClassicSimilarity], result of:
          0.0044580037 = score(doc=3547,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.07643694 = fieldWeight in 3547, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3547)
        0.09614516 = weight(_text_:70 in 3547) [ClassicSimilarity], result of:
          0.09614516 = score(doc=3547,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.35497418 = fieldWeight in 3547, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.046875 = fieldNorm(doc=3547)
        0.0051666284 = product of:
          0.010333257 = sum of:
            0.010333257 = weight(_text_:information in 3547) [ClassicSimilarity], result of:
              0.010333257 = score(doc=3547,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.116372846 = fieldWeight in 3547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3547)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Information - Wissenschaft und Praxis. 70(2019) H.2/3, S.134-140
    Type
    a
  8. Warner, A.J.: Natural language processing (1987) 0.07
    0.07454625 = product of:
      0.1490925 = sum of:
        0.0118880095 = weight(_text_:a in 337) [ClassicSimilarity], result of:
          0.0118880095 = score(doc=337,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20383182 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
        0.13720448 = sum of:
          0.02755535 = weight(_text_:information in 337) [ClassicSimilarity], result of:
            0.02755535 = score(doc=337,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3103276 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
          0.10964913 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
            0.10964913 = score(doc=337,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.61904186 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
      0.5 = coord(2/4)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
    Type
    a
  9. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.07
    0.07346833 = product of:
      0.097957775 = sum of:
        0.0117478715 = weight(_text_:a in 4675) [ClassicSimilarity], result of:
          0.0117478715 = score(doc=4675,freq=20.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20142901 = fieldWeight in 4675, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4675)
        0.08012097 = weight(_text_:70 in 4675) [ClassicSimilarity], result of:
          0.08012097 = score(doc=4675,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 4675, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4675)
        0.0060889297 = product of:
          0.012177859 = sum of:
            0.012177859 = weight(_text_:information in 4675) [ClassicSimilarity], result of:
              0.012177859 = score(doc=4675,freq=4.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.13714671 = fieldWeight in 4675, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.2, S.187-197
    Type
    a
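  The segmenter of no. 9 walks through the input one character or byte at a time and lets a language model decide where word boundaries go, keeping only the best partial hypotheses in a beam. The sketch below shows the beam-search part with a toy unigram word scorer in place of the paper's n-gram or recurrent character-level model; the vocabulary, probabilities, and function names are invented for this illustration.

    import math

    # Toy unigram scorer standing in for the character-level language model (an assumption of this sketch).
    LOGP = {w: math.log(p) for w, p in
            {"word": 0.03, "segmentation": 0.01, "is": 0.05, "hard": 0.02}.items()}
    OOV = math.log(1e-8)   # heavy penalty for chunks the toy model does not know

    def segment(text, beam_width=5, max_word_len=12):
        # Beam search over boundary positions: each hypothesis is (log-probability, words so far).
        beams = [(0.0, [])]
        for _ in range(len(text)):            # every word adds at least one character, so len(text) passes suffice
            extended = []
            for score, words in beams:
                consumed = sum(map(len, words))
                if consumed == len(text):     # already a complete segmentation, carry it forward
                    extended.append((score, words))
                    continue
                for end in range(consumed + 1, min(consumed + max_word_len, len(text)) + 1):
                    chunk = text[consumed:end]
                    extended.append((score + LOGP.get(chunk, OOV), words + [chunk]))
            beams = sorted(extended, key=lambda b: -b[0])[:beam_width]
        complete = [b for b in beams if sum(map(len, b[1])) == len(text)]
        return max(complete, key=lambda b: b[0])[1]

    print(segment("wordsegmentationishard"))   # -> ['word', 'segmentation', 'is', 'hard']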
  10. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.07
    0.072130784 = product of:
      0.096174374 = sum of:
        0.0117478715 = weight(_text_:a in 5226) [ClassicSimilarity], result of:
          0.0117478715 = score(doc=5226,freq=20.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20142901 = fieldWeight in 5226, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
        0.08012097 = weight(_text_:70 in 5226) [ClassicSimilarity], result of:
          0.08012097 = score(doc=5226,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
        0.0043055234 = product of:
          0.008611047 = sum of:
            0.008611047 = weight(_text_:information in 5226) [ClassicSimilarity], result of:
              0.008611047 = score(doc=5226,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.09697737 = fieldWeight in 5226, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5226)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Tseng constructs a word co-occurrence based thesaurus by means of the automatic analysis of Chinese text. Words are identified by a longest dictionary match supplemented by a keyword extraction algorithm that merges back nearby tokens and accepts shorter strings of characters if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this can be greatly reduced with the use of a 70-character, 2,680-word stop list. Extracted terms with their associated document weights are sorted by decreasing frequency, and the top of this list is associated by applying a Dice coefficient, modified to account for longer documents, to the weights of term pairs. Co-occurrence is counted not in the document as a whole but in paragraph- or sentence-sized sections in order to reduce computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They determined 69% to be relevant.
    Source
    Journal of the American Society for Information Science and technology. 53(2002) no.13, S.1130-1138
    Type
    a
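  The association step in no. 10 rests on the Dice coefficient, 2*|A∩B| / (|A| + |B|), computed over paragraph- or sentence-sized sections rather than whole documents. The sketch below applies the plain, unmodified coefficient to toy section data; Tseng's length correction and term weighting are deliberately left out, and all names here are illustrative.

    from collections import defaultdict
    from itertools import combinations

    def dice_associations(sections):
        # Plain Dice coefficient between terms that co-occur in the same section.
        term_count = defaultdict(int)   # number of sections containing the term
        pair_count = defaultdict(int)   # number of sections containing both terms
        for terms in sections:
            uniq = sorted(set(terms))
            for t in uniq:
                term_count[t] += 1
            for a, b in combinations(uniq, 2):
                pair_count[(a, b)] += 1
        return {pair: 2 * n / (term_count[pair[0]] + term_count[pair[1]])
                for pair, n in pair_count.items()}

    sections = [["thesaurus", "chinese", "news"],
                ["thesaurus", "chinese"],
                ["news", "query"]]
    print(dice_associations(sections))   # ('chinese', 'thesaurus') scores 1.0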
  11. Stede, M.: Lexicalization in natural language generation (2002) 0.07
    0.07022993 = product of:
      0.09363991 = sum of:
        0.007430006 = weight(_text_:a in 4245) [ClassicSimilarity], result of:
          0.007430006 = score(doc=4245,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.12739488 = fieldWeight in 4245, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4245)
        0.08012097 = weight(_text_:70 in 4245) [ClassicSimilarity], result of:
          0.08012097 = score(doc=4245,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 4245, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4245)
        0.0060889297 = product of:
          0.012177859 = sum of:
            0.012177859 = weight(_text_:information in 4245) [ClassicSimilarity], result of:
              0.012177859 = score(doc=4245,freq=4.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.13714671 = fieldWeight in 4245, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4245)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Natural language generation (NLG), the automatic production of text by computers, is commonly seen as a process consisting of several distinct phases. Obviously, choosing words is a central aspect of generating language. In which of these phases it should take place is not entirely clear, however. The decision depends on various factors: what exactly is seen as an individual lexical item; how the relation between word meaning and background knowledge (concepts) is defined; how one accounts for the interactions between individual lexical choices in the same sentence; what criteria are employed for choosing between similar words; whether or not output is required in one or more languages. This article surveys these issues and the answers that have been proposed in NLG research. For many applications of natural language processing, large-scale lexical resources have become available in recent years, such as the WordNet database. In language generation, however, generic lexicons are not in use yet; rather, almost every generation project develops its own format for lexical representations. The reason is that the entries of a generation lexicon need their specific interfaces to the input representations processed by the generator; lexical semantics in an NLG lexicon needs to be tailored to the input. On the other hand, the large lexicons used for language analysis typically have only very limited semantic information, if any. Yet the syntactic behavior of words remains the same regardless of the particular application; thus, it should be possible to build at least parts of generic NLG lexical entries automatically, which could then be used by different systems.
    Source
    Encyclopedia of library and information science. Vol.70, [=Suppl.33]
    Type
    a
  12. Muneer, I.; Sharjeel, M.; Iqbal, M.; Adeel Nawab, R.M.; Rayson, P.: CLEU - A Cross-language english-urdu corpus and benchmark for text reuse experiments (2019) 0.07
    0.07014477 = product of:
      0.09352636 = sum of:
        0.009099863 = weight(_text_:a in 5299) [ClassicSimilarity], result of:
          0.009099863 = score(doc=5299,freq=12.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15602624 = fieldWeight in 5299, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5299)
        0.08012097 = weight(_text_:70 in 5299) [ClassicSimilarity], result of:
          0.08012097 = score(doc=5299,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 5299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5299)
        0.0043055234 = product of:
          0.008611047 = sum of:
            0.008611047 = weight(_text_:information in 5299) [ClassicSimilarity], result of:
              0.008611047 = score(doc=5299,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.09697737 = fieldWeight in 5299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5299)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Text reuse is becoming a serious issue in many fields and research shows that it is much harder to detect when it occurs across languages. The recent rise in multi-lingual content on the Web has increased cross-language text reuse to an unprecedented scale. Although researchers have proposed methods to detect it, one major drawback is the unavailability of large-scale gold standard evaluation resources built on real cases. To overcome this problem, we propose a cross-language sentence/passage level text reuse corpus for the English-Urdu language pair. The Cross-Language English-Urdu Corpus (CLEU) has source text in English whereas the derived text is in Urdu. It contains in total 3,235 sentence/passage pairs manually tagged into three categories that is near copy, paraphrased copy, and independently written. Further, as a second contribution, we evaluate the Translation plus Mono-lingual Analysis method using three sets of experiments on the proposed dataset to highlight its usefulness. Evaluation results (f1=0.732 binary, f1=0.552 ternary classification) indicate that it is harder to detect cross-language real cases of text reuse, especially when the language pairs have unrelated scripts. The corpus is a useful benchmark resource for the future development and assessment of cross-language text reuse detection systems for the English-Urdu language pair.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.7, S.729-741
    Type
    a
  13. Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.06
    0.060189858 = product of:
      0.120379716 = sum of:
        0.008916007 = weight(_text_:a in 4483) [ClassicSimilarity], result of:
          0.008916007 = score(doc=4483,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 4483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=4483)
        0.11146371 = sum of:
          0.029226862 = weight(_text_:information in 4483) [ClassicSimilarity], result of:
            0.029226862 = score(doc=4483,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3291521 = fieldWeight in 4483, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.09375 = fieldNorm(doc=4483)
          0.08223685 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
            0.08223685 = score(doc=4483,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.46428138 = fieldWeight in 4483, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.09375 = fieldNorm(doc=4483)
      0.5 = coord(2/4)
    
    Date
    15. 3.2000 10:22:37
    Source
    Journal of information science. 25(1999) no.2, S.113-131
    Type
    a
  14. Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.05
    0.053008534 = product of:
      0.07067805 = sum of:
        0.0026005022 = weight(_text_:a in 5759) [ClassicSimilarity], result of:
          0.0026005022 = score(doc=5759,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.044588212 = fieldWeight in 5759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5759)
        0.056084674 = weight(_text_:70 in 5759) [ClassicSimilarity], result of:
          0.056084674 = score(doc=5759,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.20706826 = fieldWeight in 5759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5759)
        0.011992874 = product of:
          0.023985747 = sum of:
            0.023985747 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
              0.023985747 = score(doc=5759,freq=2.0), product of:
                0.17712717 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05058132 = queryNorm
                0.1354154 = fieldWeight in 5759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5759)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    To make media monitoring easier for companies and agencies, researchers at the University of Duisburg are currently developing a system for automatic topic detection in radio and television broadcasts. The so-called Alert system is intended to help users filter out and further process the spoken information from news programmes that is relevant to them. Because the analysis is done automatically by computer, several programmes can be monitored around the clock. At present, information is still extracted from TV and radio broadcasts in the classic way: a person watches, listens, reads and evaluates. That is enormously time-consuming and, for a company that wants, for example, to monitor its competitors or have its media presence documented, also very expensive. This work could be automated with a speech recognizer, the Duisburg researchers reasoned. Together with partners from Germany, France and Portugal, they are now developing the corresponding technology in a Europe-wide project (http://alert.uni-duisburg.de). Two media monitoring companies are also involved in the project, Oberserver Argus Media GmbH from Baden-Baden and the French company Secodip. "Our work would already be easier if the information that appears about our clients in the media were pre-selected," says Simone Holderbach, head of product development at Oberserver, describing her interest in the technology. And how does Alert work? The speech recognition system is tuned to monitor news broadcasts on radio and television: everything that is said, whether by the newsreader, a reporter or an interviewee, is converted into text by automatic speech recognition. Topics and keywords are identified and stored, then compared with the user's search terms. Matches are displayed and reported to the user automatically. Conventional speech recognition technology cannot simply be used for media monitoring, stresses Prof. Gerhard Rigoll, head of the Technical Computer Science group at the University of Duisburg, because it was developed for a different purpose. The Alert software was therefore trained thoroughly for converting speech into text: around 350 million words from newspaper texts, audio and video material have been processed so far, and the system works in three languages. The automatically produced text is not entirely error-free, Rigoll concedes: "At present the recognition rate is between 40 and 70 percent, and that will not change in the foreseeable future." Music overlays or strong background noise in reports lead to inaccuracies in the text conversion. The Duisburg scientists have therefore developed methods that go beyond conventional keyword search and allow content-oriented assignment. "As a result, the user also receives news items that match the topic even though the keyword itself never appears," says Rigoll, summing up the advantage of the technique. If "Ölpreis" (oil price) is entered as a search term, for example, news items in which oil companies and energy agencies play a role are also displayed. Rigoll: "The Alert system reads between the lines, so to speak." The research project was started a year ago and runs until mid-2002. Anyone who wants to find out about the state of the technology can do so this week at the industrial trade fair in Hannover: the Alert system is being presented at the joint stand "Forschungsland NRW" in Hall 18, Stand M12.
    Source
    Handelsblatt. Nr.79 vom 24.4.2001, S.22
    Type
    a
  15. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.05
    0.045152474 = product of:
      0.09030495 = sum of:
        0.08033655 = product of:
          0.24100964 = sum of:
            0.24100964 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24100964 = score(doc=862,freq=2.0), product of:
                0.428829 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05058132 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.0099684 = weight(_text_:a in 862) [ClassicSimilarity], result of:
          0.0099684 = score(doc=862,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1709182 = fieldWeight in 862, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.5 = coord(2/4)
    
    Abstract
    This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges, summary and question answering, prompt ChatGPT to produce original content (98-99%) from a single text entry and sequential questions initially posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019, and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and simple grammatical set for understanding the writing mechanics of chatbots in evaluating their readability and statistical clarity, engagement, delivery, overall quality, and plagiarism risks. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.
    Source
    https://arxiv.org/abs/2212.06721
    Type
    a
  16. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.05
    0.045134634 = product of:
      0.09026927 = sum of:
        0.010402009 = weight(_text_:a in 3840) [ClassicSimilarity], result of:
          0.010402009 = score(doc=3840,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.17835285 = fieldWeight in 3840, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
        0.07986726 = sum of:
          0.031895764 = weight(_text_:information in 3840) [ClassicSimilarity], result of:
            0.031895764 = score(doc=3840,freq=14.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3592092 = fieldWeight in 3840, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3840)
          0.047971494 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
            0.047971494 = score(doc=3840,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.2708308 = fieldWeight in 3840, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3840)
      0.5 = coord(2/4)
    
    Abstract
    Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
    Date
    27. 8.2011 14:22:33
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
    Type
    a
  17. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.04
    0.04354715 = product of:
      0.0870943 = sum of:
        0.008406092 = weight(_text_:a in 6752) [ClassicSimilarity], result of:
          0.008406092 = score(doc=6752,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.14413087 = fieldWeight in 6752, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
        0.078688204 = sum of:
          0.023863636 = weight(_text_:information in 6752) [ClassicSimilarity], result of:
            0.023863636 = score(doc=6752,freq=6.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.2687516 = fieldWeight in 6752, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=6752)
          0.054824565 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
            0.054824565 = score(doc=6752,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 6752, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6752)
      0.5 = coord(2/4)
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the three domains, discusses the lessons learned, and presents results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
    Date
    6. 3.1997 16:22:15
    Type
    a
  18. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.04
    0.041357618 = product of:
      0.082715236 = sum of:
        0.008406092 = weight(_text_:a in 7415) [ClassicSimilarity], result of:
          0.008406092 = score(doc=7415,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.14413087 = fieldWeight in 7415, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=7415)
        0.07430914 = sum of:
          0.019484574 = weight(_text_:information in 7415) [ClassicSimilarity], result of:
            0.019484574 = score(doc=7415,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.21943474 = fieldWeight in 7415, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
          0.054824565 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
            0.054824565 = score(doc=7415,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 7415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
      0.5 = coord(2/4)
    
    Abstract
    State-of-the-art review of natural language processing, updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field: materials and resources; knowledge-based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly.
    Source
    Annual review of information science and technology. 31(1996), S.83-119
    Type
    a
  19. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.04
    0.041141834 = product of:
      0.08228367 = sum of:
        0.0073553314 = weight(_text_:a in 2345) [ClassicSimilarity], result of:
          0.0073553314 = score(doc=2345,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.12611452 = fieldWeight in 2345, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.074928336 = sum of:
          0.026956841 = weight(_text_:information in 2345) [ClassicSimilarity], result of:
            0.026956841 = score(doc=2345,freq=10.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3035872 = fieldWeight in 2345, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
          0.047971494 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
            0.047971494 = score(doc=2345,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.2708308 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
      0.5 = coord(2/4)
    
    Abstract
    Natural language processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD), which, in turn, feed the visualization systems of the present and future and enable knowledge workers to focus more of their time on the vital tasks of analysis and prediction.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
    Type
    a
  20. Wanner, L.: Lexical choice in text generation and machine translation (1996) 0.04
    0.037273124 = product of:
      0.07454625 = sum of:
        0.0059440047 = weight(_text_:a in 8521) [ClassicSimilarity], result of:
          0.0059440047 = score(doc=8521,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.10191591 = fieldWeight in 8521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=8521)
        0.06860224 = sum of:
          0.013777675 = weight(_text_:information in 8521) [ClassicSimilarity], result of:
            0.013777675 = score(doc=8521,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.1551638 = fieldWeight in 8521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=8521)
          0.054824565 = weight(_text_:22 in 8521) [ClassicSimilarity], result of:
            0.054824565 = score(doc=8521,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 8521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=8521)
      0.5 = coord(2/4)
    
    Abstract
    Presents the state of the art in lexical choice research in text generation and machine translation. Discusses the existing implementations with respect to: the place of lexical choice in the overall generation process; the information flow within the generation process and the consequences thereof for lexical choice; the internal organization of the lexical choice process; and the phenomena covered by lexical choice. Identifies possible future directions in lexical choice research.
    Date
    31. 7.1996 9:22:19
    Type
    a

Types

  • a 629
  • el 79
  • m 56
  • s 27
  • x 12
  • p 7
  • b 2
  • d 2
  • pat 1
  • r 1