Search (45 results, page 1 of 3)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"a"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.18
    
    Content
Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
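The relevance scores attached to each hit come from Lucene's ClassicSimilarity (tf-idf) model. As a minimal sketch, assuming the per-term figures Lucene reported for result 1 (freq 2.0, idf 8.478011, queryNorm 0.03697776, fieldNorm 0.046875), one term's contribution can be reproduced as queryWeight × fieldWeight:

```python
import math

def classic_similarity_score(freq, idf, query_norm, field_norm):
    """One leaf of a Lucene ClassicSimilarity explain tree:
    score = queryWeight * fieldWeight, where
    queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm."""
    tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Figures reported for result 1 (term _text_:3a in doc 562)
score = classic_similarity_score(freq=2.0, idf=8.478011,
                                 query_norm=0.03697776, field_norm=0.046875)
# score ≈ 0.17619146
```

The branch totals on the page then combine such leaves with `sum`, `coord`, and `product` factors.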
  2. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.12
    
    Source
https://arxiv.org/abs/2212.06721
  3. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid words (2006) 0.02
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
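The screening step the abstract describes (keeping Web tokens that embed a common letter sequence but are not already known words) can be sketched as follows; the function name, the seed `"blog"`, and the toy token list are all hypothetical illustrations, not the authors' actual procedure:

```python
import re

def candidate_hybrid_words(tokens, seed, known_words):
    """Hypothetical sketch: keep tokens that contain the seed letter
    sequence but are absent from the known-word list."""
    known = {w.lower() for w in known_words}
    pattern = re.compile(re.escape(seed), re.IGNORECASE)
    return sorted({t.lower() for t in tokens
                   if pattern.search(t) and t.lower() not in known})

tokens = ["blog", "blogosphere", "warblog", "weather", "moblogging"]
candidates = candidate_hybrid_words(tokens, "blog", known_words={"blog"})
# → ['blogosphere', 'moblogging', 'warblog']
```

Real use would draw `tokens` from Web search results and `known_words` from a dictionary, as the article's combination of search techniques suggests.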
  4. Robertson, A.M.; Willett, P.: Generation of equifrequent groups of words using a genetic algorithm (1994) 0.02
    
    Abstract
Genetic algorithms are a class of non-deterministic algorithms that derive from Darwinian evolution and that provide good, though not necessarily optimal, solutions to combinatorial problems. We describe their application to the identification of characteristics that occur approximately equifrequently in a database, using two different methods for the creation of the chromosome data structures that lie at the heart of a genetic algorithm. Experiments with files of English and Turkish text suggest that the genetic algorithm developed here can produce results superior to those produced by existing non-deterministic algorithms; however, the results are inferior to those produced by an existing deterministic algorithm.
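A minimal sketch of the idea, not the authors' actual chromosome encodings: assign each symbol to one of k groups, with fitness measured as the spread of group frequency totals (lower is better), truncation selection, and point mutation. All parameter values here are illustrative assumptions:

```python
import random

def evolve_equifrequent_groups(freqs, k, generations=200, pop=30, seed=0):
    """Hypothetical GA sketch: partition symbols into k groups with
    roughly equal frequency totals. Chromosome = tuple of group ids;
    fitness = max(group total) - min(group total)."""
    rng = random.Random(seed)
    symbols = list(freqs)

    def spread(chrom):
        totals = [0.0] * k
        for sym, g in zip(symbols, chrom):
            totals[g] += freqs[sym]
        return max(totals) - min(totals)

    population = [tuple(rng.randrange(k) for _ in symbols) for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=spread)
        survivors = population[:pop // 2]          # truncation selection
        children = []
        for parent in survivors:
            child = list(parent)
            child[rng.randrange(len(child))] = rng.randrange(k)  # point mutation
            children.append(tuple(child))
        population = survivors + children          # elitist: best always kept
    best = min(population, key=spread)
    return {sym: g for sym, g in zip(symbols, best)}, spread(best)

assignment, best_spread = evolve_equifrequent_groups(
    {"a": 4.0, "b": 4.0, "c": 4.0, "d": 4.0}, k=2)
```

With four equally frequent symbols and two groups, the sketch settles on a 2-2 split (spread 0).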
  5. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.01
    
    Abstract
In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics (bibliometrics and scientometrics studies) for STW rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has rendered necessary the integration of natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs which the clustering algorithm, CPCL (Classification by Preferential Clustered Link), will seek to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.
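The core idea of clustering terms through their variation relations can be sketched very simply; note this is a drastic simplification (thresholded connected components), not CPCL itself, and the edge weights and term strings below are invented for illustration:

```python
from collections import defaultdict

def cluster_term_variants(weighted_edges, threshold=0.5):
    """Hypothetical simplification: terms linked by a variation relation
    with weight >= threshold fall into the same class (connected
    components of the thresholded weighted di-graph, taken undirected)."""
    graph = defaultdict(set)
    terms = set()
    for a, b, w in weighted_edges:
        terms.update((a, b))
        if w >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    classes, seen = [], set()
    for term in sorted(terms):
        if term in seen:
            continue
        stack, component = [term], set()
        while stack:                      # depth-first component traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node])
        seen |= component
        classes.append(sorted(component))
    return classes

edges = [("gene expression", "gene expression profile", 0.9),
         ("gene expression profile", "expression profile", 0.8),
         ("protein folding", "folding of proteins", 0.7),
         ("gene expression", "protein folding", 0.1)]   # below threshold
classes = cluster_term_variants(edges)
```

The weak cross-topic edge is discarded, so the two research topics come out as separate classes.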
  6. Savoy, J.: Text representation strategies : an example with the State of the union addresses (2016) 0.01
    
    Abstract
Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf-idf weighting scheme can be applied. Recently, latent Dirichlet allocation (LDA) has been proposed to define the topics included in a corpus. As another strategy, this study proposes to apply a vocabulary specificity measure (Z-score) to determine the most significantly overused word-types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf-idf or LDA approach, the selection requires some arbitrary decisions. Based on the term-specific measure (Z-score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution of usage of some terms associated with their specificity measures. Finally, this technique can be employed to define the most important lexical leaders introducing terms overused by the k following presidencies.
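A specificity Z-score of the kind the abstract mentions can be sketched under a binomial model: how far a word's count in one part deviates, in standard deviations, from its corpus-wide expectation. The word and all counts below are invented toy figures, not data from the study:

```python
import math

def z_score(word_count, part_len, corpus_count, corpus_len):
    """Vocabulary specificity sketch: standardized deviation of the
    observed count from its binomial expectation n*p, where p is the
    word's corpus-wide relative frequency."""
    p = corpus_count / corpus_len         # corpus-wide relative frequency
    expected = part_len * p
    return (word_count - expected) / math.sqrt(part_len * p * (1 - p))

# Toy figures (hypothetical): a word appears 30 times in a 2,000-token
# address but only 200 times in a 100,000-token corpus.
z = z_score(word_count=30, part_len=2000, corpus_count=200, corpus_len=100_000)
# z ≈ 13.01, i.e. strongly overused in that address
```

Large positive Z-scores flag the overused word-types; the sign and magnitude give the selection the clear theoretical basis the abstract claims.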
  7. Soni, S.; Lerman, K.; Eisenstein, J.: Follow the leader : documents on the leading edge of semantic change get more citations (2021) 0.01
    
    Abstract
    Diachronic word embeddings-vector representations of words over time-offer remarkable insights into the evolution of language and provide a tool for quantifying sociocultural change from text documents. Prior work has used such embeddings to identify shifts in the meaning of individual words. However, simply knowing that a word has changed in meaning is insufficient to identify the instances of word usage that convey the historical meaning or the newer meaning. In this study, we link diachronic word embeddings to documents, by situating those documents as leaders or laggards with respect to ongoing semantic changes. Specifically, we propose a novel method to quantify the degree of semantic progressiveness in each word usage, and then show how these usages can be aggregated to obtain scores for each document. We analyze two large collections of documents, representing legal opinions and scientific articles. Documents that are scored as semantically progressive receive a larger number of citations, indicating that they are especially influential. Our work thus provides a new technique for identifying lexical semantic leaders and demonstrates a new link between progressive use of language and influence in a citation network.
  8. Warner, A.J.: Natural language processing (1987) 0.01
    0.0066799684 = product of:
      0.026719874 = sum of:
        0.026719874 = product of:
          0.08015962 = sum of:
            0.08015962 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.08015962 = score(doc=337,freq=2.0), product of:
                0.12948982 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03697776 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  9. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.01
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  10. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.01
    
    Date
    8.10.2000 11:52:22
  11. Somers, H.: Example-based machine translation : Review article (1999) 0.01
    
    Date
    31. 7.1996 9:22:19
  12. Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.01
    
    Date
    28. 2.1999 10:48:22
  13. ¬Der Student aus dem Computer (2023) 0.01
    
    Date
    27. 1.2023 16:22:55
  14. Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.01
    
    Date
    15. 3.2000 10:22:37
  15. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.01
    
    Source
    c't. 2000, H.22, S.230-231
  16. Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.00
    
    Date
    31. 7.1996 9:22:19
  17. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.00
    
    Source
    c't. 2000, H.22, S.220-229
  18. Lezius, W.; Rapp, R.; Wettler, M.: ¬A morphology-system and part-of-speech tagger for German (1996) 0.00
    
    Date
    22. 3.2015 9:37:18
  19. Wanner, L.: Lexical choice in text generation and machine translation (1996) 0.00
    
    Date
    31. 7.1996 9:22:19
  20. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.00
    
    Date
    6. 3.1997 16:22:15