Search (715 results, page 1 of 36)

  • theme_ss:"Computerlinguistik"
  1. Stock, W.G.: Textwortmethode : Norbert Henrichs zum 65. (3) (2000) 0.10
    0.09685992 = product of:
      0.16143319 = sum of:
        0.004876186 = weight(_text_:a in 4891) [ClassicSimilarity], result of:
          0.004876186 = score(doc=4891,freq=2.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.10191591 = fieldWeight in 4891, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=4891)
        0.04961738 = product of:
          0.09923476 = sum of:
            0.09923476 = weight(_text_:dewey in 4891) [ClassicSimilarity], result of:
              0.09923476 = score(doc=4891,freq=2.0), product of:
                0.21583907 = queryWeight, product of:
                  5.2016215 = idf(docFreq=661, maxDocs=44218)
                  0.041494574 = queryNorm
                0.45976272 = fieldWeight in 4891, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2016215 = idf(docFreq=661, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4891)
          0.5 = coord(1/2)
        0.10693963 = product of:
          0.21387926 = sum of:
            0.21387926 = weight(_text_:melvil in 4891) [ClassicSimilarity], result of:
              0.21387926 = score(doc=4891,freq=2.0), product of:
                0.316871 = queryWeight, product of:
                  7.636444 = idf(docFreq=57, maxDocs=44218)
                  0.041494574 = queryNorm
                0.67497265 = fieldWeight in 4891, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.636444 = idf(docFreq=57, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4891)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Few documentation methods are associated with the names of their developers. Exceptions are Melvil Dewey (DDC), S.R. Ranganathan (Colon Classification) - and Norbert Henrichs. His text-word method (Textwortmethode) enables the indexing and retrieval of literature from subject fields that have no universally accepted technical terminology, i.e. many of the social sciences and humanities, philosophy first and foremost. Henrichs designed the text-word method in the late 1960s for use in electronic philosophy documentation. He is thus not only one of the pioneers of applying electronic data processing in information practice, but also the pioneer of documenting technical languages whose terminology is not fixed.
    Type
    a
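
    The indented breakdown shown with each hit is Lucene's ClassicSimilarity "explain" output: each matching term contributes tf × idf² × queryNorm × fieldNorm, and a coord factor then scales the sum by the fraction of query clauses that matched (e.g. 0.6 = coord(3/5) above). Below is a minimal sketch that re-derives the "melvil" contribution to hit 1 from the figures reported in its explain tree; it assumes only plain Python, and the variable names are illustrative rather than Lucene API calls.

    import math

    # Figures copied from the explain tree of doc 4891 for the term "melvil".
    freq = 2.0                  # termFreq of "melvil" in the matched field
    doc_freq, max_docs = 57, 44218
    query_norm = 0.041494574    # queryNorm, shared by every query clause
    field_norm = 0.0625         # fieldNorm(doc=4891), field-length normalization

    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # ~7.636444, as in the tree
    tf = math.sqrt(freq)                               # ~1.4142135
    query_weight = idf * query_norm                    # ~0.316871   (queryWeight)
    field_weight = tf * idf * field_norm               # ~0.67497265 (fieldWeight)
    term_score = query_weight * field_weight           # ~0.21387926, as reported
    print(term_score, term_score * 0.5)                # 0.5 = coord(1/2) of the inner clause

    Every other weight(_text_:...) line in this listing follows the same pattern; only freq, docFreq, and fieldNorm change per term and document.
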
  2. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.06
    0.06311665 = product of:
      0.105194405 = sum of:
        0.008533326 = weight(_text_:a in 1361) [ClassicSimilarity], result of:
          0.008533326 = score(doc=1361,freq=8.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.17835285 = fieldWeight in 1361, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1361)
        0.07698428 = weight(_text_:63 in 1361) [ClassicSimilarity], result of:
          0.07698428 = score(doc=1361,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.37879732 = fieldWeight in 1361, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1361)
        0.019676797 = product of:
          0.039353594 = sum of:
            0.039353594 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
              0.039353594 = score(doc=1361,freq=2.0), product of:
                0.14530693 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041494574 = queryNorm
                0.2708308 = fieldWeight in 1361, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1361)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real-world conditions. It is part of a text processing project at Siemens called TINA (Text-Inhalts-Analyse). Software from TINA is currently being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics)
    Date
    6. 1.1999 10:22:07
    Pages
    S.63-70
    Type
    a
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    0.0540507 = product of:
      0.09008449 = sum of:
        0.06590439 = product of:
          0.19771315 = sum of:
            0.19771315 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.19771315 = score(doc=562,freq=2.0), product of:
                0.35179147 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.041494574 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.0073142797 = weight(_text_:a in 562) [ClassicSimilarity], result of:
          0.0073142797 = score(doc=562,freq=8.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.15287387 = fieldWeight in 562, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.016865825 = product of:
          0.03373165 = sum of:
            0.03373165 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03373165 = score(doc=562,freq=2.0), product of:
                0.14530693 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041494574 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
    Type
    a
  4. Ludwig, B.; Reischer, J.: Informationslinguistik in Regensburg (2012) 0.04
    0.03714329 = product of:
      0.092858225 = sum of:
        0.004876186 = weight(_text_:a in 555) [ClassicSimilarity], result of:
          0.004876186 = score(doc=555,freq=2.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.10191591 = fieldWeight in 555, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=555)
        0.087982036 = weight(_text_:63 in 555) [ClassicSimilarity], result of:
          0.087982036 = score(doc=555,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.43291122 = fieldWeight in 555, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0625 = fieldNorm(doc=555)
      0.4 = coord(2/5)
    
    Source
    Information - Wissenschaft und Praxis. 63(2012) H.5, S.292-296
    Type
    a
  5. Magennis, M.: Expert rule-based query expansion (1995) 0.03
    0.03374974 = product of:
      0.08437435 = sum of:
        0.0073900777 = weight(_text_:a in 5181) [ClassicSimilarity], result of:
          0.0073900777 = score(doc=5181,freq=6.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.1544581 = fieldWeight in 5181, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5181)
        0.07698428 = weight(_text_:63 in 5181) [ClassicSimilarity], result of:
          0.07698428 = score(doc=5181,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.37879732 = fieldWeight in 5181, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5181)
      0.4 = coord(2/5)
    
    Abstract
    Examines how, for term-based free-text retrieval, Interactive Query Expansion (IQE) provides better retrieval performance than Automatic Query Expansion (AQE), but the performance of IQE depends on the strategy employed by the user to select expansion terms. The aim is to build an expert query expansion system using term selection rules based on expert users' strategies. It is expected that such a system will achieve better performance for novice or inexperienced users than either AQE or IQE. The procedure is to discover expert IQE users' term selection strategies through observation and interrogation, to construct a rule-based query expansion (RQE) system based on these, and to compare the resulting retrieval performance with that of comparable AQE and IQE systems
    Source
    New review of document and text management. 1995, no.1, S.63-83
    Type
    a
  6. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.03
    0.03374974 = product of:
      0.08437435 = sum of:
        0.0073900777 = weight(_text_:a in 1060) [ClassicSimilarity], result of:
          0.0073900777 = score(doc=1060,freq=6.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.1544581 = fieldWeight in 1060, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
        0.07698428 = weight(_text_:63 in 1060) [ClassicSimilarity], result of:
          0.07698428 = score(doc=1060,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.37879732 = fieldWeight in 1060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
      0.4 = coord(2/5)
    
    Abstract
    This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
    Pages
    S.63-70
    Type
    a
  7. Ingenerf, J.: Disambiguating lexical meaning : conceptual meta-modelling as a means of controlling semantic language analysis (1994) 0.03
    0.03078318 = product of:
      0.07695795 = sum of:
        0.0109714195 = weight(_text_:a in 2572) [ClassicSimilarity], result of:
          0.0109714195 = score(doc=2572,freq=18.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.22931081 = fieldWeight in 2572, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2572)
        0.06598653 = weight(_text_:63 in 2572) [ClassicSimilarity], result of:
          0.06598653 = score(doc=2572,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 2572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=2572)
      0.4 = coord(2/5)
    
    Abstract
    A formal terminology consists of a set of conceptual definitions for the semantic reconstruction of a vocabulary on an intensional level of description. The marking of comparatively abstract concepts as semantic categories and their relational positioning on a meta-level is shown to be instrumental in adapting the conceptual design to domain-specific characteristics. Such a meta-model implies that concepts subsumed by categories may share their compositional possibilities as regards the construction of complex structures. Our approach to language processing leads to an automatic derivation of contextual semantic information about the linguistic expressions under review. This information is encoded by means of values of certain attributes defined in a feature-based grammatical framework. A standard process controlling grammatical analysis, the unification of feature structures, is used for its evaluation. One important example of the usefulness of this approach is the disambiguation of lexical meaning
    Pages
    S.63-73
    Type
    a
  8. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.03
    0.030264964 = product of:
      0.07566241 = sum of:
        0.009675884 = weight(_text_:a in 4199) [ClassicSimilarity], result of:
          0.009675884 = score(doc=4199,freq=14.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.20223314 = fieldWeight in 4199, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4199)
        0.06598653 = weight(_text_:63 in 4199) [ClassicSimilarity], result of:
          0.06598653 = score(doc=4199,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 4199, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=4199)
      0.4 = coord(2/5)
    
    Abstract
    This article studies aggressive word removal in text categorization to reduce the noise in free texts and enhance the computational efficiency of categorization. We use a novel stop word identification method to automatically generate domain-specific stoplists which are much larger than a conventional domain-independent stoplist. In our tests with 3 categorization methods on text collections from different domains/applications, significant numbers of words were removed without sacrificing categorization effectiveness. In the test of the Expert Network method on CACM documents, for example, an 87% removal of unique words reduced the vocabulary of documents from 8,002 distinct words to 1,045 words, which resulted in a 63% time savings and a 74% memory savings in the computation of category ranking, with a 10% precision improvement on average over not using word removal. It is evident in this study that automated word removal based on corpus statistics has a practical and significant impact on the computational tractability of categorization methods in large databases
    Type
    a
  9. Smalheiser, N.R.: Literature-based discovery : Beyond the ABCs (2012) 0.03
    0.029665658 = product of:
      0.074164145 = sum of:
        0.008177614 = weight(_text_:a in 4967) [ClassicSimilarity], result of:
          0.008177614 = score(doc=4967,freq=10.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.1709182 = fieldWeight in 4967, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4967)
        0.06598653 = weight(_text_:63 in 4967) [ClassicSimilarity], result of:
          0.06598653 = score(doc=4967,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 4967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=4967)
      0.4 = coord(2/5)
    
    Abstract
    Literature-based discovery (LBD) refers to a particular type of text mining that seeks to identify nontrivial assertions that are implicit, and not explicitly stated, and that are detected by juxtaposing (generally a large body of) documents. In this review, I will provide a brief overview of LBD, both past and present, and will propose some new directions for the next decade. The prevalent ABC model is not "wrong"; however, it is only one of several different types of models that can contribute to the development of the next generation of LBD tools. Perhaps the most urgent need is to develop a series of objective literature-based interestingness measures, which can customize the output of LBD systems for different types of scientific investigations.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.2, S.218-224
    Type
    a
  10. Zadeh, B.Q.; Handschuh, S.: ¬The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.03
    0.029665658 = product of:
      0.074164145 = sum of:
        0.008177614 = weight(_text_:a in 2803) [ClassicSimilarity], result of:
          0.008177614 = score(doc=2803,freq=10.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.1709182 = fieldWeight in 2803, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
        0.06598653 = weight(_text_:63 in 2803) [ClassicSimilarity], result of:
          0.06598653 = score(doc=2803,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 2803, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
      0.4 = coord(2/5)
    
    Abstract
    This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three lists of candidate terms, and more than 82,000 manually annotated terms. The annotated terms are marked as either valid or invalid, and valid terms are further classified as technology and non-technology terms. Technology terms signify methods, algorithms, and solutions in computational linguistics. The paper describes the dataset and reports the relevant statistics. We hope the step described in this paper encourages a collaborative effort towards building a full-fledged annotated corpus from the computational linguistics literature.
    Pages
    S.52-63
    Type
    a
  11. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.03
    0.029632801 = product of:
      0.074082 = sum of:
        0.06590439 = product of:
          0.19771315 = sum of:
            0.19771315 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.19771315 = score(doc=862,freq=2.0), product of:
                0.35179147 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.041494574 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.008177614 = weight(_text_:a in 862) [ClassicSimilarity], result of:
          0.008177614 = score(doc=862,freq=10.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.1709182 = fieldWeight in 862, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.4 = coord(2/5)
    
    Abstract
    This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges - summary and question answering - prompt ChatGPT to produce original content (98-99%) from a single text entry and sequential questions initially posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019, and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and simple grammatical set for understanding the writing mechanics of chatbots in evaluating their readability and statistical clarity, engagement, delivery, overall quality, and plagiarism risks. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.
    Source
    https://arxiv.org/abs/2212.06721
    Type
    a
  12. Kracht, M.: Mathematical linguistics (2002) 0.03
    0.028928353 = product of:
      0.07232088 = sum of:
        0.0063343523 = weight(_text_:a in 3572) [ClassicSimilarity], result of:
          0.0063343523 = score(doc=3572,freq=6.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.13239266 = fieldWeight in 3572, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3572)
        0.06598653 = weight(_text_:63 in 3572) [ClassicSimilarity], result of:
          0.06598653 = score(doc=3572,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 3572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=3572)
      0.4 = coord(2/5)
    
    Abstract
    This book studies language(s) and linguistic theories from a mathematical point of view. Starting with ideas already contained in Montague's work, it develops the mathematical foundations of present-day linguistics. It equips the reader with all the background necessary to understand and evaluate theories as diverse as Montague Grammar, Categorial Grammar, HPSG and GB. The mathematical tools are mainly from universal algebra and logic, but no particular knowledge is presupposed beyond a certain mathematical sophistication that is in any case needed in order to fruitfully work within these theories. The presentation focuses on abstract mathematical structures and their computational properties, but plenty of examples from different natural languages are provided to illustrate the main concepts and results. In contrast to books devoted to so-called formal language theory, languages are seen here as semiotic systems, that is, as systems of signs. A language sign correlates form with meaning. Using the principle of compositionality it is possible to gain substantial insight into the interaction between form and meaning in natural languages.
    Series
    Studies in generative grammar; 63
  13. Farreús, M.; Costa-jussà, M.R.; Popovic' Morse, M.: Study and correlation analysis of linguistic, perceptual, and automatic machine translation evaluations (2012) 0.03
    0.028463403 = product of:
      0.071158506 = sum of:
        0.0051719765 = weight(_text_:a in 4975) [ClassicSimilarity], result of:
          0.0051719765 = score(doc=4975,freq=4.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.10809815 = fieldWeight in 4975, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4975)
        0.06598653 = weight(_text_:63 in 4975) [ClassicSimilarity], result of:
          0.06598653 = score(doc=4975,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 4975, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=4975)
      0.4 = coord(2/5)
    
    Abstract
    Evaluation of machine translation output is an important task. Various human evaluation techniques as well as automatic metrics have been proposed and investigated in the last decade. However, very few evaluation methods take the linguistic aspect into account. In this article, we use an objective evaluation method for machine translation output that classifies all translation errors into one of the five following linguistic levels: orthographic, morphological, lexical, semantic, and syntactic. Linguistic guidelines for the target language are required, and human evaluators use them to classify the output errors. The experiments are performed on English-to-Catalan and Spanish-to-Catalan translation outputs generated by four different systems: 2 rule-based and 2 statistical. All translations are evaluated using the 3 following methods: a standard human perceptual evaluation method, several widely used automatic metrics, and the human linguistic evaluation. Pearson and Spearman correlation coefficients between the linguistic, perceptual, and automatic results are then calculated, showing that the semantic level correlates significantly with both perceptual evaluation and automatic metrics.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.1, S.174-184
    Type
    a
  14. Natural language processing and speech technology : Results of the 3rd KONVENS Conference, Bielefeld, October 1996 (1996) 0.03
    0.027857468 = product of:
      0.06964367 = sum of:
        0.0036571398 = weight(_text_:a in 7291) [ClassicSimilarity], result of:
          0.0036571398 = score(doc=7291,freq=2.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.07643694 = fieldWeight in 7291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=7291)
        0.06598653 = weight(_text_:63 in 7291) [ClassicSimilarity], result of:
          0.06598653 = score(doc=7291,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.32468343 = fieldWeight in 7291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.046875 = fieldNorm(doc=7291)
      0.4 = coord(2/5)
    
    Content
    Contains, among others, the contributions: HILDEBRANDT, B. et al.: Kognitive Modellierung von Sprach- und Bildverstehen; KELLER, F.: How do humans deal with ungrammatical input? Experimental evidence and computational modelling; MARX, J.: Die 'Computer-Talk-These' in der Sprachgenerierung: Hinweise zur Gestaltung natürlichsprachlicher Zustandsanzeigen in multimodalen Informationssystemen; SCHULTZ, T. and H. SOLTAU: Automatische Identifizierung spontan gesprochener Sprachen mit neuronalen Netzen; WAUSCHKUHN, O.: Ein Werkzeug zur partiellen syntaktischen Analyse deutscher Textkorpora; LEZIUS, W., R. RAPP and M. WETTLER: A morphology-system and part-of-speech tagger for German; KONRAD, K. et al.: CLEARS: ein Werkzeug für Ausbildung und Forschung in der Computerlinguistik
    Signature
    63 BFP 176 (GWZ)
  15. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.03
    0.025443492 = product of:
      0.06360873 = sum of:
        0.008619961 = weight(_text_:a in 246) [ClassicSimilarity], result of:
          0.008619961 = score(doc=246,freq=16.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.18016359 = fieldWeight in 246, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
        0.054988768 = weight(_text_:63 in 246) [ClassicSimilarity], result of:
          0.054988768 = score(doc=246,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.2705695 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
      0.4 = coord(2/5)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.1030-1047
    Type
    a
  16. Cruz Díaz, N.P.; Maña López, M.J.; Mata Vázquez, J.; Pachón Álvarez, V.: ¬A machine-learning approach to negation and speculation detection in clinical texts (2012) 0.03
    0.025443492 = product of:
      0.06360873 = sum of:
        0.008619961 = weight(_text_:a in 283) [ClassicSimilarity], result of:
          0.008619961 = score(doc=283,freq=16.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.18016359 = fieldWeight in 283, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=283)
        0.054988768 = weight(_text_:63 in 283) [ClassicSimilarity], result of:
          0.054988768 = score(doc=283,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.2705695 = fieldWeight in 283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0390625 = fieldNorm(doc=283)
      0.4 = coord(2/5)
    
    Abstract
    Detecting negative and speculative information is essential in most biomedical text-mining tasks where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research is focused on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal or not. Then another classifier determines, at sentence level, the tokens which are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two different systems, one based on regular expressions and the other based on machine learning. Our system's results outperformed the results obtained by these two systems. In the signal detection task, the F-score value was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal showed an F score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated obtaining F scores of 90.9% in negation and 71.9% in speculation.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.7, S.1398-1410
    Type
    a
  17. Fegley, B.D.; Torvik, V.I.: On the role of poetic versus nonpoetic features in "kindred" and diachronic poetry attribution (2012) 0.02
    0.024981549 = product of:
      0.062453873 = sum of:
        0.0074651055 = weight(_text_:a in 488) [ClassicSimilarity], result of:
          0.0074651055 = score(doc=488,freq=12.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.15602624 = fieldWeight in 488, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=488)
        0.054988768 = weight(_text_:63 in 488) [ClassicSimilarity], result of:
          0.054988768 = score(doc=488,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.2705695 = fieldWeight in 488, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0390625 = fieldNorm(doc=488)
      0.4 = coord(2/5)
    
    Abstract
    Author attribution studies have demonstrated remarkable success in applying orthographic and lexicographic features of text in a variety of discrimination problems. What might poetic features, such as syllabic stress and mood, contribute? We address this question in the context of two different attribution problems: (a) kindred: differentiate Langston Hughes' early poems from those of kindred poets and (b) diachronic: differentiate Hughes' early from his later poems. Using a diverse set of 535 generic text features, each categorized as poetic or nonpoetic, correlation-based greedy forward search ranked the features and a support vector machine classified the poems. A small subset of features (~10) achieved cross-validated precision and recall as high as 87%. Poetic features (rhyme patterns particularly) were nearly as effective as nonpoetic in kindred discrimination, but less effective diachronically. In other words, Hughes used both poetic and nonpoetic features in distinctive ways and his use of nonpoetic features evolved systematically while he continued to experiment with poetic features. These findings affirm qualitative studies attesting to structural elements from Black oral tradition and Black folk music (blues) and to the internal consistency of Hughes' early poetry.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.11, S.2165-2181
    Type
    a
  18. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.02
    0.024433602 = product of:
      0.061084002 = sum of:
        0.006095233 = weight(_text_:a in 831) [ClassicSimilarity], result of:
          0.006095233 = score(doc=831,freq=8.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.12739488 = fieldWeight in 831, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
        0.054988768 = weight(_text_:63 in 831) [ClassicSimilarity], result of:
          0.054988768 = score(doc=831,freq=2.0), product of:
            0.20323344 = queryWeight, product of:
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.041494574 = queryNorm
            0.2705695 = fieldWeight in 831, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8978314 = idf(docFreq=896, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
      0.4 = coord(2/5)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese, where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word-level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
    Source
    Journal of documentation. 63(2007) no.3, S.378-397
    Type
    a
  19. Mengel, T.: Wie viel Terminologiearbeit steckt in der Übersetzung der Dewey-Dezimalklassifikation? (2019) 0.02
    0.022513729 = product of:
      0.05628432 = sum of:
        0.0036571398 = weight(_text_:a in 5603) [ClassicSimilarity], result of:
          0.0036571398 = score(doc=5603,freq=2.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.07643694 = fieldWeight in 5603, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5603)
        0.05262718 = product of:
          0.10525436 = sum of:
            0.10525436 = weight(_text_:dewey in 5603) [ClassicSimilarity], result of:
              0.10525436 = score(doc=5603,freq=4.0), product of:
                0.21583907 = queryWeight, product of:
                  5.2016215 = idf(docFreq=661, maxDocs=44218)
                  0.041494574 = queryNorm
                0.487652 = fieldWeight in 5603, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2016215 = idf(docFreq=661, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5603)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Libraries worldwide use the Dewey Decimal Classification (DDC) as a shelf arrangement scheme and/or for catalogue searching. Translations of the DDC exist in more than 30 languages. As a comprehensive system for organizing knowledge, consisting of numerical notations and verbal class contents, the DDC offers terminologists a wide field for work and research. But how do terminology work and translation interact when, as in this case, the terminology itself is the object of translation? The article cannot treat all topics exhaustively, but it presents characteristics of the DDC for the first time from the perspective of DDC translation work, and it raises the question of whether the terminological aspect of DDC translation has in fact received enough attention so far.
    Type
    a
  20. Warner, A.J.: Natural language processing (1987) 0.02
    0.021891166 = product of:
      0.054727912 = sum of:
        0.009752372 = weight(_text_:a in 337) [ClassicSimilarity], result of:
          0.009752372 = score(doc=337,freq=2.0), product of:
            0.047845192 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041494574 = queryNorm
            0.20383182 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
        0.044975538 = product of:
          0.089951076 = sum of:
            0.089951076 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.089951076 = score(doc=337,freq=2.0), product of:
                0.14530693 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041494574 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
    Type
    a

Types

  • a 629
  • el 75
  • m 46
  • s 22
  • x 9
  • p 7
  • b 1
  • d 1
  • pat 1
  • r 1
