Search (144 results, page 1 of 8)

  • Active filter: theme_ss:"Computerlinguistik"
  1. Becker, D.: Automated language processing (1981) 0.08
    0.0763061 = product of:
      0.3052244 = sum of:
        0.3052244 = weight(_text_:becker in 287) [ClassicSimilarity], result of:
          0.3052244 = score(doc=287,freq=2.0), product of:
            0.25693014 = queryWeight, product of:
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.03823278 = queryNorm
            1.1879665 = fieldWeight in 287, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.125 = fieldNorm(doc=287)
      0.25 = coord(1/4)
    
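     The indented block above (and the similar blocks below) is a Lucene "explain" tree for ClassicSimilarity (TF-IDF) ranking. Reconstructed as arithmetic, item 1's score is (values copied from the tree; the tf and idf forms are the standard ClassicSimilarity definitions, which reproduce every idf value shown on this page):

       \mathrm{idf} = 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}+1} = 1 + \ln\frac{44218}{145} = 6.7201533
       \mathrm{tf} = \sqrt{\mathrm{freq}} = \sqrt{2} = 1.4142135
       \mathrm{queryWeight} = \mathrm{idf} \cdot \mathrm{queryNorm} = 6.7201533 \cdot 0.03823278 = 0.25693014
       \mathrm{fieldWeight} = \mathrm{tf} \cdot \mathrm{idf} \cdot \mathrm{fieldNorm} = 1.4142135 \cdot 6.7201533 \cdot 0.125 = 1.1879665
       \mathrm{score} = \mathrm{coord}(1/4) \cdot \mathrm{queryWeight} \cdot \mathrm{fieldWeight} = 0.25 \cdot 0.3052244 = 0.0763061

     The coord(k/m) factor scales the score by the fraction of query clauses that matched the document.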
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.0658593 = product of:
      0.0878124 = sum of:
        0.036434274 = product of:
          0.18217137 = sum of:
            0.18217137 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.18217137 = score(doc=562,freq=2.0), product of:
                0.32413796 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03823278 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.2 = coord(1/5)
        0.035838082 = weight(_text_:data in 562) [ClassicSimilarity], result of:
          0.035838082 = score(doc=562,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.29644224 = fieldWeight in 562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.031080082 = score(doc=562,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
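     Item 2 combines three matching clauses (out of four, hence coord(3/4)), and the "_text_:3a" leaf additionally carries a nested coord(1/5). A minimal Python sketch of how such an explain tree combines, assuming Lucene's ClassicSimilarity formulas (illustrative only, not the search site's actual backend code):

       import math

       def idf(doc_freq: int, max_docs: int) -> float:
           # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)),
           # e.g. idf(24, 44218) = 8.478011, as shown for the "_text_:3a" clause
           return 1.0 + math.log(max_docs / (doc_freq + 1))

       def clause_weight(freq, doc_freq, max_docs, query_norm, field_norm):
           # weight = queryWeight * fieldWeight
           #        = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
           i = idf(doc_freq, max_docs)
           return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

       # The "_text_:3a" leaf, then scaled by its nested coord(1/5):
       w = clause_weight(2.0, 24, 44218, 0.03823278, 0.046875)  # ~0.18217137
       print(w * 0.2)                                           # ~0.036434274

       # Document score: the three clause contributions times coord(3/4):
       clauses = [0.036434274, 0.035838082, 0.015540041]
       print(sum(clauses) * 0.75)                               # ~0.0658593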
    Content
     Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
  3. Basili, R.; Pazienza, M.T.; Velardi, P.: An empirical symbolic approach to natural language processing (1996) 0.03
    0.03425208 = product of:
      0.06850416 = sum of:
        0.04778411 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
          0.04778411 = score(doc=6753,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 6753, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=6753)
        0.020720055 = product of:
          0.04144011 = sum of:
            0.04144011 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
              0.04144011 = score(doc=6753,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.30952093 = fieldWeight in 6753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6753)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
     Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
  4. Semantik, Lexikographie und Computeranwendungen : Workshop ... (Bonn) : 1995.01.27-28 (1996) 0.03
    0.027592812 = product of:
      0.055185623 = sum of:
        0.042235587 = weight(_text_:data in 190) [ClassicSimilarity], result of:
          0.042235587 = score(doc=190,freq=8.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 190, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=190)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 190) [ClassicSimilarity], result of:
              0.02590007 = score(doc=190,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 190, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=190)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    14. 4.2007 10:04:22
    LCSH
    Semantics / Data processing ; Lexicography / Data processing ; Computational linguistics
    Subject
    Semantics / Data processing ; Lexicography / Data processing ; Computational linguistics
  5. Pinker, S.: Wörter und Regeln : Die Natur der Sprache (2000) 0.03
    0.027216766 = product of:
      0.108867064 = sum of:
        0.108867064 = sum of:
          0.08296699 = weight(_text_:lexikon in 734) [ClassicSimilarity], result of:
            0.08296699 = score(doc=734,freq=2.0), product of:
              0.23962554 = queryWeight, product of:
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.03823278 = queryNorm
              0.346236 = fieldWeight in 734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0390625 = fieldNorm(doc=734)
          0.02590007 = weight(_text_:22 in 734) [ClassicSimilarity], result of:
            0.02590007 = score(doc=734,freq=2.0), product of:
              0.13388468 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03823278 = queryNorm
              0.19345059 = fieldWeight in 734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=734)
      0.25 = coord(1/4)
    
    Abstract
     How do children learn to speak? What clues do precisely their errors during language acquisition give about the course of the learning process, true to the motto "Kids say the darndest things"? And how do computers help, or why have they so far failed, in simulating the neural networks that are involved in the complicated fabric of human language? In his new book Wörter und Regeln (Words and Rules), the well-known US cognitive scientist Steven Pinker (The Language Instinct) has once again undertaken an exploration of the realm of language that is as informative as it is entertaining. What makes it especially engaging and worth reading is that the MIT professor confidently covers both scientific and humanistic aspects. On the one hand, he conveys linguistic foundations in the footsteps of Ferdinand de Saussure, such as those of a generative grammar, offers an excursion through the history of language, and devotes a separate chapter to "the horrors of the German language". On the other hand, he does not omit the latest imaging techniques, which show what happens in the brain during language processing. Pinker's theory, which runs through this puzzle of diverse aspects: language consists at its core of two components - a mental lexicon of remembered words and a mental grammar of various combinatorial rules. Concretely, this means that we memorize familiar items and their graded, intersecting features, but we also produce new mental products by applying rules. From precisely this, Pinker concludes, arise the richness and the enormous expressive power of our language.
    Date
    19. 7.2002 14:22:31
  6. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.02
    0.02384748 = product of:
      0.04769496 = sum of:
        0.02956491 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
          0.02956491 = score(doc=2345,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.24455236 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.01813005 = product of:
          0.0362601 = sum of:
            0.0362601 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
              0.0362601 = score(doc=2345,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.2708308 = fieldWeight in 2345, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2345)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  7. Egger, W.: Helferlein für jedermann : Elektronische Wörterbücher (2004) 0.02
    0.020741748 = product of:
      0.08296699 = sum of:
        0.08296699 = product of:
          0.16593398 = sum of:
            0.16593398 = weight(_text_:lexikon in 1501) [ClassicSimilarity], result of:
              0.16593398 = score(doc=1501,freq=2.0), product of:
                0.23962554 = queryWeight, product of:
                  6.2675414 = idf(docFreq=227, maxDocs=44218)
                  0.03823278 = queryNorm
                0.692472 = fieldWeight in 1501, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.2675414 = idf(docFreq=227, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1501)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Series
    Software: Der große Lexikon-Ratgeber
  8. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.02
    0.020440696 = product of:
      0.04088139 = sum of:
        0.02534135 = weight(_text_:data in 75) [ClassicSimilarity], result of:
          0.02534135 = score(doc=75,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.2096163 = fieldWeight in 75, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=75)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
              0.031080082 = score(doc=75,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 75, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=75)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  9. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.02
    0.020440696 = product of:
      0.04088139 = sum of:
        0.02534135 = weight(_text_:data in 563) [ClassicSimilarity], result of:
          0.02534135 = score(doc=563,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.2096163 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.031080082 = score(doc=563,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
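     The LocalMaxs algorithm mentioned in the abstract of item 9 elects an n-gram as a term when its association ("glue") score is a local maximum: not below the glue of its (n-1)-gram subterms and strictly above the glue of every observed (n+1)-gram superterm. A rough Python sketch of that criterion follows; the SCP-style glue function and the toy corpus are illustrative assumptions, not the thesis's actual measures or data:

       from collections import Counter

       tokens = ("multi word term extraction finds multi word terms for "
                 "web page summarization and web page retrieval").split()

       # n-gram counts for n = 1..3 over the toy corpus
       counts = {n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
                 for n in (1, 2, 3)}
       total = len(tokens)

       def glue(ng):
           # Toy SCP-style association: p(ngram)^2 / (p(prefix) * p(suffix))
           p = counts[len(ng)][ng] / total
           pre, suf = ng[:-1], ng[1:]
           return p * p / ((counts[len(pre)][pre] / total) * (counts[len(suf)][suf] / total))

       def local_maxs(n):
           # Keep an n-gram whose glue is >= that of its (n-1)-subterms
           # and strictly > that of every observed (n+1)-superterm.
           kept = []
           for w in counts[n]:
               subs = [] if n == 2 else [w[:-1], w[1:]]
               sups = [y for y in counts.get(n + 1, {}) if y[:-1] == w or y[1:] == w]
               if all(glue(w) >= glue(x) for x in subs) and all(glue(w) > glue(y) for y in sups):
                   kept.append(w)
           return kept

       print(local_maxs(2))  # bigram candidates: ('multi', 'word'), ('web', 'page')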
  10. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.02
    0.018104734 = product of:
      0.072418936 = sum of:
        0.072418936 = weight(_text_:data in 2434) [ClassicSimilarity], result of:
          0.072418936 = score(doc=2434,freq=12.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.59902847 = fieldWeight in 2434, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2434)
      0.25 = coord(1/4)
    
    LCSH
    Semantics / Data processing
    Lexicology / Data processing
    English language / Data processing
    Subject
    Semantics / Data processing
    Lexicology / Data processing
    English language / Data processing
  11. Barton, G.E. Jr.; Berwick, R.C.; Ristad, E.S.: Computational complexity and natural language (1987) 0.02
    0.017919041 = product of:
      0.071676165 = sum of:
        0.071676165 = weight(_text_:data in 7138) [ClassicSimilarity], result of:
          0.071676165 = score(doc=7138,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.5928845 = fieldWeight in 7138, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=7138)
      0.25 = coord(1/4)
    
    LCSH
    Linguistics / Data processing
    Subject
    Linguistics / Data processing
  12. Hausser, R.: Language and nonlanguage cognition (2021) 0.02
    0.01676173 = product of:
      0.06704692 = sum of:
        0.06704692 = weight(_text_:data in 255) [ClassicSimilarity], result of:
          0.06704692 = score(doc=255,freq=14.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.55459267 = fieldWeight in 255, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=255)
      0.25 = coord(1/4)
    
    Abstract
     A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short-term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
  13. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.02
    0.015838344 = product of:
      0.063353375 = sum of:
        0.063353375 = weight(_text_:data in 609) [ClassicSimilarity], result of:
          0.063353375 = score(doc=609,freq=18.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.52404076 = fieldWeight in 609, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=609)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws. Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test. Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law. Research limitations/implications - The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high. Therefore the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst case in terms of fit to either Lotka's or Zipf's laws, which indicates that Romanized data in aggregated form appear to be a preferable option. Originality/value - Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.
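     For context, the two bibliometric distributions the abstract tests against take these generalized forms (standard textbook formulations, stated here for orientation rather than taken from the paper):

       f(n) = \frac{C}{n^{\alpha}} \quad \text{(Lotka: number of sources producing } n \text{ items; } \alpha = 2 \text{ in the original law)}
       f(r) \propto \frac{1}{r^{s}} \quad \text{(Zipf: frequency of the item at rank } r\text{)}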
  14. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.01
    0.014932535 = product of:
      0.05973014 = sum of:
        0.05973014 = weight(_text_:data in 392) [ClassicSimilarity], result of:
          0.05973014 = score(doc=392,freq=16.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.49407038 = fieldWeight in 392, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
      0.25 = coord(1/4)
    
    Abstract
     Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models are more demanding for training data. Data augmentation techniques are widely used to generate new instances, based on modifications to existing data or on external knowledge bases, to address the scarcity of annotated data that hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based and the other lexicon-manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with the two previous lexical substitution methods, averaged over the five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
  15. Ruge, G.: Experiments on linguistically-based term associations (1992) 0.01
    0.014630836 = product of:
      0.058523346 = sum of:
        0.058523346 = weight(_text_:data in 1810) [ClassicSimilarity], result of:
          0.058523346 = score(doc=1810,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.48408815 = fieldWeight in 1810, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=1810)
      0.25 = coord(1/4)
    
    Abstract
     Describes the hyperterm system REALIST (REtrieval Aids by LInguistic and STatistics) and its semantic component. The semantic component of REALIST generates semantic term relations such as synonyms. It takes as input a free-text data base and generates as output term pairs that are semantically related with respect to their meanings in the data base. In the first step, an automatic syntactic analysis provides linguistic knowledge about the terms of the data base. In the second step, this knowledge is compared by statistical similarity computation. Various experiments with different similarity measures are described.
  16. Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen : Beiträge zur GLDV Tagung 2005 in Bonn (2005) 0.01
    0.014307394 = product of:
      0.057229575 = sum of:
        0.057229575 = weight(_text_:becker in 3578) [ClassicSimilarity], result of:
          0.057229575 = score(doc=3578,freq=2.0), product of:
            0.25693014 = queryWeight, product of:
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.03823278 = queryNorm
            0.22274372 = fieldWeight in 3578, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3578)
      0.25 = coord(1/4)
    
    Content
    Darja Mönke: Ein Parser für natürlichsprachlich formulierte mathematische Beweise - Martin Müller: Ontologien für mathematische Beweistexte - Moritz Neugebauer: The status of functional phonological classification in statistical speech recognition - Uwe Quasthoff: Kookkurrenzanalyse und korpusbasierte Sachgruppenlexikographie - Reinhard Rapp: On the Relationship between Word Frequency and Word Familiarity - Ulrich Schade/Miloslaw Frey/Sebastian Becker: Computerlinguistische Anwendungen zur Verbesserung der Kommunikation zwischen militärischen Einheiten und deren Führungsinformationssystemen - David Schlangen/Thomas Hanneforth/Manfred Stede: Weaving the Semantic Web: Extracting and Representing the Content of Pathology Reports - Thomas Schmidt: Modellbildung und Modellierungsparadigmen in der computergestützten Korpuslinguistik - Sabine Schröder/Martina Ziefle: Semantic transparency of cellular phone menus - Thorsten Trippel/Thierry Declerck/Ulrich Held: Standardisierung von Sprachressourcen: Der aktuelle Stand - Charlotte Wollermann: Evaluation der audiovisuellen Kongruenz bei der multimodalen Sprachsynsthese - Claudia Kunze/Lothar Lemnitzer: Anwendungen des GermaNet II: Einleitung - Claudia Kunze/Lothar Lemnitzer: Die Zukunft der Wortnetze oder die Wortnetze der Zukunft - ein Roadmap-Beitrag -
  17. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.01
    0.012931953 = product of:
      0.051727813 = sum of:
        0.051727813 = weight(_text_:data in 3682) [ClassicSimilarity], result of:
          0.051727813 = score(doc=3682,freq=12.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4278775 = fieldWeight in 3682, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
      0.25 = coord(1/4)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Theme
    Data Mining
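     Of the entry-point techniques named in the abstract of item 17, keyword in context (KWIC) is simple enough to sketch. A minimal illustrative Python version follows (not the study's actual tooling; the sample string is invented):

       import re

       def kwic(text: str, keyword: str, width: int = 30):
           # Yield (left context, match, right context) for each keyword hit.
           for m in re.finditer(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
               left = text[max(0, m.start() - width):m.start()]
               right = text[m.end():m.end() + width]
               yield left, m.group(0), right

       sample = ("Analyzing large quantities of real-world textual data has "
                 "the potential to provide new insights for researchers.")
       for left, hit, right in kwic(sample, "data"):
           print(f"...{left}[{hit}]{right}...")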
  18. McKelvie, D.; Brew, C.; Thompson, H.S.: Using SGML as a basis for data-intensive natural language processing (1998) 0.01
    0.012670675 = product of:
      0.0506827 = sum of:
        0.0506827 = weight(_text_:data in 3147) [ClassicSimilarity], result of:
          0.0506827 = score(doc=3147,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4192326 = fieldWeight in 3147, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=3147)
      0.25 = coord(1/4)
    
  19. dpa: 14 Forscher mit viel Geld angelockt : Wolfgang-Paul-Preis (2001) 0.01
    0.012445048 = product of:
      0.049780194 = sum of:
        0.049780194 = product of:
          0.09956039 = sum of:
            0.09956039 = weight(_text_:lexikon in 6814) [ClassicSimilarity], result of:
              0.09956039 = score(doc=6814,freq=2.0), product of:
                0.23962554 = queryWeight, product of:
                  6.2675414 = idf(docFreq=227, maxDocs=44218)
                  0.03823278 = queryNorm
                0.4154832 = fieldWeight in 6814, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.2675414 = idf(docFreq=227, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6814)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Content
     In it: "The linguist Christiane Fellbaum (dpa photo) will use her prize money for the "Digitales Wörterbuch der Deutschen Sprache des 20. Jahrhunderts" (Digital Dictionary of 20th-Century German), to be compiled at the Berlin-Brandenburgische Akademie der Wissenschaften. Her computer takes over where conventional dictionaries can no longer keep up. At the push of a button she generates the word combinations that make a language so rich in images and ideas - and thus unique. Her electronic lexicon of more than 500 million words is later to be made accessible as a database. Its basis is the German language of the past hundred years - a representative cross-section compiled from literature, newspaper German, the language of technical books, advertising copy, and transcribed colloquial speech. Where a dictionary today presents only a word with synonyms or a few usage possibilities, the researcher spins a huge net of word combinations. In Christiane Fellbaum's system, for example, one does not merely "verlieren" (lose) but can also lose "den Faden" (the thread) or "die Geduld" (one's patience) - together with all further possible combinations, which the computer finds like a search engine in its stored texts."
  20. Owei, V.; Higa, K.: A paradigm for natural language explanation of database queries : a semantic data model approach (1994) 0.01
    0.011946027 = product of:
      0.04778411 = sum of:
        0.04778411 = weight(_text_:data in 8189) [ClassicSimilarity], result of:
          0.04778411 = score(doc=8189,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 8189, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=8189)
      0.25 = coord(1/4)
    
    Abstract
     Describes an interface that provides the user with automatic feedback in the form of an explanation of how the database management system interprets user-specified queries. Proposes an approach that exploits the rich semantics of graphical semantic data models to construct a restricted natural-language explanation of database queries that are specified in a very high-level declarative form. These interpretations of the specified query represent the system's 'understanding' of the query and are returned to the user for validation.

Languages

  • e 112
  • d 31
  • f 1
  • m 1

Types

  • a 110
  • m 17
  • el 15
  • s 9
  • x 8
  • p 3
  • d 1
