Search (302 results, page 1 of 16)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.15
    0.1495964 = sum of:
      0.081033945 = product of:
        0.24310184 = sum of:
          0.24310184 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24310184 = score(doc=562,freq=2.0), product of:
              0.43255165 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051020417 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.04782477 = weight(_text_:data in 562) [ClassicSimilarity], result of:
        0.04782477 = score(doc=562,freq=4.0), product of:
          0.16132914 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.051020417 = queryNorm
          0.29644224 = fieldWeight in 562, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
      0.020737685 = product of:
        0.04147537 = sum of:
          0.04147537 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04147537 = score(doc=562,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
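    The relevance breakdown above is Lucene "explain" output. Assuming it follows Lucene's ClassicSimilarity (TF-IDF) formulas, as the labels suggest, the weight of the "_text_:3a" clause can be reconstructed from the listed factors. The helper names below are illustrative; queryNorm and fieldNorm are simply taken as given from the explanation.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score(doc, term) = queryWeight * fieldWeight, where
    #   queryWeight = idf * queryNorm
    #   fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

# Factors copied from the "_text_:3a in 562" clause of result no. 1
w = term_score(freq=2.0, doc_freq=24, max_docs=44218,
               query_norm=0.051020417, field_norm=0.046875)
print(w)            # ~0.24310184, as listed in the breakdown
print(w * (1 / 3))  # coord(1/3) applied -> ~0.081033945
```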
  2. Basili, R.; Pazienza, M.T.; Velardi, P.: An empirical symbolic approach to natural language processing (1996) 0.15
    0.14905265 = product of:
      0.22357896 = sum of:
        0.06376636 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
          0.06376636 = score(doc=6753,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.3952563 = fieldWeight in 6753, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=6753)
        0.1598126 = sum of:
          0.10451211 = weight(_text_:processing in 6753) [ClassicSimilarity], result of:
            0.10451211 = score(doc=6753,freq=4.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.5060184 = fieldWeight in 6753, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
          0.055300497 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
            0.055300497 = score(doc=6753,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.30952093 = fieldWeight in 6753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
      0.6666667 = coord(2/3)
    
    Abstract
    Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
  3. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.13
    0.13322797 = product of:
      0.19984195 = sum of:
        0.03945342 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
          0.03945342 = score(doc=2345,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.24455236 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.16038853 = sum of:
          0.11200059 = weight(_text_:processing in 2345) [ClassicSimilarity], result of:
            0.11200059 = score(doc=2345,freq=6.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.54227555 = fieldWeight in 2345, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
          0.048387934 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
            0.048387934 = score(doc=2345,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.2708308 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
      0.6666667 = coord(2/3)
    
    Abstract
    Natural language processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD) which, in turn, feed the visualization systems of the present and future and enable knowledge workers to focus more of their time on the vital tasks of analysis and prediction
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  4. Semantik, Lexikographie und Computeranwendungen : Workshop ... (Bonn) : 1995.01.27-28 (1996) 0.12
    0.12220091 = product of:
      0.18330136 = sum of:
        0.056362033 = weight(_text_:data in 190) [ClassicSimilarity], result of:
          0.056362033 = score(doc=190,freq=8.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.34936053 = fieldWeight in 190, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=190)
        0.12693933 = sum of:
          0.09237652 = weight(_text_:processing in 190) [ClassicSimilarity], result of:
            0.09237652 = score(doc=190,freq=8.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.4472613 = fieldWeight in 190, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=190)
          0.03456281 = weight(_text_:22 in 190) [ClassicSimilarity], result of:
            0.03456281 = score(doc=190,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.19345059 = fieldWeight in 190, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=190)
      0.6666667 = coord(2/3)
    
    Date
    14. 4.2007 10:04:22
    LCSH
    Semantics / Data processing ; Lexicography / Data processing ; Computational linguistics
    Subject
    Semantics / Data processing ; Lexicography / Data processing ; Computational linguistics
  5. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.12
    0.117224745 = product of:
      0.17583711 = sum of:
        0.09664074 = weight(_text_:data in 2434) [ClassicSimilarity], result of:
          0.09664074 = score(doc=2434,freq=12.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.59902847 = fieldWeight in 2434, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2434)
        0.07919637 = product of:
          0.15839274 = sum of:
            0.15839274 = weight(_text_:processing in 2434) [ClassicSimilarity], result of:
              0.15839274 = score(doc=2434,freq=12.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.7668934 = fieldWeight in 2434, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2434)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Semantics / Data processing
    Lexicology / Data processing
    English language / Data processing
    Subject
    Semantics / Data processing
    Lexicology / Data processing
    English language / Data processing
  6. Barton, G.E. Jr.; Berwick, R.C.; Ristad, E.S.: Computational complexity and natural language (1987) 0.12
    0.11602241 = product of:
      0.17403361 = sum of:
        0.09564954 = weight(_text_:data in 7138) [ClassicSimilarity], result of:
          0.09564954 = score(doc=7138,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.5928845 = fieldWeight in 7138, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=7138)
        0.07838408 = product of:
          0.15676816 = sum of:
            0.15676816 = weight(_text_:processing in 7138) [ClassicSimilarity], result of:
              0.15676816 = score(doc=7138,freq=4.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.7590276 = fieldWeight in 7138, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.09375 = fieldNorm(doc=7138)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Linguistics / Data processing
    Subject
    Linguistics / Data processing
  7. Warner, A.J.: Natural language processing (1987) 0.09
    0.08613448 = product of:
      0.25840342 = sum of:
        0.25840342 = sum of:
          0.14780244 = weight(_text_:processing in 337) [ClassicSimilarity], result of:
            0.14780244 = score(doc=337,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.7156181 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
          0.11060099 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
            0.11060099 = score(doc=337,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.61904186 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
      0.33333334 = coord(1/3)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
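    For the document-level number, ClassicSimilarity multiplies the sum of the matching clause scores by a coordination factor coord = matching clauses / total query clauses (assuming the explanation trees here follow that scheme). The function below is only a sketch of that combination step.

```python
def boolean_score(clause_scores: list[float], total_clauses: int) -> float:
    # Sum the clauses that matched, then scale by coord = overlap / maxOverlap
    matching = [s for s in clause_scores if s > 0.0]
    return sum(matching) * (len(matching) / total_clauses)

# Result no. 7 (doc 337): only one of three top-level clauses matched, and that
# clause is itself the sum of the "processing" and "22" term weights.
clause = 0.14780244 + 0.11060099             # = 0.25840342
print(boolean_score([clause, 0.0, 0.0], 3))  # ~0.08613448, displayed rounded as 0.09
```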
  8. Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.08
    0.082744 = product of:
      0.124116 = sum of:
        0.06301467 = weight(_text_:data in 119) [ClassicSimilarity], result of:
          0.06301467 = score(doc=119,freq=10.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.39059696 = fieldWeight in 119, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=119)
        0.06110133 = product of:
          0.12220266 = sum of:
            0.12220266 = weight(_text_:processing in 119) [ClassicSimilarity], result of:
              0.12220266 = score(doc=119,freq=14.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.5916711 = fieldWeight in 119, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=119)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The acquired parsed terms can then be applied for precise retrieval and assembly of information. The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or "linguistic data tuning" of, heterogeneous texts. FASTR can be used to do automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.
    LCSH
    Language and languages / Variation / Data processing
    Terms and phrases / Data processing
    Subject
    Language and languages / Variation / Data processing
    Terms and phrases / Data processing
  9. McKelvie, D.; Brew, C.; Thompson, H.S.: Using SGML as a basis for data-intensive natural language processing (1998) 0.08
    0.082040235 = product of:
      0.123060346 = sum of:
        0.06763443 = weight(_text_:data in 3147) [ClassicSimilarity], result of:
          0.06763443 = score(doc=3147,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.4192326 = fieldWeight in 3147, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=3147)
        0.055425912 = product of:
          0.110851824 = sum of:
            0.110851824 = weight(_text_:processing in 3147) [ClassicSimilarity], result of:
              0.110851824 = score(doc=3147,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.53671354 = fieldWeight in 3147, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3147)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
  10. Ruge, G.: Experiments on linguistically-based term associations (1992) 0.08
    0.07669876 = product of:
      0.11504813 = sum of:
        0.07809752 = weight(_text_:data in 1810) [ClassicSimilarity], result of:
          0.07809752 = score(doc=1810,freq=6.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.48408815 = fieldWeight in 1810, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=1810)
        0.03695061 = product of:
          0.07390122 = sum of:
            0.07390122 = weight(_text_:processing in 1810) [ClassicSimilarity], result of:
              0.07390122 = score(doc=1810,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.35780904 = fieldWeight in 1810, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1810)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Describes the hyperterm system REALIST (REtrieval Aids by LInguistic and STatistics) and its semantic component. The semantic component of REALIST generates semantic term relations such as synonyms. It takes as input a free text data base and generates as output term pairs that are semantically related with respect to their meanings in the data base. In the 1st step an automatic syntactic analysis provides linguistic knowledge about the terms of the data base. In the 2nd step this knowledge is compared by statistical similarity computation. Various experiments with different similarity measures are described.
    Source
    Information processing and management. 28(1992) no.3, S.317-332
  11. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.07
    0.07351622 = product of:
      0.22054866 = sum of:
        0.22054866 = sum of:
          0.16524816 = weight(_text_:processing in 7415) [ClassicSimilarity], result of:
            0.16524816 = score(doc=7415,freq=10.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.80008537 = fieldWeight in 7415, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
          0.055300497 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
            0.055300497 = score(doc=7415,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.30952093 = fieldWeight in 7415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
      0.33333334 = coord(1/3)
    
    Abstract
    State-of-the-art review of natural language processing updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge-based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly.
  12. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.07
    0.068534724 = product of:
      0.10280208 = sum of:
        0.07970795 = weight(_text_:data in 392) [ClassicSimilarity], result of:
          0.07970795 = score(doc=392,freq=16.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.49407038 = fieldWeight in 392, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
        0.02309413 = product of:
          0.04618826 = sum of:
            0.04618826 = weight(_text_:processing in 392) [ClassicSimilarity], result of:
              0.04618826 = score(doc=392,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.22363065 = fieldWeight in 392, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=392)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models require large amounts of training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or on external knowledge bases, addressing the annotated data scarcity that hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one thesaurus-based and the other based on lexicon manipulation. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with the two previous lexical substitution methods, averaged over the five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
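    As a rough illustration only (not the authors' PLSDA implementation), POS-constrained lexical substitution of the kind summarized above can be sketched with NLTK and WordNet; the tag mapping, sampling strategy, and replacement count are assumptions.

```python
import random
from nltk import pos_tag, word_tokenize   # requires nltk data: punkt, tagger, wordnet
from nltk.corpus import wordnet as wn

# Map coarse Penn Treebank tag prefixes to WordNet POS categories (illustrative)
POS_MAP = {"NN": wn.NOUN, "VB": wn.VERB, "JJ": wn.ADJ, "RB": wn.ADV}

def augment(sentence: str, n_replace: int = 2, seed: int = 0) -> str:
    # Replace up to n_replace content words with WordNet lemmas of the same POS
    random.seed(seed)
    tokens = word_tokenize(sentence)
    tagged = pos_tag(tokens)
    candidates = [i for i, (_, tag) in enumerate(tagged) if tag[:2] in POS_MAP]
    for i in random.sample(candidates, min(n_replace, len(candidates))):
        word, tag = tagged[i]
        lemmas = {l.name().replace("_", " ")
                  for s in wn.synsets(word, pos=POS_MAP[tag[:2]])
                  for l in s.lemmas()} - {word}
        if lemmas:
            tokens[i] = random.choice(sorted(lemmas))
    return " ".join(tokens)

print(augment("The movie was surprisingly good and the acting felt genuine"))
```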
  13. Engerer, V.: Informationswissenschaft und Linguistik : kurze Geschichte eines fruchtbaren interdisziplinären Verhältnisses in drei Akten (2012) 0.07
    0.06836687 = product of:
      0.1025503 = sum of:
        0.056362033 = weight(_text_:data in 3376) [ClassicSimilarity], result of:
          0.056362033 = score(doc=3376,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.34936053 = fieldWeight in 3376, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=3376)
        0.04618826 = product of:
          0.09237652 = sum of:
            0.09237652 = weight(_text_:processing in 3376) [ClassicSimilarity], result of:
              0.09237652 = score(doc=3376,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.4472613 = fieldWeight in 3376, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3376)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Source
    SDV - Sprache und Datenverarbeitung. International journal for language data processing. 36(2012) H.2, S.71-91 [= E-Books - Fakten, Perspektiven und Szenarien]
  14. Weingarten, R.: Die Verkabelung der Sprache : Grenzen der Technisierung von Kommunikation (1989) 0.07
    0.06767975 = product of:
      0.101519614 = sum of:
        0.055795565 = weight(_text_:data in 7156) [ClassicSimilarity], result of:
          0.055795565 = score(doc=7156,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.34584928 = fieldWeight in 7156, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7156)
        0.045724045 = product of:
          0.09144809 = sum of:
            0.09144809 = weight(_text_:processing in 7156) [ClassicSimilarity], result of:
              0.09144809 = score(doc=7156,freq=4.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.4427661 = fieldWeight in 7156, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7156)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Communication / Data processing
    Subject
    Communication / Data processing
  15. Fox, C.: Lexical analysis and stoplists (1992) 0.07
    0.06714465 = product of:
      0.10071697 = sum of:
        0.06376636 = weight(_text_:data in 3502) [ClassicSimilarity], result of:
          0.06376636 = score(doc=3502,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.3952563 = fieldWeight in 3502, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
        0.03695061 = product of:
          0.07390122 = sum of:
            0.07390122 = weight(_text_:processing in 3502) [ClassicSimilarity], result of:
              0.07390122 = score(doc=3502,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.35780904 = fieldWeight in 3502, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3502)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Lexical analysis is a fundamental operation in both query processing and automatic indexing, and filtering stoplist words is an important step in the automatic indexing process. Presents basic algorithms and data structures for lexical analysis, and shows how stoplist word removal can be efficiently incorporated into lexical analysis
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
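    A minimal sketch of what the chapter above describes; the tokenizer, stoplist, and example are illustrative and not taken from the book.

```python
import re

STOPLIST = {"the", "a", "an", "and", "or", "of", "in", "to", "is", "for"}

def tokenize(text: str) -> list[str]:
    # Lexical analysis: lower-case the text and split on non-letter runs
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def index_terms(text: str) -> list[str]:
    # Stoplist removal folded into the same pass over the token stream
    return [t for t in tokenize(text) if t not in STOPLIST]

print(index_terms("Lexical analysis is a fundamental operation in automatic indexing"))
# -> ['lexical', 'analysis', 'fundamental', 'operation', 'automatic', 'indexing']
```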
  16. Montgomery, C.A.: Linguistics and information science (1972) 0.06
    0.063856855 = product of:
      0.095785275 = sum of:
        0.033817217 = weight(_text_:data in 6669) [ClassicSimilarity], result of:
          0.033817217 = score(doc=6669,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.2096163 = fieldWeight in 6669, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=6669)
        0.06196806 = product of:
          0.12393612 = sum of:
            0.12393612 = weight(_text_:processing in 6669) [ClassicSimilarity], result of:
              0.12393612 = score(doc=6669,freq=10.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.60006404 = fieldWeight in 6669, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6669)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The notion of automated processing of natural language - i.e., machine simulation of the language processing activities of a human - provides novel possibilities for interaction between linguists, who have a theoretical interest in such activities, and information scientists, who have more practical goals, e.g. simulating the language processing activities of an indexer with a machine. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. In terms of this framework, the former have concentrated on automating the operations of the component for content analysis and representation, while the latter have emphasized the data management component. The complementary nature of these developments allows the postulation of an integrated approach to automated language processing. This approach, which is outlined in the final sections of the paper, incorporates current notions in linguistic theory and information science, as well as design features of recent computational linguistic models.
  17. Mustafa el Hadi, W.; Jouis, C.: Evaluating natural language processing systems as a tool for building terminological databases (1996) 0.06
    0.06363581 = product of:
      0.09545372 = sum of:
        0.03945342 = weight(_text_:data in 5191) [ClassicSimilarity], result of:
          0.03945342 = score(doc=5191,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.24455236 = fieldWeight in 5191, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5191)
        0.056000296 = product of:
          0.11200059 = sum of:
            0.11200059 = weight(_text_:processing in 5191) [ClassicSimilarity], result of:
              0.11200059 = score(doc=5191,freq=6.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.54227555 = fieldWeight in 5191, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5191)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Natural language processing systems use various modules in order to identify terms or concept names and the logico-semantic relations they entertain. The approaches involved in corpus analysis are either based on morpho-syntactic analysis, statistical analysis, semantic analysis, recent connexionist models or any combination of 2 or more of these approaches. This paper will examine the capacity of natural language processing systems to create databases from extensive textual data. We are endeavouring to evaluate the contribution of these systems, their advantages and their shortcomings
  18. Seelbach, D.: Computerlinguistik und Dokumentation : keyphrases in Dokumentationsprozessen (1975) 0.06
    0.058011204 = product of:
      0.087016806 = sum of:
        0.04782477 = weight(_text_:data in 299) [ClassicSimilarity], result of:
          0.04782477 = score(doc=299,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.29644224 = fieldWeight in 299, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=299)
        0.03919204 = product of:
          0.07838408 = sum of:
            0.07838408 = weight(_text_:processing in 299) [ClassicSimilarity], result of:
              0.07838408 = score(doc=299,freq=4.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.3795138 = fieldWeight in 299, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=299)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Documentation / Data processing
    Subject
    Documentation / Data processing
  19. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.06
    0.057524066 = product of:
      0.0862861 = sum of:
        0.05857314 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
          0.05857314 = score(doc=2502,freq=6.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.3630661 = fieldWeight in 2502, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.027712956 = product of:
          0.055425912 = sum of:
            0.055425912 = weight(_text_:processing in 2502) [ClassicSimilarity], result of:
              0.055425912 = score(doc=2502,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.26835677 = fieldWeight in 2502, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
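    Purely as an illustration of what "fusing result sets" means here (the study evaluates several modern fusion approaches; the specific scheme and scores below are invented), a CombMNZ-style combination of two ranked lists can be sketched as follows.

```python
from collections import defaultdict

def comb_mnz(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    # CombMNZ: sum a document's (normalized) scores across runs, then multiply
    # by the number of runs that retrieved it.
    total, hits = defaultdict(float), defaultdict(int)
    for run in runs:
        for doc, score in run.items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Two hypothetical retrieval strategies run in the same system
run_a = {"d1": 0.9, "d2": 0.4, "d3": 0.2}
run_b = {"d2": 0.8, "d4": 0.5}
print(comb_mnz([run_a, run_b]))  # d2 ranks first: retrieved by both runs
```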
  20. K., Vani; Gupta, D.: Unmasking text plagiarism using syntactic-semantic based natural language processing techniques : comparisons, analysis and challenges (2018) 0.06
    0.057361495 = product of:
      0.08604224 = sum of:
        0.039853975 = weight(_text_:data in 5084) [ClassicSimilarity], result of:
          0.039853975 = score(doc=5084,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.24703519 = fieldWeight in 5084, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5084)
        0.04618826 = product of:
          0.09237652 = sum of:
            0.09237652 = weight(_text_:processing in 5084) [ClassicSimilarity], result of:
              0.09237652 = score(doc=5084,freq=8.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.4472613 = fieldWeight in 5084, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5084)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The proposed work aims to explore and compare the potency of syntactic-semantic based linguistic structures in plagiarism detection using natural language processing techniques. The current work explores linguistic features, viz., part of speech tags, chunks and semantic roles in detecting plagiarized fragments and utilizes a combined syntactic-semantic similarity metric, which extracts the semantic concepts from WordNet lexical database. The linguistic information is utilized for effective pre-processing and for availing semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels, upon the features extracted is analyzed and discussed. Further, unlike the existing systems, which were evaluated on some limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpus provided by PAN1 competition from 2009 to 2014. The approach presented considerable improvement in comparison with the top-ranked systems of the respective years. The evaluation and analysis with various cases of plagiarism also reflected the supremacy of deeper linguistic features for identifying manually plagiarized data.
    Source
    Information processing and management. 54(2018) no.3, S.408-432

Languages

  • e 266
  • d 31
  • ru 2
  • f 1
  • m 1

Types

  • a 230
  • m 47
  • el 23
  • s 18
  • x 7
  • p 3
  • d 1
  • r 1
