Search (230 results, page 1 of 12)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.36

0.3604252 = sum of:
  0.07437435 = product of:
    0.22312303 = sum of:
      0.22312303 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.22312303 = score(doc=562,freq=2.0), product of:
          0.39700332 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046827413 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.22312303 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
    0.22312303 = score(doc=562,freq=2.0), product of:
      0.39700332 = queryWeight, product of:
        8.478011 = idf(docFreq=24, maxDocs=44218)
        0.046827413 = queryNorm
      0.56201804 = fieldWeight in 562, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        8.478011 = idf(docFreq=24, maxDocs=44218)
        0.046875 = fieldNorm(doc=562)
  0.043894395 = weight(_text_:data in 562) [ClassicSimilarity], result of:
    0.043894395 = score(doc=562,freq=4.0), product of:
      0.14807065 = queryWeight, product of:
        3.1620505 = idf(docFreq=5088, maxDocs=44218)
        0.046827413 = queryNorm
      0.29644224 = fieldWeight in 562, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        3.1620505 = idf(docFreq=5088, maxDocs=44218)
        0.046875 = fieldNorm(doc=562)
  0.019033402 = product of:
    0.038066804 = sum of:
      0.038066804 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.038066804 = score(doc=562,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
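
Note: the arithmetic in the tree above can be re-derived by hand. The sketch below assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures shown. The function and variable names are illustrative and do not come from the search system.

```python
import math

MAX_DOCS = 44218          # maxDocs as reported in the explain output
QUERY_NORM = 0.046827413  # queryNorm as reported in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Term statistics copied from the explain tree for doc 562.
score_3a = term_score(2.0, 24, 0.046875)
score_2f = term_score(2.0, 24, 0.046875)
score_data = term_score(4.0, 5088, 0.046875)
score_22 = term_score(2.0, 3622, 0.046875)

# Nested clauses are scaled by their own coord factors (1/3 and 1/2).
total = score_3a * (1 / 3) + score_2f + score_data + score_22 * (1 / 2)
print(round(total, 4))
```

Small differences in the last digits are expected, since the search engine computes in single precision while this sketch uses Python doubles.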

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Source: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK

Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.15

0.1487487 = product of:
  0.2974974 = sum of:
    0.07437435 = product of:
      0.22312303 = sum of:
        0.22312303 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.22312303 = score(doc=862,freq=2.0), product of:
            0.39700332 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046827413 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
    0.22312303 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
      0.22312303 = score(doc=862,freq=2.0), product of:
        0.39700332 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046827413 = queryNorm
        0.56201804 = fieldWeight in 862, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=862)
  0.5 = coord(2/4)

Source: https://arxiv.org/abs/2212.06721

Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.10

0.1026023 = product of:
  0.2052046 = sum of:
    0.058525857 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
      0.058525857 = score(doc=6753,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.3952563 = fieldWeight in 6753, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=6753)
    0.14667875 = sum of:
      0.095923 = weight(_text_:processing in 6753) [ClassicSimilarity], result of:
        0.095923 = score(doc=6753,freq=4.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.5060184 = fieldWeight in 6753, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.0625 = fieldNorm(doc=6753)
      0.050755743 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
        0.050755743 = score(doc=6753,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.30952093 = fieldWeight in 6753, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=6753)
  0.5 = coord(2/4)

Abstract: Describes and evaluates the results of a large scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing, but are also useful for a comparative analysis of sublanguages
Date: 6. 3.1997 16:22:15

Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.09

0.09170918 = product of:
  0.18341836 = sum of:
    0.036211025 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
      0.036211025 = score(doc=2345,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 2345, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2345)
    0.14720733 = sum of:
      0.10279606 = weight(_text_:processing in 2345) [ClassicSimilarity], result of:
        0.10279606 = score(doc=2345,freq=6.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.54227555 = fieldWeight in 2345, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2345)
      0.044411276 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
        0.044411276 = score(doc=2345,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.2708308 = fieldWeight in 2345, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2345)
  0.5 = coord(2/4)

Abstract: Natural language processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD) which, in turn, feed the visualization systems of the present and future and enable knowledge workers to focus more of their time on the vital tasks of analysis and prediction
Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Warner, A.J.: Natural language processing (1987) 0.06

0.059291773 = product of:
  0.23716709 = sum of:
    0.23716709 = sum of:
      0.13565561 = weight(_text_:processing in 337) [ClassicSimilarity], result of:
        0.13565561 = score(doc=337,freq=2.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.7156181 = fieldWeight in 337, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.125 = fieldNorm(doc=337)
      0.101511486 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
        0.101511486 = score(doc=337,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.61904186 = fieldWeight in 337, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.125 = fieldNorm(doc=337)
  0.25 = coord(1/4)
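
Note: the coord factor explains why this record ranks below records whose raw term sums are smaller. A minimal sketch, assuming coord is simply the number of matched top-level query clauses divided by the total number of clauses, as the explain output suggests. The helper names are hypothetical; the term scores are copied from the listing.

```python
def coord(overlap, max_overlap):
    # Fraction of top-level query clauses that the document matched.
    return overlap / max_overlap


def final_score(term_scores, overlap, max_overlap):
    # Sum of the per-term scores, scaled by the coordination factor.
    return sum(term_scores) * coord(overlap, max_overlap)


# Warner (doc 337): "processing" and "22" sit in one clause, so coord(1/4).
warner_terms = [0.13565561, 0.101511486]
warner = final_score(warner_terms, 1, 4)

# Basili et al. (doc 6753): two of four clauses matched, so coord(2/4).
basili_terms = [0.058525857, 0.095923, 0.050755743]
basili = final_score(basili_terms, 2, 4)

print(round(warner, 4), round(basili, 4))
```

The raw sum for doc 337 is the larger of the two, but its coord factor is half as large, so its final score ends up lower.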

Source: Annual review of information science and technology. 22(1987), S.79-108

McKelvie, D.; Brew, C.; Thompson, H.S.: Using SGML as a basis for data-intensive natural language processing (1998) 0.06

0.05647345 = product of:
  0.1129469 = sum of:
    0.062076043 = weight(_text_:data in 3147) [ClassicSimilarity], result of:
      0.062076043 = score(doc=3147,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.4192326 = fieldWeight in 3147, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.09375 = fieldNorm(doc=3147)
    0.05087085 = product of:
      0.1017417 = sum of:
        0.1017417 = weight(_text_:processing in 3147) [ClassicSimilarity], result of:
          0.1017417 = score(doc=3147,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.53671354 = fieldWeight in 3147, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.09375 = fieldNorm(doc=3147)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Ruge, G.: Experiments on linguistically-based term associations (1992) 0.05

0.052796576 = product of:
  0.10559315 = sum of:
    0.07167925 = weight(_text_:data in 1810) [ClassicSimilarity], result of:
      0.07167925 = score(doc=1810,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.48408815 = fieldWeight in 1810, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=1810)
    0.033913903 = product of:
      0.067827806 = sum of:
        0.067827806 = weight(_text_:processing in 1810) [ClassicSimilarity], result of:
          0.067827806 = score(doc=1810,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.35780904 = fieldWeight in 1810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=1810)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Describes the hyperterm system REALIST (REtrieval Aids by LInguistic and STatistics) and its semantic component. The semantic component of REALIST generates semantic term relations such as synonyms. It takes as input a free text data base and generates as output term pairs that are semantically related with respect to their meanings in the data base. In the 1st step an automatic syntactic analysis provides linguistic knowledge about the terms of the data base. In the 2nd step this knowledge is compared by statistical similarity computation. Various experiments with different similarity measures are described
Source: Information processing and management. 28(1992) no.3, S.317-332

Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.05

0.050605834 = product of:
  0.20242333 = sum of:
    0.20242333 = sum of:
      0.1516676 = weight(_text_:processing in 7415) [ClassicSimilarity], result of:
        0.1516676 = score(doc=7415,freq=10.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.80008537 = fieldWeight in 7415, product of:
            3.1622777 = tf(freq=10.0), with freq of:
              10.0 = termFreq=10.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.0625 = fieldNorm(doc=7415)
      0.050755743 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
        0.050755743 = score(doc=7415,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.30952093 = fieldWeight in 7415, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=7415)
  0.25 = coord(1/4)
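
Note: the fieldNorm values in these trees reflect document length. The sketch below assumes the ClassicSimilarity length norm of 1 / sqrt(number of terms in the field) with no index-time boost. Because the stored norm is rounded to a single byte, the result is only a rough estimate. The function name is illustrative.

```python
def approx_field_length(field_norm):
    # Invert the assumed length norm: fieldNorm = 1 / sqrt(num_terms).
    # The index stores the norm with low precision, so treat this as an estimate.
    return 1.0 / (field_norm ** 2)


# fieldNorm values taken from the listing.
for doc_id, field_norm in [(337, 0.125), (7415, 0.0625), (562, 0.046875)]:
    print(doc_id, round(approx_field_length(field_norm)))
```

Under this assumption a record with fieldNorm 0.125 has roughly 64 indexed terms and one with 0.0625 roughly 256, which is why short records receive larger field weights for the same term frequency.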

Abstract: State of the art review of natural language processing updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly

Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.05
```
0.047176752 = product of:
  0.094353504 = sum of:
    0.07315732 = weight(_text_:data in 392) [ClassicSimilarity], result of:
      0.07315732 = score(doc=392,freq=16.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.49407038 = fieldWeight in 392, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=392)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 392) [ClassicSimilarity], result of:
          0.042392377 = score(doc=392,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 392, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models are more demanding for training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or relying on external knowledge bases to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based, and the other is lexicon manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve accuracy improvement of more than 0.6% compared with two previous lexical substitution methods, averaged over five benchmarks. Introducing POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.

Engerer, V.: Informationswissenschaft und Linguistik. : kurze Geschichte eines fruchtbaren interdisziplinären Verhältnisses in drei Akten (2012) 0.05

0.04706121 = product of:
  0.09412242 = sum of:
    0.05173004 = weight(_text_:data in 3376) [ClassicSimilarity], result of:
      0.05173004 = score(doc=3376,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34936053 = fieldWeight in 3376, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.078125 = fieldNorm(doc=3376)
    0.042392377 = product of:
      0.08478475 = sum of:
        0.08478475 = weight(_text_:processing in 3376) [ClassicSimilarity], result of:
          0.08478475 = score(doc=3376,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.4472613 = fieldWeight in 3376, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.078125 = fieldNorm(doc=3376)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: SDV - Sprache und Datenverarbeitung. International journal for language data processing. 36(2012) H.2, S.71-91 [= E-Books - Fakten, Perspektiven und Szenarien]

Fox, C.: Lexical analysis and stoplists (1992) 0.05

0.046219878 = product of:
  0.092439756 = sum of:
    0.058525857 = weight(_text_:data in 3502) [ClassicSimilarity], result of:
      0.058525857 = score(doc=3502,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.3952563 = fieldWeight in 3502, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=3502)
    0.033913903 = product of:
      0.067827806 = sum of:
        0.067827806 = weight(_text_:processing in 3502) [ClassicSimilarity], result of:
          0.067827806 = score(doc=3502,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.35780904 = fieldWeight in 3502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Lexical analysis is a fundamental operation in both query processing and automatic indexing, and filtering stoplist words is an important step in the automatic indexing process. Presents basic algorithms and data structures for lexical analysis, and shows how stoplist word removal can be efficiently incorporated into lexical analysis
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates

Montgomery, C.A.: Linguistics and information science (1972) 0.04
```
0.043956686 = product of:
  0.08791337 = sum of:
    0.031038022 = weight(_text_:data in 6669) [ClassicSimilarity], result of:
      0.031038022 = score(doc=6669,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 6669, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=6669)
    0.056875348 = product of:
      0.113750696 = sum of:
        0.113750696 = weight(_text_:processing in 6669) [ClassicSimilarity], result of:
          0.113750696 = score(doc=6669,freq=10.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.60006404 = fieldWeight in 6669, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=6669)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The notion of automated processing of natural language - i.e., machine simulation of the language processing activities of a human - provides novel possibilities for interaction between linguists, who have a theoretical interest in such activities, and information scientists, who have more practical goals, e.g. simulating the language processing activities of an indexer with a machine. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. In terms of this framework, the former have concentrated on automating the operations of the component for content analysis and representation, while the latter have emphasized the data management component. The complementary nature of these developments allows the postulation of an integrated approach to automated language processing. This approach, which is outlined in the final sections of the paper, incorporates current notions in linguistic theory and information science, as well as design features of recent computational linguistic models

Mustafa el Hadi, W.; Jouis, C.: Evaluating natural language processing systems as a tool for building terminological databases (1996) 0.04

0.043804526 = product of:
  0.08760905 = sum of:
    0.036211025 = weight(_text_:data in 5191) [ClassicSimilarity], result of:
      0.036211025 = score(doc=5191,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 5191, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5191)
    0.05139803 = product of:
      0.10279606 = sum of:
        0.10279606 = weight(_text_:processing in 5191) [ClassicSimilarity], result of:
          0.10279606 = score(doc=5191,freq=6.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.54227555 = fieldWeight in 5191, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5191)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Natural language processing systems use various modules in order to identify terms or concept names and the logico-semantic relations they entertain. The approaches involved in corpus analysis are either based on morpho-syntactic analysis, statistical analysis, semantic analysis, recent connexionist models or any combination of 2 or more of these approaches. This paper will examine the capacity of natural language processing systems to create databases from extensive textual data. We are endeavouring to evaluate the contribution of these systems, their advantages and their shortcomings

Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.04
```
0.03959743 = product of:
  0.07919486 = sum of:
    0.053759433 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
      0.053759433 = score(doc=2502,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.3630661 = fieldWeight in 2502, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=2502)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 2502) [ClassicSimilarity], result of:
          0.05087085 = score(doc=2502,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 2502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.

K., Vani; Gupta, D.: Unmasking text plagiarism using syntactic-semantic based natural language processing techniques : comparisons, analysis and challenges (2018) 0.04
```
0.039485518 = product of:
  0.078971036 = sum of:
    0.03657866 = weight(_text_:data in 5084) [ClassicSimilarity], result of:
      0.03657866 = score(doc=5084,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 5084, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5084)
    0.042392377 = product of:
      0.08478475 = sum of:
        0.08478475 = weight(_text_:processing in 5084) [ClassicSimilarity], result of:
          0.08478475 = score(doc=5084,freq=8.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.4472613 = fieldWeight in 5084, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5084)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: The proposed work aims to explore and compare the potency of syntactic-semantic based linguistic structures in plagiarism detection using natural language processing techniques. The current work explores linguistic features, viz., part of speech tags, chunks and semantic roles in detecting plagiarized fragments and utilizes a combined syntactic-semantic similarity metric, which extracts the semantic concepts from WordNet lexical database. The linguistic information is utilized for effective pre-processing and for availing semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels upon the features extracted is analyzed and discussed. Further, unlike the existing systems, which were evaluated on some limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpus provided by the PAN competition from 2009 to 2014. The approach presented considerable improvement in comparison with the top-ranked systems of the respective years. The evaluation and analysis with various cases of plagiarism also reflected the supremacy of deeper linguistic features for identifying manually plagiarized data.
Source: Information processing and management. 54(2018) no.3, S.408-432

Al-Khatib, K.; Ghosa, T.; Hou, Y.; Waard, A. de; Freitag, D.: Argument mining for scholarly document processing : taking stock and looking ahead (2021) 0.04

0.03908867 = product of:
  0.07817734 = sum of:
    0.036211025 = weight(_text_:data in 568) [ClassicSimilarity], result of:
      0.036211025 = score(doc=568,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 568, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=568)
    0.041966315 = product of:
      0.08393263 = sum of:
        0.08393263 = weight(_text_:processing in 568) [ClassicSimilarity], result of:
          0.08393263 = score(doc=568,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.4427661 = fieldWeight in 568, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=568)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Argument mining targets structures in natural language related to interpretation and persuasion. Most scholarly discourse involves interpreting experimental evidence and attempting to persuade other scientists to adopt the same conclusions, which could benefit from argument mining techniques. However, while various argument mining studies have addressed student essays and news articles, those that target scientific discourse are still scarce. This paper surveys existing work in argument mining of scholarly discourse, and provides an overview of current models, data, tasks, and applications. We identify a number of key challenges confronting argument mining in the scientific domain, and suggest some possible solutions and future directions.
Source: Proceedings of the Second Workshop on Scholarly Document Processing,
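
The score tree for this entry (doc=568) can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions that the listed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm. The function names are illustrative and not part of any library; queryNorm and fieldNorm are copied from the listing rather than computed, because they depend on the full query and on the indexed field length.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # tf(freq=...)
    term_idf = idf(doc_freq, max_docs)         # idf(docFreq=..., maxDocs=...)
    query_weight = term_idf * query_norm       # queryWeight
    field_weight = tf * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Values copied from the explanation of doc=568 (assumed inputs, not derived).
MAX_DOCS = 44218
QUERY_NORM = 0.046827413
FIELD_NORM = 0.0546875

data = term_score(2.0, 5088, MAX_DOCS, QUERY_NORM, FIELD_NORM)
processing = term_score(4.0, 2097, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "processing" clause sits in a nested query where 1 of 2 clauses matched,
# and 2 of the 4 top-level clauses matched, hence the two coord factors.
total = (data + processing * 0.5) * 0.5

print(f"data={data:.8f} processing={processing:.8f} total={total:.8f}")
```

Up to single-precision rounding, the three printed values should agree with the listed 0.036211025, 0.08393263 and 0.03908867. The other entries differ only in freq, fieldNorm and the coord factors.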

Polity, Y.: Vers une ergonomie linguistique (1994) 0.04

0.03764897 = product of:
  0.07529794 = sum of:
    0.04138403 = weight(_text_:data in 36) [ClassicSimilarity], result of:
      0.04138403 = score(doc=36,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2794884 = fieldWeight in 36, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=36)
    0.033913903 = product of:
      0.067827806 = sum of:
        0.067827806 = weight(_text_:processing in 36) [ClassicSimilarity], result of:
          0.067827806 = score(doc=36,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.35780904 = fieldWeight in 36, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Analyzed a special type of man-machine interaction, that of searching an information system with natural language. A model for full text processing for information retrieval was proposed that considered the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks.

Lezius, W.; Rapp, R.; Wettler, M.: A morphology-system and part-of-speech tagger for German (1996) 0.04

0.03705736 = product of:
  0.14822944 = sum of:
    0.14822944 = sum of:
      0.08478475 = weight(_text_:processing in 1693) [ClassicSimilarity], result of:
        0.08478475 = score(doc=1693,freq=2.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.4472613 = fieldWeight in 1693, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.078125 = fieldNorm(doc=1693)
      0.06344468 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
        0.06344468 = score(doc=1693,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.38690117 = fieldWeight in 1693, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=1693)
  0.25 = coord(1/4)

Date: 22. 3.2015 9:37:18
Source: Natural language processing and speech technology: Results of the 3rd KONVENS Conference, Bielefeld, October 1996. Ed.: D. Gibbon

Rahmstorf, G.: Information retrieval using conceptual representations of phrases (1994) 0.03

0.03466491 = product of:
  0.06932982 = sum of:
    0.043894395 = weight(_text_:data in 7862) [ClassicSimilarity], result of:
      0.043894395 = score(doc=7862,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 7862, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=7862)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 7862) [ClassicSimilarity], result of:
          0.05087085 = score(doc=7862,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 7862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=7862)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The information retrieval problem is described starting from an analysis of the concepts 'user's information request' and 'information offerings of texts'. It is shown that natural language phrases are a more adequate medium for expressing information requests and information offerings than character string based query and indexing languages complemented by Boolean operators. The phrases must be represented as concepts to reach a language invariant level for rule based relevance analysis. The special type of representation called advanced thesaurus is used for the semantic representation of natural language phrases and for relevance processing. The analysis of the retrieval problem leads to a symmetric system structure.
Series: Studies in classification, data analysis, and knowledge organization
Source: Information systems and data analysis: prospects - foundations - applications. Proc. of the 17th Annual Conference of the Gesellschaft für Klassifikation, Kaiserslautern, March 3-5, 1993. Ed.: H.-H. Bock et al

Ingenerf, J.: Disambiguating lexical meaning : conceptual meta-modelling as a means of controlling semantic language analysis (1994) 0.03
```
0.03466491 = product of:
  0.06932982 = sum of:
    0.043894395 = weight(_text_:data in 2572) [ClassicSimilarity], result of:
      0.043894395 = score(doc=2572,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 2572, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=2572)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 2572) [ClassicSimilarity], result of:
          0.05087085 = score(doc=2572,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 2572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=2572)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

A formal terminology consists of a set of conceptual definitions for the semantical reconstruction of a vocabulary on an intensional level of description. The marking of comparatively abstract concepts as semantic categories and their relational positioning on a meta-level is shown to be instrumental in adapting the conceptual design to domain-specific characteristics. Such a meta-model implies that concepts subsumed by categories may share their compositional possibilities as regards the construction of complex structures. Our approach to language processing leads to an automatic derivation of contextual semantic information about the linguistic expressions under review. This information is encoded by means of values of certain attributes defined in a feature-based grammatical framework. A standard process controlling grammatical analysis, the unification of feature structures, is used for its evaluation. One important example for the usefulness of this approach is the disambiguation of lexical meaning.

Series

Studies in classification, data analysis, and knowledge organization

Source

Information systems and data analysis: prospects - foundations - applications. Proc. of the 17th Annual Conference of the Gesellschaft für Klassifikation, Kaiserslautern, March 3-5, 1993. Ed.: H.-H. Bock et al
