Search (110 results, page 1 of 6)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.17

0.17408767 = sum of:
  0.082819656 = product of:
    0.24845897 = sum of:
      0.24845897 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.24845897 = score(doc=562,freq=2.0), product of:
          0.44208363 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.052144732 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.09126801 = sum of:
    0.048878662 = weight(_text_:data in 562) [ClassicSimilarity], result of:
      0.048878662 = score(doc=562,freq=4.0), product of:
        0.16488427 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.052144732 = queryNorm
        0.29644224 = fieldWeight in 562, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.04238935 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
      0.04238935 = score(doc=562,freq=2.0), product of:
        0.18260197 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052144732 = queryNorm
        0.23214069 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)

Content: See: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Source: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
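The relevance figures in this listing come from Lucene's ClassicSimilarity (TF-IDF) explain output. As a rough cross-check, the sketch below recomputes the score of document 562 from the numbers printed in its explanation tree. It assumes the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). It takes queryNorm and fieldNorm verbatim from the explanation instead of deriving them, and it mirrors the tree's nesting (the `3a` clause scaled by coord(1/3), then added to the summed `data` and `22` weights) rather than reconstructing the original query. The helper names are hypothetical.

```python
import math

MAX_DOCS = 44218           # maxDocs, from the explanation
QUERY_NORM = 0.052144732   # queryNorm, copied from the explanation (not recomputed)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, field_norm: float) -> float:
    """weight = queryWeight * fieldWeight for one query term (query boost assumed 1)."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM                     # queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


# Per-term weights for doc 562 (fieldNorm is 0.046875 for every term)
w_3a = term_weight(freq=2, doc_freq=24, field_norm=0.046875)
w_data = term_weight(freq=4, doc_freq=5088, field_norm=0.046875)
w_22 = term_weight(freq=2, doc_freq=3622, field_norm=0.046875)

# Same nesting as the explain tree: (w_3a * coord(1/3)) + (w_data + w_22)
score_562 = w_3a * (1 / 3) + (w_data + w_22)
print(f"{w_3a:.6f} {w_data:.6f} {w_22:.6f} {score_562:.6f}")
```

Because Lucene computes in 32-bit floats, the recomputed values should agree with the explanation only to about six or seven significant digits.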

Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.06

0.060845338 = product of:
  0.121690676 = sum of:
    0.121690676 = sum of:
      0.06517155 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
        0.06517155 = score(doc=6753,freq=4.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.3952563 = fieldWeight in 6753, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0625 = fieldNorm(doc=6753)
      0.056519132 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
        0.056519132 = score(doc=6753,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.30952093 = fieldWeight in 6753, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=6753)
  0.5 = coord(1/2)

Abstract: Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods to acquire selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing but are also useful for a comparative analysis of sublanguages.
Date: 6. 3.1997 16:22:15

Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.04

0.04488854 = product of:
  0.08977708 = sum of:
    0.08977708 = sum of:
      0.040322836 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
        0.040322836 = score(doc=2345,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.24455236 = fieldWeight in 2345, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2345)
      0.049454242 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
        0.049454242 = score(doc=2345,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.2708308 = fieldWeight in 2345, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2345)
  0.5 = coord(1/2)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
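The fieldNorm factors in these explanations take only a few coarse values (0.046875, 0.0546875, 0.0625, 0.125, ...). That pattern is consistent with the one-byte norm encoding that ClassicSimilarity used before Lucene 7. Under that scheme, the length norm 1/sqrt(numTerms) was stored with SmallFloat's floatToByte315/byte315ToFloat, so nearby field lengths collapse to the same norm. The Python port below is a sketch under that assumption. It ignores index-time boosts and overlap-token discounting, and the Python function names are hypothetical stand-ins for the Java methods.

```python
import math
import struct

FZERO = (63 - 15) << 3  # zero-exponent offset for the "315" layout (zeroExp = 15)


def float_to_byte315(f: float) -> int:
    """Quantise a float to one byte (0..255), mirroring SmallFloat.floatToByte315."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits as a signed int
    small = bits >> 21               # drop the low 21 bits: keeps exponent + top 2 stored mantissa bits
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest non-zero
    if small >= FZERO + 0x100:
        return 255                    # overflow (Java returns (byte)-1, i.e. 0xFF)
    return small - FZERO


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315 back to a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def encoded_length_norm(num_terms: int) -> float:
    """Length norm 1/sqrt(numTerms) after a round trip through the byte encoding."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in (64, 300, 400):
    print(n, encoded_length_norm(n))
```

For example, a 400-term field (1/sqrt(400) = 0.05) comes back as 0.046875, the same fieldNorm reported for several documents above.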

Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04

0.041409828 = product of:
  0.082819656 = sum of:
    0.082819656 = product of:
      0.24845897 = sum of:
        0.24845897 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.24845897 = score(doc=862,freq=2.0), product of:
            0.44208363 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.052144732 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Source: https://arxiv.org/abs/2212.06721

Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.04

0.03847589 = product of:
  0.07695178 = sum of:
    0.07695178 = sum of:
      0.03456243 = weight(_text_:data in 75) [ClassicSimilarity], result of:
        0.03456243 = score(doc=75,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.2096163 = fieldWeight in 75, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=75)
      0.04238935 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
        0.04238935 = score(doc=75,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.23214069 = fieldWeight in 75, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=75)
  0.5 = coord(1/2)

Abstract: A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
Date: 30.12.2001 19:01:22

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.04

0.03847589 = product of:
  0.07695178 = sum of:
    0.07695178 = sum of:
      0.03456243 = weight(_text_:data in 563) [ClassicSimilarity], result of:
        0.03456243 = score(doc=563,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.2096163 = fieldWeight in 563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=563)
      0.04238935 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
        0.04238935 = score(doc=563,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.23214069 = fieldWeight in 563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=563)
  0.5 = coord(1/2)

Abstract: In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
Date: 10. 1.2013 19:22:47

Warner, A.J.: Natural language processing (1987) 0.03

0.028259566 = product of:
  0.056519132 = sum of:
    0.056519132 = product of:
      0.113038264 = sum of:
        0.113038264 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
          0.113038264 = score(doc=337,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.61904186 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Annual review of information science and technology. 22(1987), pp.79-108
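In single-term matches such as this one, the term weight is multiplied by coord(1/2) twice, once at each nesting level of the query, so the document keeps only a quarter of its term weight. A minimal sketch, restating the ClassicSimilarity term formula and plugging in the numbers from the explanation for document 337. The helper names are hypothetical, and coord is modelled as matched clauses divided by total (non-prohibited) clauses.

```python
import math


def classic_term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight under ClassicSimilarity (query boost assumed 1)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)


def coord(overlap, max_overlap):
    """Share of a BooleanQuery's (non-prohibited) clauses that matched the document."""
    return overlap / max_overlap


# Doc 337: only the term "22" matches, with fieldNorm 0.125
w_22 = classic_term_weight(2, 3622, 44218, 0.052144732, 0.125)

# Inner BooleanQuery: 1 of 2 clauses matched; outer BooleanQuery: again 1 of 2
score_337 = w_22 * coord(1, 2) * coord(1, 2)
print(f"{w_22:.6f} -> {score_337:.6f}")
```

The same two halvings explain why the other entries that match only `22` score about 0.02 even when their term weight is close to 0.1.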

McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatically generated word hierarchies (1996) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
          0.098908484 = score(doc=3164,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 3164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3164)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Computational linguistics. 22(1996) no.2, pp.217-248

Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
          0.098908484 = score(doc=4506,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 4506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4506)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 8.10.2000 11:52:22

Somers, H.: Example-based machine translation : Review article (1999) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
          0.098908484 = score(doc=6672,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 6672, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6672)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

New tools for human translators (1997) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
          0.098908484 = score(doc=1179,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 1179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1179)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
          0.098908484 = score(doc=3117,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 3117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3117)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 28. 2.1999 10:48:22

WordNet : an electronic lexical database (language, speech and communication) (1998) 0.02

0.024692593 = product of:
  0.049385186 = sum of:
    0.049385186 = product of:
      0.09877037 = sum of:
        0.09877037 = weight(_text_:data in 2434) [ClassicSimilarity], result of:
          0.09877037 = score(doc=2434,freq=12.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.59902847 = fieldWeight in 2434, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2434)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

LCSH: Semantics / Data processing
Lexicology / Data processing
English language / Data processing
Subject: Semantics / Data processing
Lexicology / Data processing
English language / Data processing

Barton, G.E. Jr.; Berwick, R.C.; Ristad, E.S.: Computational complexity and natural language (1987) 0.02

0.024439331 = product of:
  0.048878662 = sum of:
    0.048878662 = product of:
      0.097757325 = sum of:
        0.097757325 = weight(_text_:data in 7138) [ClassicSimilarity], result of:
          0.097757325 = score(doc=7138,freq=4.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.5928845 = fieldWeight in 7138, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=7138)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

LCSH: Linguistics / Data processing
Subject: Linguistics / Data processing

Hausser, R.: Language and nonlanguage cognition (2021) 0.02
```
0.022860901 = product of:
  0.045721803 = sum of:
    0.045721803 = product of:
      0.091443606 = sum of:
        0.091443606 = weight(_text_:data in 255) [ClassicSimilarity], result of:
          0.091443606 = score(doc=255,freq=14.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.55459267 = fieldWeight in 255, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=255)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of place holder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.

Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.02
```
0.02244427 = product of:
  0.04488854 = sum of:
    0.04488854 = sum of:
      0.020161418 = weight(_text_:data in 3807) [ClassicSimilarity], result of:
        0.020161418 = score(doc=3807,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.12227618 = fieldWeight in 3807, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.02734375 = fieldNorm(doc=3807)
      0.024727121 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
        0.024727121 = score(doc=3807,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.1354154 = fieldWeight in 3807, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.02734375 = fieldNorm(doc=3807)
  0.5 = coord(1/2)
```
Abstract: Purpose - Academic authors tend to define terms that meet their own needs. Knowledge Management (KM) is a term that comes to mind and is examined in this study. Lexicographical research identified KM terms used by authors from 1996 to 2006 in academic outlets to define KM. Data were collected based on strict criteria which included that definitions should be unique instances. From 2006 onwards, these authors could not identify new unique instances of definitions with repetitive usage of such definition instances. Analysis revealed that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, and Process) and Contextualised Content (Information). The paper aims to discuss these issues. Design/methodology/approach - The aim of this paper is to add to the body of knowledge in the KM discipline and supply KM practitioners and scholars with insight into what is commonly regarded as KM so as to reignite the debate on what one could consider as KM. The lexicon used by KM scholars was evaluated through the application of lexicographical research methods as extended through Knowledge Discovery and Text Analysis methods. Findings - By simplifying term relationships through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods, it was found that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, Process) and Contextualised Content (Information). One would therefore be able to indicate that KM, from an academic point of view, refers to people processing contextualised content.

Date: 20. 1.2015 18:30:22

Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.02
```
0.021601519 = product of:
  0.043203037 = sum of:
    0.043203037 = product of:
      0.086406074 = sum of:
        0.086406074 = weight(_text_:data in 609) [ClassicSimilarity], result of:
          0.086406074 = score(doc=609,freq=18.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.52404076 = fieldWeight in 609, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=609)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws. Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test. Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law. Research limitations/implications - The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high. Therefore the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst case in terms of fit to either Lotka's or Zipf's laws, which indicates that Romanized data in aggregated form appear to be a preferable option. Originality/value - Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.
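The abstract mentions testing term frequency distributions against Lotka's law with the Kolmogorov-Smirnov test. The sketch below shows what such a check can look like. It is not the author's procedure and makes several assumptions. It uses the Lotka form f(x) = C / x^n with a chosen exponent n and normalizes C over the observed range of x. It also takes 1.63/sqrt(N), the usual large-sample critical value at the 0.01 level, as the threshold. The function name and input format (one frequency per distinct term) are hypothetical.

```python
import math
from collections import Counter


def lotka_ks(term_freqs, n=2.0):
    """Kolmogorov-Smirnov distance between observed data and Lotka's law.

    term_freqs holds one entry per distinct term: how often that term occurs.
    Returns (D, critical_value); D > critical_value suggests a poor fit.
    """
    terms_with_freq = Counter(term_freqs)   # y(x): number of terms occurring exactly x times
    total_terms = len(term_freqs)
    xs = range(1, max(terms_with_freq) + 1)

    # Lotka: expected share of terms with frequency x is C / x**n;
    # C is normalised so the shares sum to 1 over 1..max(x) (an assumption)
    c = 1.0 / sum(1.0 / x**n for x in xs)

    d = cum_obs = cum_exp = 0.0
    for x in xs:
        cum_obs += terms_with_freq[x] / total_terms   # Counter returns 0 for unseen x
        cum_exp += c / x**n
        d = max(d, abs(cum_obs - cum_exp))            # largest gap between the two CDFs

    critical = 1.63 / math.sqrt(total_terms)         # large-sample KS value, alpha = 0.01
    return d, critical


# Hypothetical data: 8 terms occur once, 2 terms occur twice (an exact Lotka fit for n = 2)
d, crit = lotka_ks([1] * 8 + [2] * 2)
print(f"D = {d:.4f}, critical = {crit:.4f}")
```

A generalised Zipf comparison would follow the same pattern, with a different expected distribution substituted for `c / x**n`.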

Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02

0.021194674 = product of:
  0.04238935 = sum of:
    0.04238935 = product of:
      0.0847787 = sum of:
        0.0847787 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
          0.0847787 = score(doc=4483,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.46428138 = fieldWeight in 4483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4483)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 15. 3.2000 10:22:37

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02

0.021194674 = product of:
  0.04238935 = sum of:
    0.04238935 = product of:
      0.0847787 = sum of:
        0.0847787 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.0847787 = score(doc=4888,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 3.2013 14:56:22

Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.02
```
0.020366108 = product of:
  0.040732216 = sum of:
    0.040732216 = product of:
      0.08146443 = sum of:
        0.08146443 = weight(_text_:data in 392) [ClassicSimilarity], result of:
          0.08146443 = score(doc=392,freq=16.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.49407038 = fieldWeight in 392, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models are more demanding of training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or relying on external knowledge bases to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based, and the other is lexicon manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with two previous lexical substitution methods, averaged over five benchmarks. Introducing POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
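The core idea of POS-focused lexical substitution can be illustrated with a toy example. Only words carrying selected POS tags are candidates for replacement, and a substitute is accepted only if its similarity to the original word clears a threshold. This is a minimal sketch, not PLSDA itself. The candidate table, its similarity scores, the tag set and the function name are all hypothetical, and the input is assumed to be already POS-tagged.

```python
import random

# Hypothetical substitution candidates with made-up similarity scores
CANDIDATES = {
    "good": [("great", 0.8), ("fine", 0.6)],
    "movie": [("film", 0.9)],
    "love": [("adore", 0.7)],
}


def augment(tagged, target_pos=("ADJ", "NOUN", "VERB"), threshold=0.65,
            n_new=2, seed=0):
    """Generate n_new sentences by swapping words whose POS tag is in target_pos.

    tagged is a list of (word, pos) pairs; a candidate is kept only if its
    similarity score is at least `threshold`.
    """
    rng = random.Random(seed)
    new_instances = []
    for _ in range(n_new):
        tokens = []
        for word, pos in tagged:
            options = [w for w, sim in CANDIDATES.get(word.lower(), [])
                       if sim >= threshold]
            if pos in target_pos and options:
                tokens.append(rng.choice(options))  # replace a targeted word
            else:
                tokens.append(word)                 # keep everything else unchanged
        new_instances.append(" ".join(tokens))
    return new_instances


sentence = [("I", "PRON"), ("love", "VERB"), ("this", "DET"),
            ("good", "ADJ"), ("movie", "NOUN")]
print(augment(sentence))
print(augment(sentence, target_pos=("NOUN",), n_new=1))
```

Each generated sentence would inherit the sentiment label of its source sentence. Lowering `threshold` admits more (and noisier) substitutes, which is the trade-off that the similarity-threshold hyperparameter mentioned in the abstract controls.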
