Search (106 results, page 1 of 6)

  • theme_ss:"Computerlinguistik"
  • type_ss:"a"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.17
    0.17408767 = sum of:
      0.082819656 = product of:
        0.24845897 = sum of:
          0.24845897 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24845897 = score(doc=562,freq=2.0), product of:
              0.44208363 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.052144732 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.09126801 = sum of:
        0.048878662 = weight(_text_:data in 562) [ClassicSimilarity], result of:
          0.048878662 = score(doc=562,freq=4.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.29644224 = fieldWeight in 562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.04238935 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04238935 = score(doc=562,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
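The score breakdowns shown with each result are Lucene's ClassicSimilarity (TF-IDF) explain trees: every leaf is queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm with tf = sqrt(termFreq), and coord(m/n) scales a clause that matched m of n sibling clauses. As a minimal sketch (the helper name below is illustrative, not part of Lucene's API), the 0.17408767 for result 1 can be reproduced from the values in its tree:

```python
from math import sqrt

def leaf_score(freq, idf, query_norm, field_norm):
    """One leaf of a ClassicSimilarity explain tree:
    (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
    query_weight = idf * query_norm
    field_weight = sqrt(freq) * idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.052144732
FIELD_NORM = 0.046875  # fieldNorm(doc=562)

s_3a   = leaf_score(2.0, 8.478011, QUERY_NORM, FIELD_NORM)   # ~0.24845897
s_data = leaf_score(4.0, 3.1620505, QUERY_NORM, FIELD_NORM)  # ~0.048878662
s_22   = leaf_score(2.0, 3.5018296, QUERY_NORM, FIELD_NORM)  # ~0.04238935

# The '3a' clause matched 1 of 3 sibling clauses, hence coord(1/3).
total = s_3a / 3 + (s_data + s_22)
print(f"{total:.8f}")  # ~0.17408767
```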
  2. Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.06
    0.060845338 = product of:
      0.121690676 = sum of:
        0.121690676 = sum of:
          0.06517155 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
            0.06517155 = score(doc=6753,freq=4.0), product of:
              0.16488427 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.052144732 = queryNorm
              0.3952563 = fieldWeight in 6753, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
          0.056519132 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
            0.056519132 = score(doc=6753,freq=2.0), product of:
              0.18260197 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.052144732 = queryNorm
              0.30952093 = fieldWeight in 6753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
      0.5 = coord(1/2)
    
    Abstract
    Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
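ARISTO-LEX itself is not reproduced here, but the probabilistic half of selectional-restriction acquisition reduces, at its simplest, to estimating P(semantic class of argument | verb) from parsed corpus observations. A toy sketch with invented data:

```python
from collections import Counter, defaultdict

# Toy (verb, object-class) observations; in a system like ARISTO-LEX these
# would come from parsed corpora plus a knowledge base of semantic classes.
observations = [
    ("drink", "BEVERAGE"), ("drink", "BEVERAGE"), ("drink", "LIQUID"),
    ("drive", "VEHICLE"), ("drive", "VEHICLE"), ("drive", "ANIMAL"),
]

counts = defaultdict(Counter)
for verb, obj_class in observations:
    counts[verb][obj_class] += 1

# P(object class | verb): a simple probabilistic selectional preference.
for verb, dist in counts.items():
    total = sum(dist.values())
    prefs = {c: n / total for c, n in dist.items()}
    print(verb, prefs)
```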
  3. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.04
    0.04488854 = product of:
      0.08977708 = sum of:
        0.08977708 = sum of:
          0.040322836 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
            0.040322836 = score(doc=2345,freq=2.0), product of:
              0.16488427 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.052144732 = queryNorm
              0.24455236 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
          0.049454242 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
            0.049454242 = score(doc=2345,freq=2.0), product of:
              0.18260197 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.052144732 = queryNorm
              0.2708308 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
      0.5 = coord(1/2)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  4. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.041409828 = product of:
      0.082819656 = sum of:
        0.082819656 = product of:
          0.24845897 = sum of:
            0.24845897 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24845897 = score(doc=862,freq=2.0), product of:
                0.44208363 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.052144732 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.04
    0.03847589 = product of:
      0.07695178 = sum of:
        0.07695178 = sum of:
          0.03456243 = weight(_text_:data in 75) [ClassicSimilarity], result of:
            0.03456243 = score(doc=75,freq=2.0), product of:
              0.16488427 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.052144732 = queryNorm
              0.2096163 = fieldWeight in 75, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.046875 = fieldNorm(doc=75)
          0.04238935 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
            0.04238935 = score(doc=75,freq=2.0), product of:
              0.18260197 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.052144732 = queryNorm
              0.23214069 = fieldWeight in 75, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=75)
      0.5 = coord(1/2)
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  6. Warner, A.J.: Natural language processing (1987) 0.03
    0.028259566 = product of:
      0.056519132 = sum of:
        0.056519132 = product of:
          0.113038264 = sum of:
            0.113038264 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.113038264 = score(doc=337,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  7. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02
    0.024727121 = product of:
      0.049454242 = sum of:
        0.049454242 = product of:
          0.098908484 = sum of:
            0.098908484 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
              0.098908484 = score(doc=3164,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.5416616 = fieldWeight in 3164, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3164)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  8. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02
    0.024727121 = product of:
      0.049454242 = sum of:
        0.049454242 = product of:
          0.098908484 = sum of:
            0.098908484 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.098908484 = score(doc=4506,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8.10.2000 11:52:22
  9. Somers, H.: Example-based machine translation : Review article (1999) 0.02
    0.024727121 = product of:
      0.049454242 = sum of:
        0.049454242 = product of:
          0.098908484 = sum of:
            0.098908484 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
              0.098908484 = score(doc=6672,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.5416616 = fieldWeight in 6672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6672)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  10. Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02
    0.024727121 = product of:
      0.049454242 = sum of:
        0.049454242 = product of:
          0.098908484 = sum of:
            0.098908484 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
              0.098908484 = score(doc=3117,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.5416616 = fieldWeight in 3117, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3117)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28. 2.1999 10:48:22
  11. ¬Der Student aus dem Computer (2023) 0.02
    0.024727121 = product of:
      0.049454242 = sum of:
        0.049454242 = product of:
          0.098908484 = sum of:
            0.098908484 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.098908484 = score(doc=1079,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  12. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.02
    0.021601519 = product of:
      0.043203037 = sum of:
        0.043203037 = product of:
          0.086406074 = sum of:
            0.086406074 = weight(_text_:data in 609) [ClassicSimilarity], result of:
              0.086406074 = score(doc=609,freq=18.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.52404076 = fieldWeight in 609, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=609)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records, and to verify whether the term frequency distributions satisfy conventional bibliometric laws.
    Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions; the fits are tested with the Kolmogorov-Smirnov test.
    Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not.
    Research limitations/implications - The findings are limited to two sets of bibliographic data (for the aggregation consistency analysis) and one set (for the frequency distribution analysis); only two bibliometric distributions are tested. Internal consistency within each database remains fairly high, so the main argument against syllable aggregation does not appear to hold. Chinese words and characters behave differently in terms of frequency distribution, but there is no noticeable difference between vernacular and Romanized data. Romanized characters exhibit the worst fit to either Lotka's or Zipf's law, which indicates that Romanized data in aggregated form appear to be the preferable option.
    Originality/value - Provides empirical data on the consistency and distribution of Romanized Chinese titles in bibliographic records.
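As a rough sketch of the distribution check described above (not the author's actual procedure, and with invented frequency data): fit Lotka's law terms(n) = C·n^(-a) in log-log space, then compare empirical and fitted cumulative distributions with a Kolmogorov-Smirnov-style maximum distance:

```python
import numpy as np

# Invented frequency spectrum: terms[k] = number of terms occurring (k+1) times.
n = np.arange(1, 11)
terms = np.array([520, 130, 58, 33, 21, 14, 11, 8, 7, 5], dtype=float)

# Lotka's law: terms(n) = C * n**(-a).  Fit a by least squares in log-log space.
slope, logC = np.polyfit(np.log(n), np.log(terms), 1)
a = -slope
print(f"Lotka exponent a ~ {a:.2f}")

# KS-style distance between empirical and fitted cumulative distributions.
emp_cdf = np.cumsum(terms) / terms.sum()
fit = np.exp(logC) * n ** (-a)
fit_cdf = np.cumsum(fit) / fit.sum()
print("max CDF distance:", np.abs(emp_cdf - fit_cdf).max())
```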
  13. Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02
    0.021194674 = product of:
      0.04238935 = sum of:
        0.04238935 = product of:
          0.0847787 = sum of:
            0.0847787 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
              0.0847787 = score(doc=4483,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.46428138 = fieldWeight in 4483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    15. 3.2000 10:22:37
  14. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.021194674 = product of:
      0.04238935 = sum of:
        0.04238935 = product of:
          0.0847787 = sum of:
            0.0847787 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.0847787 = score(doc=5429,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.230-231
  15. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.02
    0.020366108 = product of:
      0.040732216 = sum of:
        0.040732216 = product of:
          0.08146443 = sum of:
            0.08146443 = weight(_text_:data in 392) [ClassicSimilarity], result of:
              0.08146443 = score(doc=392,freq=16.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.49407038 = fieldWeight in 392, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=392)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models demand more training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or on external knowledge bases, addressing the scarcity of annotated data that hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based and the other lexicon-manipulation-based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared to the two previous lexical substitution methods, averaged over the five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
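A heavily simplified sketch of POS-constrained lexical substitution in the spirit of PLSDA (not the authors' implementation; the candidate-similarity filtering and sampling strategies are omitted), using NLTK's WordNet to swap adjectives:

```python
import nltk
from nltk.corpus import wordnet as wn

# One-time setup: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger");
# nltk.download("wordnet")

def augment(sentence):
    """Yield new sentences by swapping each adjective for a WordNet synonym."""
    tokens = nltk.word_tokenize(sentence)
    for i, (word, tag) in enumerate(nltk.pos_tag(tokens)):
        if not tag.startswith("JJ"):  # POS constraint: adjectives only
            continue
        synonyms = {
            lemma.name().replace("_", " ")
            for syn in wn.synsets(word, pos=wn.ADJ)
            for lemma in syn.lemmas()
        } - {word}
        for s in synonyms:
            yield " ".join(tokens[:i] + [s] + tokens[i + 1:])

for new_sentence in augment("the movie was great"):
    print(new_sentence)
```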
  16. Ruge, G.: Experiments on linguistically-based term associations (1992) 0.02
    0.019954631 = product of:
      0.039909262 = sum of:
        0.039909262 = product of:
          0.079818524 = sum of:
            0.079818524 = weight(_text_:data in 1810) [ClassicSimilarity], result of:
              0.079818524 = score(doc=1810,freq=6.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.48408815 = fieldWeight in 1810, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1810)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes the hyperterm system REALIST (REtrieval Aids by LInguistic and STatistics) and its semantic component. The semantic component of REALIST generates semantic term relations such as synonyms. It takes as input a free text data base and generates as output term pairs that are semantically related with respect to their meanings in the data base. In the 1st step an automatic syntactic analysis provides linguistic knowledge about the terms of the data base. In the 2nd step this knowledge is compared by statistical similarity computation. Various experiments with different similarity measures are described.
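The statistical step can be pictured as comparing terms by the similarity of their context profiles; a minimal stand-in sketch (invented co-occurrence counts, plain cosine similarity rather than the paper's various measures):

```python
import numpy as np

# Invented co-occurrence vectors: rows = terms, columns = shared context
# features (in REALIST these come from the syntactic analysis of the corpus).
terms = ["car", "automobile", "banana"]
vectors = np.array([
    [4.0, 2.0, 0.0, 1.0],
    [3.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 1.0],
])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for i in range(len(terms)):
    for j in range(i + 1, len(terms)):
        print(terms[i], terms[j], round(cosine(vectors[i], vectors[j]), 3))
```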
  17. Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.02
    0.017662229 = product of:
      0.035324458 = sum of:
        0.035324458 = product of:
          0.070648916 = sum of:
            0.070648916 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
              0.070648916 = score(doc=1463,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.38690117 = fieldWeight in 1463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1463)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  18. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02
    0.017662229 = product of:
      0.035324458 = sum of:
        0.035324458 = product of:
          0.070648916 = sum of:
            0.070648916 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.070648916 = score(doc=5428,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.220-229
  19. Lezius, W.; Rapp, R.; Wettler, M.: ¬A morphology-system and part-of-speech tagger for German (1996) 0.02
    0.017662229 = product of:
      0.035324458 = sum of:
        0.035324458 = product of:
          0.070648916 = sum of:
            0.070648916 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
              0.070648916 = score(doc=1693,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.38690117 = fieldWeight in 1693, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1693)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2015 9:37:18
  20. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.017637566 = product of:
      0.03527513 = sum of:
        0.03527513 = product of:
          0.07055026 = sum of:
            0.07055026 = weight(_text_:data in 3682) [ClassicSimilarity], result of:
              0.07055026 = score(doc=3682,freq=12.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.4278775 = fieldWeight in 3682, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Theme
    Data Mining
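The "empirical point of entry" techniques named in the abstract are straightforward to sketch; for example, a minimal keyword-in-context (KWIC) pass (toy text, illustrative only):

```python
def kwic(text, keyword, width=3):
    """Print each occurrence of keyword with `width` words of context per side."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(".,") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            print(f"{left:>30} | {tokens[i]} | {right}")

doc = "The impact case studies describe research impact across many disciplines."
kwic(doc, "impact")
```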

Languages

  • e 89
  • d 15
  • f 1