Search (34 results, page 1 of 2)

  • year_i:[2000 TO 2010}
  • theme_ss:"Computerlinguistik"
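  The two active filters above are plain Solr filter queries; the range [2000 TO 2010} is inclusive at the lower bound and exclusive at the upper bound. A minimal sketch of how such a filtered search could be issued programmatically, assuming a local Solr endpoint and core name (only the field names and values are taken from the facets above):

    import requests  # endpoint and core name below are hypothetical

    params = {
        "q": "*:*",
        # [2000 TO 2010} is Solr range syntax: 2000 inclusive, 2010 exclusive
        "fq": ['year_i:[2000 TO 2010}', 'theme_ss:"Computerlinguistik"'],
        "rows": 20,
        "debugQuery": "true",  # asks Solr for the per-document score explanations shown below
    }
    response = requests.get("http://localhost:8983/solr/literature/select", params=params)
    print(response.json()["response"]["numFound"])  # e.g. 34 for this result set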
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.19
    0.18589748 = sum of:
      0.08280347 = product of:
        0.2484104 = sum of:
          0.2484104 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.2484104 = score(doc=562,freq=2.0), product of:
              0.4419972 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.05213454 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.10309401 = sum of:
        0.060712952 = weight(_text_:classification in 562) [ClassicSimilarity], result of:
          0.060712952 = score(doc=562,freq=6.0), product of:
            0.16603322 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.05213454 = queryNorm
            0.3656675 = fieldWeight in 562, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.04238106 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04238106 = score(doc=562,freq=2.0), product of:
            0.18256627 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05213454 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
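    The indented breakdown under each hit is standard Lucene/Solr explain output for the classic TF-IDF similarity. As a sanity check, the contribution of the _text_:3a clause for this entry (doc 562) can be reproduced from the printed factors; the snippet below only re-runs that arithmetic and is not part of the search engine:

      import math

      # factors copied from the explain tree for doc 562 above;
      # idf itself is 1 + ln(maxDocs / (docFreq + 1)) = 1 + ln(44218 / 25)
      freq, idf, query_norm, field_norm = 2.0, 8.478011, 0.05213454, 0.046875

      tf = math.sqrt(freq)                      # 1.4142135
      query_weight = idf * query_norm           # 0.4419972
      field_weight = tf * idf * field_norm      # 0.56201804
      clause_score = query_weight * field_weight
      print(clause_score)                       # 0.2484104
      print(clause_score * (1 / 3))             # 0.08280347 after coord(1/3)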
  2. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.02
    0.023092953 = product of:
      0.046185907 = sum of:
        0.046185907 = product of:
          0.092371814 = sum of:
            0.092371814 = weight(_text_:classification in 831) [ClassicSimilarity], result of:
              0.092371814 = score(doc=831,freq=20.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.55634534 = fieldWeight in 831, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=831)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were compared and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
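    A minimal sketch of the character-level language-modeling idea the abstract describes: each class gets its own character n-gram model with add-one smoothing, so no word segmentation is required. This is a simplified stand-in written for illustration, not the authors' implementation:

      import math
      from collections import Counter, defaultdict

      def char_ngrams(text, n=2):
          return [text[i:i + n] for i in range(len(text) - n + 1)]

      class CharNGramClassifier:
          """One character n-gram model per class; classify by total log-probability."""
          def __init__(self, n=2):
              self.n = n
              self.counts = defaultdict(Counter)
              self.totals = defaultdict(int)
              self.vocab = set()

          def fit(self, docs, labels):
              for doc, label in zip(docs, labels):
                  grams = char_ngrams(doc, self.n)
                  self.counts[label].update(grams)
                  self.totals[label] += len(grams)
                  self.vocab.update(grams)

          def predict(self, doc):
              def log_prob(label):  # add-one smoothed log-likelihood of the document
                  v = len(self.vocab)
                  return sum(math.log((self.counts[label][g] + 1) / (self.totals[label] + v))
                             for g in char_ngrams(doc, self.n))
              return max(self.counts, key=log_prob)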
  3. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.02
    0.02146527 = product of:
      0.04293054 = sum of:
        0.04293054 = product of:
          0.08586108 = sum of:
            0.08586108 = weight(_text_:classification in 5480) [ClassicSimilarity], result of:
              0.08586108 = score(doc=5480,freq=12.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.5171319 = fieldWeight in 5480, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5480)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
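    As an illustration of the letter-5-gram plus feature-selection plus SVM setup the abstract compares, a scikit-learn pipeline could look roughly like this; the parameter values are assumptions for the sketch, not the authors' original configuration:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.pipeline import Pipeline
      from sklearn.svm import LinearSVC

      pipeline = Pipeline([
          ("ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(5, 5))),  # letter 5-grams
          ("select", SelectKBest(chi2, k=10_000)),  # feature selection also guards against overfitting
          ("svm", LinearSVC()),
      ])
      # pipeline.fit(train_texts, train_labels); pipeline.score(test_texts, test_labels)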
  4. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02
    0.02119053 = product of:
      0.04238106 = sum of:
        0.04238106 = product of:
          0.08476212 = sum of:
            0.08476212 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.08476212 = score(doc=4888,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 3.2013 14:56:22
  5. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.02119053 = product of:
      0.04238106 = sum of:
        0.04238106 = product of:
          0.08476212 = sum of:
            0.08476212 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.08476212 = score(doc=5429,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.230-231
  6. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02
    0.017658776 = product of:
      0.03531755 = sum of:
        0.03531755 = product of:
          0.0706351 = sum of:
            0.0706351 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.0706351 = score(doc=5428,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.220-229
  7. Hull, D.; Ait-Mokhtar, S.; Chuat, M.; Eisele, A.; Gaussier, E.; Grefenstette, G.; Isabelle, P.; Samulesson, C.; Segand, F.: Language technologies and patent search and classification (2001) 0.02
    0.01752632 = product of:
      0.03505264 = sum of:
        0.03505264 = product of:
          0.07010528 = sum of:
            0.07010528 = weight(_text_:classification in 6318) [ClassicSimilarity], result of:
              0.07010528 = score(doc=6318,freq=2.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.42223644 = fieldWeight in 6318, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6318)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Argamon, S.; Whitelaw, C.; Chase, P.; Hota, S.R.; Garg, N.; Levitan, S.: Stylistic text classification using functional lexical features (2007) 0.02
    0.015178238 = product of:
      0.030356476 = sum of:
        0.030356476 = product of:
          0.060712952 = sum of:
            0.060712952 = weight(_text_:classification in 280) [ClassicSimilarity], result of:
              0.060712952 = score(doc=280,freq=6.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.3656675 = fieldWeight in 280, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=280)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Most text analysis and retrieval work to date has focused on the topic of a text; that is, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This article develops a new type of lexical feature for use in stylistic text classification, based on taxonomies of various semantic functions of certain choice words or phrases. We demonstrate the usefulness of such features for the stylistic text classification tasks of determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/negative evaluation), and the rhetorical character of scientific journal articles. We further show how the use of functional features aids in gaining insight about stylistic differences among different kinds of texts.
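    The core idea, counting how often words from small functional taxonomies occur, can be sketched in a few lines; the mini-lexicon below is invented for illustration and is far smaller than the taxonomies used in the article:

      from collections import Counter

      FUNCTION_LEXICON = {  # hypothetical mini-taxonomy: word -> semantic function class
          "excellent": "positive_appraisal", "brilliant": "positive_appraisal",
          "poor": "negative_appraisal", "dreadful": "negative_appraisal",
          "perhaps": "hedge", "apparently": "hedge", "certainly": "booster",
      }

      def functional_features(text):
          """Relative frequency of each functional class; these become classifier features."""
          tokens = text.lower().split()
          classes = [FUNCTION_LEXICON[t] for t in tokens if t in FUNCTION_LEXICON]
          return {c: n / max(len(tokens), 1) for c, n in Counter(classes).items()}

      print(functional_features("The opening is brilliant but the ending is perhaps poor"))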
  9. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.014458476 = product of:
      0.028916951 = sum of:
        0.028916951 = product of:
          0.057833903 = sum of:
            0.057833903 = weight(_text_:classification in 1595) [ClassicSimilarity], result of:
              0.057833903 = score(doc=1595,freq=4.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.34832728 = fieldWeight in 1595, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Advances in classification research, vol.10: proceedings of the 10th ASIS SIG/CR Classification Research Workshop. Ed.: Albrechtsen, H. and J.E. Mai
  10. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.01
    0.012486639 = product of:
      0.024973279 = sum of:
        0.024973279 = product of:
          0.049946558 = sum of:
            0.049946558 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.049946558 = score(doc=2541,freq=4.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  11. Martínez, F.; Martín, M.T.; Rivas, V.M.; Díaz, M.C.; Ureña, L.A.: Using neural networks for multiword recognition in IR (2003) 0.01
    0.012392979 = product of:
      0.024785958 = sum of:
        0.024785958 = product of:
          0.049571916 = sum of:
            0.049571916 = weight(_text_:classification in 2777) [ClassicSimilarity], result of:
              0.049571916 = score(doc=2777,freq=4.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.29856625 = fieldWeight in 2777, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2777)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, a supervised neural network has been used to classify pairs of terms as being multiwords or non-multiwords. Classification is based on the values yielded by different estimators currently available in the literature, used as inputs to the neural network. Lists of multiwords and non-multiwords have been built to train the net. Afterward, many other pairs of terms have been classified using the trained net. Results obtained in this classification have been used to perform information retrieval tasks. Experiments show that detecting multiwords results in better performance of the IR methods.
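    A sketch of that setup: each candidate pair is represented by the values of a few association estimators, and a small supervised network separates multiwords from non-multiwords. The pairs and feature values below are invented, and scikit-learn's MLPClassifier stands in for the network used in the paper:

      import numpy as np
      from sklearn.neural_network import MLPClassifier

      # hypothetical feature vectors: [pointwise mutual information, Dice coefficient, log-likelihood]
      X_train = np.array([
          [6.1, 0.42, 310.0],   # "support vector"  -> multiword
          [5.4, 0.35, 220.0],   # "neural network"  -> multiword
          [0.8, 0.02, 3.5],     # "the results"     -> not a multiword
          [1.1, 0.04, 5.2],     # "of terms"        -> not a multiword
      ])
      y_train = np.array([1, 1, 0, 0])

      net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
      net.fit(X_train, y_train)
      print(net.predict([[5.8, 0.40, 290.0]]))  # likely array([1]), i.e. classified as multiword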
  12. Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.01
    0.012392979 = product of:
      0.024785958 = sum of:
        0.024785958 = product of:
          0.049571916 = sum of:
            0.049571916 = weight(_text_:classification in 2835) [ClassicSimilarity], result of:
              0.049571916 = score(doc=2835,freq=4.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.29856625 = fieldWeight in 2835, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2835)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this article is to discuss advantages and disadvantages of various means to manage morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language dependent techniques for management of morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical IR retrieval results for 11 languages and a new classification of keyword management methods are also presented. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of keyword management techniques used in IR.
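    The contrast between the two families of methods can be made concrete with a toy example (the suffix list and word forms below are invented for illustration): reductive methods normalise both index and query terms to a common form, while generative methods leave the index untouched and expand the query with generated surface variants.

      def crude_stem(word):
          """A toy reductive step: strip one of a few inflectional suffixes."""
          for suffix in ("issa", "ssa", "n", "t"):
              if word.endswith(suffix) and len(word) - len(suffix) >= 4:
                  return word[: -len(suffix)]
          return word

      index_terms = ["taloissa", "talon", "talot"]  # inflected forms of "talo" (house)

      # Reductive: both sides are reduced before matching
      reduced_index = {crude_stem(t) for t in index_terms}
      print(crude_stem("talossa") in reduced_index)          # True

      # Generative: the query keyword is expanded into variants searched verbatim
      query_variants = ["talo", "talon", "talot", "taloissa", "talossa"]
      print(any(v in index_terms for v in query_variants))   # True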
  13. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.012361143 = product of:
      0.024722286 = sum of:
        0.024722286 = product of:
          0.04944457 = sum of:
            0.04944457 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.04944457 = score(doc=5483,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    10.12.2000 18:22:35
  14. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.012361143 = product of:
      0.024722286 = sum of:
        0.024722286 = product of:
          0.04944457 = sum of:
            0.04944457 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.04944457 = score(doc=156,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8. 3.2007 19:55:22
  15. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.012361143 = product of:
      0.024722286 = sum of:
        0.024722286 = product of:
          0.04944457 = sum of:
            0.04944457 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.04944457 = score(doc=3840,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 8.2011 14:22:33
  16. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.012361143 = product of:
      0.024722286 = sum of:
        0.024722286 = product of:
          0.04944457 = sum of:
            0.04944457 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.04944457 = score(doc=4184,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2011 10:38:28
  17. Stock, W.G.: Textwortmethode : Norbert Henrichs zum 65. (3) (2000) 0.01
    0.011684213 = product of:
      0.023368426 = sum of:
        0.023368426 = product of:
          0.04673685 = sum of:
            0.04673685 = weight(_text_:classification in 4891) [ClassicSimilarity], result of:
              0.04673685 = score(doc=4891,freq=2.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.28149095 = fieldWeight in 4891, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4891)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Only a few documentation methods are associated with the names of their developers. Exceptions are Melvil Dewey (DDC), S.R. Ranganathan (Colon Classification) - and Norbert Henrichs. His Textwortmethode (text-word method) makes it possible to index and retrieve literature from fields that lack a universally accepted technical terminology, that is, many of the social sciences and humanities, above all philosophy. Henrichs designed the Textwortmethode in the late 1960s for use in electronic philosophy documentation. He is thus not only one of the pioneers of applying electronic data processing in information practice, but also the pioneer of documenting specialist languages whose terminology is not fixed.
  18. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.010595265 = product of:
      0.02119053 = sum of:
        0.02119053 = product of:
          0.04238106 = sum of:
            0.04238106 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.04238106 = score(doc=4436,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 2.2000 14:22:39
  19. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.010595265 = product of:
      0.02119053 = sum of:
        0.02119053 = product of:
          0.04238106 = sum of:
            0.04238106 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.04238106 = score(doc=1746,freq=2.0), product of:
                0.18256627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05213454 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2015 9:17:30
  20. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.01
    0.010327483 = product of:
      0.020654965 = sum of:
        0.020654965 = product of:
          0.04130993 = sum of:
            0.04130993 = weight(_text_:classification in 1853) [ClassicSimilarity], result of:
              0.04130993 = score(doc=1853,freq=4.0), product of:
                0.16603322 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05213454 = queryNorm
                0.24880521 = fieldWeight in 1853, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics (bibliometrics and scientometrics studies) for STW rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has rendered necessary the integration of natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted digraphs, which the clustering algorithm, CPCL (Classification by Preferential Clustered Link), will seek to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.
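    A sketch of the graph formalisation described above: terms are nodes, variation relations are weighted edges, and clusters of strongly linked terms approximate research topics. The term pairs and weights are invented, and pruning weak edges plus taking connected components stands in for the CPCL clustering step itself:

      import networkx as nx

      variation_links = [  # (term, variant, weight) -- hypothetical examples
          ("text classification", "text categorization", 0.9),
          ("text classification", "document classification", 0.7),
          ("machine translation", "automatic translation", 0.8),
          ("machine translation", "text classification", 0.2),
      ]

      g = nx.Graph()
      g.add_weighted_edges_from(variation_links)

      # prune weak variation links, then read each connected component as a topic cluster
      g.remove_edges_from([(u, v) for u, v, w in g.edges(data="weight") if w < 0.5])
      for topic in nx.connected_components(g):
          print(sorted(topic))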

Languages

  • e 25
  • d 10
  • m 1

Types

  • a 29
  • m 3
  • el 2
  • s 2
  • x 1