Search (1 results, page 1 of 1)

  • × author_ss:"Ibekwe-SanJuan, F."
  • × theme_ss:"Automatisches Klassifizieren"
  1. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.01
    0.014326988 = product of:
      0.05730795 = sum of:
        0.05730795 = weight(_text_:evolution in 1853) [ClassicSimilarity], result of:
          0.05730795 = score(doc=1853,freq=2.0), product of:
            0.19585751 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.03697776 = queryNorm
            0.2926002 = fieldWeight in 1853, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1853)
      0.25 = coord(1/4)
    
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics (bibliometrics and scientometrics studies) for STW rely solely an statistical data analysis methods (Co-citation analysis, co-word analysis). Such methods usually work an structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has rendered necessary the integration of natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs which the clustering algorithm, CPCL (Classification by Preferential Clustered Link) will seek to reduce in order to produces classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.