Search (2 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Automatisches Klassifizieren"
  1. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.02
    0.024084264 = product of:
      0.04816853 = sum of:
        0.04816853 = product of:
          0.09633706 = sum of:
            0.09633706 = weight(_text_:y in 5769) [ClassicSimilarity], result of:
              0.09633706 = score(doc=5769,freq=4.0), product of:
                0.25623685 = queryWeight, product of:
                  4.8124003 = idf(docFreq=976, maxDocs=44218)
                  0.053245123 = queryNorm
                0.37596878 = fieldWeight in 5769, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.8124003 = idf(docFreq=976, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5769)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas the cosine and Jaccard coefficients, as well as the χ² statistic and the likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the χ² statistic is the least affected by the frequency of terms. Third, although the cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
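The six measures named in this abstract can all be computed from a 2x2 co-occurrence table for a term pair. The sketch below uses common textbook definitions, which are an assumption here; the paper's exact formulas may differ. The function name `association_measures` and the cell names `a`, `b`, `c`, `d` are illustrative only.

```python
import math


def association_measures(a, b, c, d):
    """Association scores for two terms from a 2x2 co-occurrence table.

    a: documents containing both terms
    b: documents containing only the first term
    c: documents containing only the second term
    d: documents containing neither term
    Assumes a > 0 and that every row and column total is positive.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d          # totals for first term present / absent
    col1, col2 = a + c, b + d          # totals for second term present / absent
    observed = [a, b, c, d]
    # Expected cell counts if the two terms occurred independently.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    sqrt_ad, sqrt_bc = math.sqrt(a * d), math.sqrt(b * c)
    return {
        "cosine": a / math.sqrt(row1 * col1),
        "jaccard": a / (a + b + c),
        # Pointwise mutual information, in bits.
        "mutual_information": math.log2(a * n / (row1 * col1)),
        "yule_y": (sqrt_ad - sqrt_bc) / (sqrt_ad + sqrt_bc),
        "chi_square": n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2),
        # Log-likelihood ratio (G^2); empty cells contribute nothing.
        "likelihood_ratio": 2 * sum(o * math.log(o / e)
                                    for o, e in zip(observed, expected) if o > 0),
    }


# Hypothetical counts, not taken from the paper's test corpus.
scores = association_measures(a=10, b=20, c=30, d=940)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these definitions, a table whose cells match the independence expectation (for example a=1, b=9, c=9, d=81) yields zero for mutual information, Yule's Y, the χ² statistic and the likelihood ratio, while cosine and Jaccard stay positive because they depend only on the co-occurrence counts.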
  2. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    0.0076413243 = product of:
      0.015282649 = sum of:
        0.015282649 = product of:
          0.061130594 = sum of:
            0.061130594 = weight(_text_:authors in 3627) [ClassicSimilarity], result of:
              0.061130594 = score(doc=3627,freq=2.0), product of:
                0.24273461 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.053245123 = queryNorm
                0.25184128 = fieldWeight in 3627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3627)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
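
The score breakdowns listed under each result above follow the usual TF-IDF arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the product of queryWeight and fieldWeight is scaled by the coord factors. The sketch below reproduces that arithmetic from the values shown in the listing. The function name `classic_score` is hypothetical, and the idf formula is inferred from the numbers in the listing rather than stated by it.

```python
import math


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute a single-term ClassicSimilarity score from explain-tree values.

    coords is a list of (matched, total) pairs, one per coord(...) line.
    """
    tf = math.sqrt(freq)                                # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))     # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = tf * idf * field_norm                # fieldWeight
    score = query_weight * field_weight
    for matched, total in coords:                       # apply each coord factor
        score *= matched / total
    return score


# Values copied from the two explain trees in the listing.
first = classic_score(4.0, 976, 44218, 0.053245123, 0.0390625, [(1, 2), (1, 2)])
second = classic_score(2.0, 1258, 44218, 0.053245123, 0.0390625, [(1, 4), (1, 2)])
print(first, second)
```

The results should agree with the listed scores (0.024084264 and 0.0076413243) up to small rounding differences, since Lucene computes in 32-bit floats and Python uses 64-bit floats.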