Search (4 results, page 1 of 1)

  • theme_ss:"Automatisches Indexieren"
  • theme_ss:"Automatisches Klassifizieren"
  1. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.23
    0.23399408 = product of:
      0.3119921 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 5769) [ClassicSimilarity], result of:
              0.023542227 = score(doc=5769,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 5769, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5769)
          0.25 = coord(1/4)
        0.12624991 = weight(_text_:term in 5769) [ClassicSimilarity], result of:
          0.12624991 = score(doc=5769,freq=10.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.5763782 = fieldWeight in 5769, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.17985666 = weight(_text_:frequency in 5769) [ClassicSimilarity], result of:
          0.17985666 = score(doc=5769,freq=8.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.6506205 = fieldWeight in 5769, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
      0.75 = coord(3/4)
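    The indented block above is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal sketch of its arithmetic in Python, with every constant hard-coded from the tree rather than taken from a live index:

      import math

      # Leaf: weight(_text_:term in 5769). All inputs are copied from the tree.
      freq = 10.0             # termFreq of "term" in doc 5769
      idf = 4.66603           # 1 + ln(44218 / (1130 + 1)) in ClassicSimilarity
      query_norm = 0.04694356
      field_norm = 0.0390625  # lossy encoding of 1/sqrt(field length)

      tf = math.sqrt(freq)                  # 3.1622777
      query_weight = idf * query_norm       # 0.21904005
      field_weight = tf * idf * field_norm  # 0.5763782
      leaf = query_weight * field_weight    # 0.12624991

      # Top level: sum the matching clauses, then apply coord(3/4),
      # since 3 of the 4 query clauses matched this document.
      total = (0.005885557 + leaf + 0.17985666) * 3 / 4
      print(total)                          # ~0.23399408, the displayed score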
    
    Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of association measures to term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationships and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas the cosine and Jaccard coefficients, as well as the X² statistic and the likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X² statistic is the least affected by the frequency of terms. Third, although the cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
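    The six measures compared above can all be computed from a 2x2 term co-occurrence table. A small illustrative sketch in Python; the counts a, b, c, d are invented for illustration, not taken from the paper's corpus:

      import math

      # Hypothetical co-occurrence counts for a term pair (x, y):
      # a = docs with both, b = x only, c = y only, d = neither.
      a, b, c, d = 40, 60, 80, 9820
      n = a + b + c + d

      cosine  = a / math.sqrt((a + b) * (a + c))
      jaccard = a / (a + b + c)
      pmi     = math.log(a * n / ((a + b) * (a + c)))  # pointwise mutual information
      chi2    = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
      odds    = (a * d) / (b * c)
      yules_y = (math.sqrt(odds) - 1) / (math.sqrt(odds) + 1)  # coefficient of colligation

      for name, v in [("cosine", cosine), ("Jaccard", jaccard), ("PMI", pmi),
                      ("chi^2", chi2), ("Yule's Y", yules_y)]:
          print(f"{name:9s} {v:.4f}")

    Raising the marginal counts (a + b) and (a + c) while holding a fixed illustrates the frequency effects the abstract reports: cosine and Jaccard favor frequent terms, while PMI and Yule's Y inflate rare ones.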
  2. Panyr, J.: Vektorraum-Modell und Clusteranalyse in Information-Retrieval-Systemen (1987) 0.02
    0.02258427 = product of:
      0.09033708 = sum of:
        0.09033708 = weight(_text_:term in 2322) [ClassicSimilarity], result of:
          0.09033708 = score(doc=2322,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.41242266 = fieldWeight in 2322, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=2322)
      0.25 = coord(1/4)
    
    Abstract
    Starting from theoretical approaches to indexing, the classical vector space model for automatic indexing (together with the term discrimination model) is explained. Clustering in information retrieval systems is viewed as a natural logical consequence of this model and is treated in all of its forms (i.e., as document classification, term classification, or combined document and term classification). Search strategies in pre-classified document collections (cluster searching) are then described in detail. Finally, the sensible application of cluster analysis in information retrieval systems is briefly discussed.
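    A minimal sketch of the cluster search described above, assuming invented document vectors and clusters (not Panyr's original formulation): match the query against cluster centroids first, then rank only the documents inside the best cluster:

      import numpy as np

      # Illustrative TF-IDF document vectors, already grouped into two clusters.
      docs = np.array([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.1],
                       [0.0, 0.7, 0.6],
                       [0.1, 0.6, 0.8]])
      clusters = {0: [0, 1], 1: [2, 3]}   # cluster id -> document indices

      def cosine(u, v):
          return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

      def cluster_search(query):
          # Step 1: pick the cluster whose centroid best matches the query.
          best = max(clusters,
                     key=lambda c: cosine(query, docs[clusters[c]].mean(axis=0)))
          # Step 2: rank only the documents inside that cluster.
          return sorted(clusters[best], key=lambda i: cosine(query, docs[i]),
                        reverse=True)

      print(cluster_search(np.array([0.0, 0.5, 0.9])))  # -> [3, 2]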
  3. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.009880973 = product of:
      0.039523892 = sum of:
        0.039523892 = product of:
          0.079047784 = sum of:
            0.079047784 = weight(_text_:assessment in 3311) [ClassicSimilarity], result of:
              0.079047784 = score(doc=3311,freq=2.0), product of:
                0.25917634 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30499613 = fieldWeight in 3311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources, and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. The article reviews and discusses issues with existing evaluation approaches, such as problems of aboutness and relevance assessments, which imply the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and it proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
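    One building block of the framework above is direct comparison with a gold standard. A minimal sketch of such a per-document check over subject-term sets, with hypothetical terms (precision, recall, and F1):

      # Hypothetical automatic vs. gold-standard subject terms for one document.
      gold = {"indexing", "classification", "evaluation", "retrieval"}
      auto = {"indexing", "retrieval", "metadata"}

      tp = len(gold & auto)               # terms both assigned and correct
      precision = tp / len(auto)          # 2/3
      recall = tp / len(gold)             # 2/4
      f1 = 2 * precision * recall / (precision + recall)
      print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")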
  4. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.00
    0.0032901266 = product of:
      0.013160506 = sum of:
        0.013160506 = product of:
          0.052642025 = sum of:
            0.052642025 = weight(_text_:based in 3300) [ClassicSimilarity], result of:
              0.052642025 = score(doc=3300,freq=10.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.37218451 = fieldWeight in 3300, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3300)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents with the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), which is based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of human-assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and that JDI is superior for one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI variant that associates JDs with MeSH indexing rather than with textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF might be combined and then evaluated to determine whether they complement one another.
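    The evaluation above scores each system's ranked category assignments against human judgments with trec_eval measures. A small sketch of one measure from that family, non-interpolated average precision, with invented categories (the paper's actual six measures are not reproduced here):

      def average_precision(ranked, relevant):
          """Mean of the precision values at each rank where a relevant item appears."""
          hits, precisions = 0, []
          for i, item in enumerate(ranked, start=1):
              if item in relevant:
                  hits += 1
                  precisions.append(hits / i)
          return sum(precisions) / len(relevant) if relevant else 0.0

      # Hypothetical ranked categories for one MEDLINE document:
      ranked = ["cardiology", "genetics", "surgery", "oncology"]
      gold = {"cardiology", "oncology"}
      print(average_precision(ranked, gold))  # (1/1 + 2/4) / 2 = 0.75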