Search (13 results, page 1 of 1)

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.09

```
0.08760825 = product of:
  0.1752165 = sum of:
    0.1752165 = sum of:
      0.07623341 = weight(_text_:systems in 6265) [ClassicSimilarity], result of:
        0.07623341 = score(doc=6265,freq=2.0), product of:
          0.16037072 = queryWeight, product of:
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.052184064 = queryNorm
          0.47535738 = fieldWeight in 6265, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
      0.09898309 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
        0.09898309 = score(doc=6265,freq=2.0), product of:
          0.1827397 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052184064 = queryNorm
          0.5416616 = fieldWeight in 6265, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=6265)
  0.5 = coord(1/2)
```

Source: Information outlook. 9(2005) no.8, pp. 22-23
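Each explain tree in this listing follows Lucene's ClassicSimilarity (TF-IDF) scoring. The following minimal sketch recomputes the score of doc 6265. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the idf variant that reproduces the values shown here. It copies queryNorm, fieldNorm and coord straight from the output rather than deriving them.

```python
import math

# Values copied from the explain output for doc 6265 (Hlava, 2005).
MAX_DOCS = 44218
QUERY_NORM = 0.052184064  # taken as given; it depends on the full query
FIELD_NORM = 0.109375     # encoded length norm for this document's field


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), the variant matching the output above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = square root of the raw term frequency."""
    return math.sqrt(freq)


def term_weight(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM                    # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


w_systems = term_weight(2.0, 5561, FIELD_NORM)  # weight(_text_:systems)
w_22 = term_weight(2.0, 3622, FIELD_NORM)       # weight(_text_:22)
coord = 1 / 2                                   # coord(1/2) from the explain tree
doc_score = (w_systems + w_22) * coord

print(f"systems={w_systems:.6f}  22={w_22:.6f}  score={doc_score:.6f}")
```

Both term weights and the final score of about 0.0876 agree with the explain output to roughly six decimal places. The rarer term "22" gets the larger idf and therefore the larger weight. queryNorm is treated as a constant because it depends on the full query, which this listing does not show.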

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente [Lingo: an open-source system for the automatic indexing of German-language documents] (2006) 0.05

```
0.050061855 = product of:
  0.10012371 = sum of:
    0.10012371 = sum of:
      0.043561947 = weight(_text_:systems in 3581) [ClassicSimilarity], result of:
        0.043561947 = score(doc=3581,freq=2.0), product of:
          0.16037072 = queryWeight, product of:
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.052184064 = queryNorm
          0.2716328 = fieldWeight in 3581, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.0625 = fieldNorm(doc=3581)
      0.056561764 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
        0.056561764 = score(doc=3581,freq=2.0), product of:
          0.1827397 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052184064 = queryNorm
          0.30952093 = fieldWeight in 3581, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=3581)
  0.5 = coord(1/2)
```

Abstract: Lingo is a freely available (open-source) system for the automatic indexing of German. The development of lingo focused on making the system highly configurable and flexible for a range of applications. The article demonstrates the benefit of linguistically based automatic indexing for information retrieval. It presents and illustrates with examples the linguistic functions that lingo offers for improving retrieval: base-form recognition, compound recognition and compound splitting, word relations, lexical and algorithmic multi-word group recognition, and OCR error correction. The open architecture of lingo is described, and possible application scenarios and limits of use are identified.
Date: 24. 3.2006 12:22:02

Hauer, M.: Automatische Indexierung [Automatic indexing] (2000) 0.02

```
0.021210661 = product of:
  0.042421322 = sum of:
    0.042421322 = product of:
      0.084842645 = sum of:
        0.084842645 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.084842645 = score(doc=5887,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Ed.: R. Schmidt
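When only one of the two query clauses matches, as with doc 5887, the tree wraps the single term weight in two coord(1/2) factors. The score is therefore that weight multiplied by 0.5 twice. The sketch below evaluates an explain tree bottom-up, treating each "sum of" and "product of" line as a node over its children. It uses a hypothetical `Node`/`evaluate` pair; these are illustrative names, not a Lucene API.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One line of an explain tree: a leaf value, or a sum/product over children."""
    op: str                      # "leaf", "sum" or "product"
    value: float = 0.0           # used only when op == "leaf"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> float:
    """Recompute a node's value bottom-up from its children."""
    if node.op == "leaf":
        return node.value
    values = [evaluate(child) for child in node.children]
    if node.op == "sum":
        return sum(values)
    if node.op == "product":
        return math.prod(values)
    raise ValueError(f"unknown op: {node.op}")


def leaf(v: float) -> Node:
    return Node("leaf", v)


# Explain tree for doc 5887 (Hauer, 2000): only _text_:22 matched.
hauer = Node("product", children=[
    Node("sum", children=[
        Node("product", children=[
            Node("sum", children=[leaf(0.084842645)]),  # weight(_text_:22)
            leaf(0.5),                                   # inner coord(1/2)
        ]),
    ]),
    leaf(0.5),                                           # outer coord(1/2)
])

print(f"{evaluate(hauer):.9f}")
```

The result is 0.021210661, the top-level value for doc 5887. The same tree shape, with a different leaf weight, covers the other single-clause matches in this listing.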

Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken [Machine indexing in the subject cataloguing of academic libraries] (2006) 0.01

```
0.014140441 = product of:
  0.028280882 = sum of:
    0.028280882 = product of:
      0.056561764 = sum of:
        0.056561764 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
          0.056561764 = score(doc=1755,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.30952093 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2008 12:35:19

Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement [Automatic content indexing in the context of knowledge management] (2001) 0.01

```
0.012372886 = product of:
  0.024745772 = sum of:
    0.024745772 = product of:
      0.049491543 = sum of:
        0.049491543 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
          0.049491543 = score(doc=5671,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.2708308 = fieldWeight in 5671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5671)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2001 13:14:48

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

```
0.012372886 = product of:
  0.024745772 = sum of:
    0.024745772 = product of:
      0.049491543 = sum of:
        0.049491543 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.049491543 = score(doc=5291,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 7.2006 17:32:00

Tsai, C.-F.; McGarry, K.; Tait, J.: Qualitative evaluation of automatic assignment of keywords to images (2006) 0.01
```
0.011789299 = product of:
  0.023578597 = sum of:
    0.023578597 = product of:
      0.047157194 = sum of:
        0.047157194 = weight(_text_:systems in 963) [ClassicSimilarity], result of:
          0.047157194 = score(doc=963,freq=6.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.29405114 = fieldWeight in 963, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=963)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In image retrieval, most systems lack user-centred evaluation because they are assessed against a chosen ground-truth dataset. Precision and recall measured against the ground truth are regarded as an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images to enhance retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy on a chosen ground-truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation consists of two separate methods. First, the automatic annotation results are assessed by human subjects. Second, the subjects are asked to annotate a chosen set of test images, and their annotations are used as ground truth; the system is then run on this test set and its annotations are judged against that ground truth. Most systems that undergo user-centred evaluation report only one of these methods. We believe that both methods need to be considered for a full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, the proposed evaluation methodology provides a deeper understanding of the system's performance.

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung [Design and prototype implementation of concept-based text indexing] (2006) 0.01

```
0.010605331 = product of:
  0.021210661 = sum of:
    0.021210661 = product of:
      0.042421322 = sum of:
        0.042421322 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.042421322 = score(doc=1746,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2015 9:17:30

Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.01
```
0.009625921 = product of:
  0.019251842 = sum of:
    0.019251842 = product of:
      0.038503684 = sum of:
        0.038503684 = weight(_text_:systems in 601) [ClassicSimilarity], result of:
          0.038503684 = score(doc=601,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.24009174 = fieldWeight in 601, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=601)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, manually assigning keyphrases to documents is costly and time-consuming, which motivates the development of tools that perform this function automatically. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and are therefore suitable for use in training and evaluating automatic keyphrasing systems.

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
```
0.009625921 = product of:
  0.019251842 = sum of:
    0.019251842 = product of:
      0.038503684 = sum of:
        0.038503684 = weight(_text_:systems in 3300) [ClassicSimilarity], result of:
          0.038503684 = score(doc=3300,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.24009174 = fieldWeight in 3300, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of human-assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable on five of the measures, and JDI is superior on one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF could be combined, and to evaluate whether they are complementary.

Pirkola, A.: Morphological typology of languages for IR (2001) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 4476) [ClassicSimilarity], result of:
          0.03267146 = score(doc=4476,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 4476, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 2910) [ClassicSimilarity], result of:
          0.03267146 = score(doc=2910,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 2910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=2910)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Due to natural language morphology, words can take on various morphological forms. Morphological normalisation, often used in information retrieval and text mining systems, conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach lies between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of automatically acquiring an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.

Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch [Fundamentals of automatic indexing: a textbook] (2003) 0.01

```
0.0070702205 = product of:
  0.014140441 = sum of:
    0.014140441 = product of:
      0.028280882 = sum of:
        0.028280882 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
          0.028280882 = score(doc=1767,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.15476047 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 6.2009 12:46:51
