Search (61 results, page 1 of 4)

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.03

0.034760237 = product of:
  0.052140355 = sum of:
    0.009291277 = weight(_text_:a in 6265) [ClassicSimilarity], result of:
      0.009291277 = score(doc=6265,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.17835285 = fieldWeight in 6265, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.109375 = fieldNorm(doc=6265)
    0.04284908 = product of:
      0.08569816 = sum of:
        0.08569816 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.08569816 = score(doc=6265,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Information outlook. 9(2005) no.8, pp. 22-23
Type: a
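
The explain tree for this first result can be checked by hand. The sketch below is a minimal reconstruction, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative, and `queryNorm`, `fieldNorm`, and the coord factors are copied from the output above rather than derived. Lucene computes in 32-bit floats, so agreement is expected only to roughly five or six decimal places.

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the explain output
QUERY_NORM = 0.045180224  # queryNorm, copied from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Document 6265: term "a" (docFreq=37942) and term "22" (docFreq=3622),
# both with freq=2.0 and fieldNorm=0.109375.
score_a = term_score(2.0, 37942, 0.109375)
score_22 = term_score(2.0, 3622, 0.109375) * 0.5  # inner coord(1/2)

# The outer coord(2/3) scales the sum of the matching clauses.
total = (score_a + score_22) * (2.0 / 3.0)

print(f"a: {score_a:.9f}  22: {score_22:.9f}  total: {total:.9f}")
# The explain output reports 0.009291277, 0.04284908 and 0.034760237.
```

The same function covers the other results on this page: only `freq`, `doc_freq`, `field_norm`, and the coord factors change from record to record.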

Hauer, M.: Automatische Indexierung (2000) 0.03

0.029794488 = product of:
  0.04469173 = sum of:
    0.007963953 = weight(_text_:a in 5887) [ClassicSimilarity], result of:
      0.007963953 = score(doc=5887,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.15287387 = fieldWeight in 5887, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.09375 = fieldNorm(doc=5887)
    0.03672778 = product of:
      0.07345556 = sum of:
        0.07345556 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.07345556 = score(doc=5887,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Ed.: R. Schmidt
Type: a

Galvez, C.; Moya-Anegón, F. de: An evaluation of conflation accuracy using finite-state transducers (2006) 0.02
```
0.023747265 = product of:
  0.035620898 = sum of:
    0.007963953 = weight(_text_:a in 5599) [ClassicSimilarity], result of:
      0.007963953 = score(doc=5599,freq=8.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.15287387 = fieldWeight in 5599, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=5599)
    0.027656946 = product of:
      0.055313893 = sum of:
        0.055313893 = weight(_text_:de in 5599) [ClassicSimilarity], result of:
          0.055313893 = score(doc=5599,freq=2.0), product of:
            0.19416152 = queryWeight, product of:
              4.297489 = idf(docFreq=1634, maxDocs=44218)
              0.045180224 = queryNorm
            0.28488597 = fieldWeight in 5599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.297489 = idf(docFreq=1634, maxDocs=44218)
              0.046875 = fieldNorm(doc=5599)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
Type: a

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.02

0.020477211 = product of:
  0.030715816 = sum of:
    0.009291277 = weight(_text_:a in 5291) [ClassicSimilarity], result of:
      0.009291277 = score(doc=5291,freq=8.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.17835285 = fieldWeight in 5291, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.02142454 = product of:
      0.04284908 = sum of:
        0.04284908 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.04284908 = score(doc=5291,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
Date: 22. 7.2006 17:32:00
Type: a

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.02

0.019862993 = product of:
  0.029794488 = sum of:
    0.0053093014 = weight(_text_:a in 3581) [ClassicSimilarity], result of:
      0.0053093014 = score(doc=3581,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.10191591 = fieldWeight in 3581, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=3581)
    0.024485188 = product of:
      0.048970375 = sum of:
        0.048970375 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.048970375 = score(doc=3581,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 24. 3.2006 12:22:02
Type: a

Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.02

0.019862993 = product of:
  0.029794488 = sum of:
    0.0053093014 = weight(_text_:a in 1755) [ClassicSimilarity], result of:
      0.0053093014 = score(doc=1755,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.10191591 = fieldWeight in 1755, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=1755)
    0.024485188 = product of:
      0.048970375 = sum of:
        0.048970375 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
          0.048970375 = score(doc=1755,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.30952093 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 3.2008 12:35:19
Type: a

Nicoletti, M.: Automatische Indexierung (2001) 0.02

0.018437965 = product of:
  0.055313893 = sum of:
    0.055313893 = product of:
      0.110627785 = sum of:
        0.110627785 = weight(_text_:de in 4326) [ClassicSimilarity], result of:
          0.110627785 = score(doc=4326,freq=2.0), product of:
            0.19416152 = queryWeight, product of:
              4.297489 = idf(docFreq=1634, maxDocs=44218)
              0.045180224 = queryNorm
            0.56977195 = fieldWeight in 4326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.297489 = idf(docFreq=1634, maxDocs=44218)
              0.09375 = fieldNorm(doc=4326)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Content: 1. Task - 2. Identifying multi-word groups - 2.1 Definition - 3. Marking multi-word groups - 4. Base forms - 5. Term and document frequency, term weighting - 6. The threshold as a control instrument - 7. Inverted index. See: http://www.grin.com/de/e-book/104966/automatische-indexierung.

Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.02

0.017380118 = product of:
  0.026070178 = sum of:
    0.0046456386 = weight(_text_:a in 5671) [ClassicSimilarity], result of:
      0.0046456386 = score(doc=5671,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.089176424 = fieldWeight in 5671, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5671)
    0.02142454 = product of:
      0.04284908 = sum of:
        0.04284908 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
          0.04284908 = score(doc=5671,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.2708308 = fieldWeight in 5671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5671)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 3.2001 13:14:48
Type: a

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01

0.0061212964 = product of:
  0.01836389 = sum of:
    0.01836389 = product of:
      0.03672778 = sum of:
        0.03672778 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.03672778 = score(doc=1746,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 22. 3.2015 9:17:30

Hlava, M.M.: Automatic indexing : a matter of degree (2002) 0.00

0.0043799505 = product of:
  0.013139851 = sum of:
    0.013139851 = weight(_text_:a in 2501) [ClassicSimilarity], result of:
      0.013139851 = score(doc=2501,freq=4.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.25222903 = fieldWeight in 2501, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.109375 = fieldNorm(doc=2501)
  0.33333334 = coord(1/3)

Type: a

Yusuff, A.: Automatisches Indexing and Abstracting : Grundlagen und Beispiele (2002) 0.00

0.0043799505 = product of:
  0.013139851 = sum of:
    0.013139851 = weight(_text_:a in 1577) [ClassicSimilarity], result of:
      0.013139851 = score(doc=1577,freq=4.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.25222903 = fieldWeight in 1577, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.109375 = fieldNorm(doc=1577)
  0.33333334 = coord(1/3)

Imprint: Potsdam : Fachhochschule, FB A-B-D

Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.00

0.0040808646 = product of:
  0.012242594 = sum of:
    0.012242594 = product of:
      0.024485188 = sum of:
        0.024485188 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
          0.024485188 = score(doc=1767,freq=2.0), product of:
            0.15821345 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045180224 = queryNorm
            0.15476047 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 22. 6.2009 12:46:51

Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.00
```
0.003793148 = product of:
  0.011379444 = sum of:
    0.011379444 = weight(_text_:a in 2566) [ClassicSimilarity], result of:
      0.011379444 = score(doc=2566,freq=12.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.21843673 = fieldWeight in 2566, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2566)
  0.33333334 = coord(1/3)
```
Abstract: We present a bibliometric study of a corpus of 839 bibliographic references about automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, the obsolescence and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been a constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors does fit a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.
Type: a

Kantor, P.B.; Voorhees, E.: Information retrieval with scanned texts (2000) 0.00

0.0035395343 = product of:
  0.010618603 = sum of:
    0.010618603 = weight(_text_:a in 3901) [ClassicSimilarity], result of:
      0.010618603 = score(doc=3901,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.20383182 = fieldWeight in 3901, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.125 = fieldNorm(doc=3901)
  0.33333334 = coord(1/3)

Type: a

Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.00
```
0.00325127 = product of:
  0.009753809 = sum of:
    0.009753809 = weight(_text_:a in 1167) [ClassicSimilarity], result of:
      0.009753809 = score(doc=1167,freq=12.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.18723148 = fieldWeight in 1167, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1167)
  0.33333334 = coord(1/3)
```
Abstract: The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug and play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
Type: a

Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.00
```
0.00325127 = product of:
  0.009753809 = sum of:
    0.009753809 = weight(_text_:a in 1871) [ClassicSimilarity], result of:
      0.009753809 = score(doc=1871,freq=12.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.18723148 = fieldWeight in 1871, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1871)
  0.33333334 = coord(1/3)
```
Abstract: Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Type: a

Woltering, H.: Maschinelle Indexierung in der Bibliothek der Friedrich-Ebert-Stiftung (2002) 0.00

0.0030970925 = product of:
  0.009291277 = sum of:
    0.009291277 = weight(_text_:a in 4351) [ClassicSimilarity], result of:
      0.009291277 = score(doc=4351,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.17835285 = fieldWeight in 4351, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.109375 = fieldNorm(doc=4351)
  0.33333334 = coord(1/3)

Type: a

Oberhauser, O.; Labner, J.: Einführung der automatischen Indexierung im Österreichischen Verbundkatalog? : Bericht über eine empirische Studie (2003) 0.00

0.0030970925 = product of:
  0.009291277 = sum of:
    0.009291277 = weight(_text_:a in 1878) [ClassicSimilarity], result of:
      0.009291277 = score(doc=1878,freq=2.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.17835285 = fieldWeight in 1878, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.109375 = fieldNorm(doc=1878)
  0.33333334 = coord(1/3)

Location: A

Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.00
```
0.00296799 = product of:
  0.00890397 = sum of:
    0.00890397 = weight(_text_:a in 5480) [ClassicSimilarity], result of:
      0.00890397 = score(doc=5480,freq=10.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.1709182 = fieldWeight in 5480, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=5480)
  0.33333334 = coord(1/3)
```
Abstract: (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
Type: a

Pirkola, A.: Morphological typology of languages for IR (2001) 0.00
```
0.00296799 = product of:
  0.00890397 = sum of:
    0.00890397 = weight(_text_:a in 4476) [ClassicSimilarity], result of:
      0.00890397 = score(doc=4476,freq=10.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.1709182 = fieldWeight in 4476, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
  0.33333334 = coord(1/3)
```
Abstract: This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
Type: a
