Search (31 results, page 2 of 2)

Gross, D.: Maschinelle Bilderkennung mit Big Data und Deep Learning (2017) 0.00

0.001353075 = product of:
  0.00270615 = sum of:
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = weight(_text_:a in 3726) [ClassicSimilarity], result of:
          0.0054123 = score(doc=3726,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.10191591 = fieldWeight in 3726, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3726)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a
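
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library. The queryNorm, fieldNorm, and coord values are copied from the explain output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed ClassicSimilarity tf: square root of the term frequency
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm,
                  coords=(0.5, 0.5)):
    """Recompute one weight(...) node and apply the coord factors above it."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight line
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight line
    score = query_weight * field_weight
    for coord in coords:                                 # two coord(1/2) factors
        score *= coord
    return score


# Values taken from the explain output for doc 3726
score = explain_score(freq=2.0, doc_freq=37942, max_docs=44218,
                      query_norm=0.046056706, field_norm=0.0625)
print(f"{score:.6f}")
```

Lucene computes these values in 32-bit floats, so the last digits of a double-precision recomputation may differ slightly from the listing.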

Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.00

0.0011959607 = product of:
  0.0023919214 = sum of:
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = weight(_text_:a in 658) [ClassicSimilarity], result of:
          0.0047838427 = score(doc=658,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.090081796 = fieldWeight in 658, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=658)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract

Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments were performed using the open source Annif toolkit for automated subject indexing and classification, but the results should also generalize to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification, particularly for Finnish and Swedish text, but not for English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.

Type: a
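
Doc 658 contains the query term four times, yet it ranks below doc 3726, which contains it only twice. The sketch below, again assuming the standard ClassicSimilarity formulas and using an illustrative helper name, recomputes the two fieldWeight values to show that the smaller fieldNorm (which in ClassicSimilarity shrinks as the indexed field gets longer) outweighs the higher term frequency.

```python
import math


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with the assumed ClassicSimilarity
    # definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1))
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return tf * idf * field_norm


# freq and fieldNorm are copied from the two explain trees
doc_3726 = field_weight(2.0, 37942, 44218, field_norm=0.0625)
doc_658 = field_weight(4.0, 37942, 44218, field_norm=0.0390625)

# Doubling freq raises tf only by a factor of sqrt(2), while the
# fieldNorm of doc 658 is smaller by a factor of 1.6
print(doc_3726 > doc_658)
```

Because queryWeight and the coord factors are identical for every hit on this page, the ordering of the results follows the fieldWeight values alone.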

Wiesenmüller, H.: Maschinelle Indexierung am Beispiel der DNB : Analyse und Entwicklungsmöglichkeiten (2018) 0.00

0.0011839407 = product of:
  0.0023678814 = sum of:
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = weight(_text_:a in 5209) [ClassicSimilarity], result of:
          0.0047357627 = score(doc=5209,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.089176424 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5209)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Mielke, B.: Wider einige gängige Ansichten zur juristischen Informationserschließung (2002) 0.00

0.0010148063 = product of:
  0.0020296127 = sum of:
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = weight(_text_:a in 2145) [ClassicSimilarity], result of:
          0.0040592253 = score(doc=2145,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.07643694 = fieldWeight in 2145, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2145)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Pielmeier, S.; Voß, V.; Carstensen, H.; Kahl, B.: Online-Workshop "Computerunterstützte Inhaltserschließung" 2020 (2021) 0.00

0.0010148063 = product of:
  0.0020296127 = sum of:
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = weight(_text_:a in 4409) [ClassicSimilarity], result of:
          0.0040592253 = score(doc=4409,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.07643694 = fieldWeight in 4409, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4409)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Beckmann, R.; Hinrichs, I.; Janßen, M.; Milmeister, G.; Schäuble, P.: ¬Der Digitale Assistent DA-3 : Eine Plattform für die Inhaltserschließung (2019) 0.00

0.0010148063 = product of:
  0.0020296127 = sum of:
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = weight(_text_:a in 5408) [ClassicSimilarity], result of:
          0.0040592253 = score(doc=5408,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.07643694 = fieldWeight in 5408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5408)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Kasprzik, A.: Aufbau eines produktiven Dienstes für die automatisierte Inhaltserschließung an der ZBW : ein Status- und Erfahrungsbericht. (2023) 0.00

9.567685E-4 = product of:
  0.001913537 = sum of:
    0.001913537 = product of:
      0.003827074 = sum of:
        0.003827074 = weight(_text_:a in 935) [ClassicSimilarity], result of:
          0.003827074 = score(doc=935,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.072065435 = fieldWeight in 935, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=935)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Strobel, S.: Englischsprachige Erweiterung des TIB / AV-Portals : Ein GND/DBpedia-Mapping zur Gewinnung eines englischen Begriffssystems (2014) 0.00

8.4567186E-4 = product of:
  0.0016913437 = sum of:
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = weight(_text_:a in 2876) [ClassicSimilarity], result of:
          0.0033826875 = score(doc=2876,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.06369744 = fieldWeight in 2876, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2876)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Donath, A.: Flickr sorgt mit Automatik-Tags für Aufregung (2015) 0.00

8.4567186E-4 = product of:
  0.0016913437 = sum of:
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = weight(_text_:a in 1876) [ClassicSimilarity], result of:
          0.0033826875 = score(doc=1876,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.06369744 = fieldWeight in 1876, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1876)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Toepfer, M.; Kempf, A.O.: Automatische Indexierung auf Basis von Titeln und Autoren-Keywords : ein Werkstattbericht (2016) 0.00

8.4567186E-4 = product of:
  0.0016913437 = sum of:
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = weight(_text_:a in 3209) [ClassicSimilarity], result of:
          0.0033826875 = score(doc=3209,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.06369744 = fieldWeight in 3209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3209)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Mödden, E.; Dreger, A.; Hommes, K.P.; Mohammadianbisheh, N.; Mölck, L.; Pinna, L.; Sitte-Zöllner, D.: ¬Der Weg zur Gründung der AG Erschließung ÖB-DNB und die Entwicklung eines maschinellen Verfahrens zur Verschlagwortung der Kinder- und Jugendliteratur mit GND-Vokabular (2020) 0.00

8.371725E-4 = product of:
  0.001674345 = sum of:
    0.001674345 = product of:
      0.00334869 = sum of:
        0.00334869 = weight(_text_:a in 71) [ClassicSimilarity], result of:
          0.00334869 = score(doc=71,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.06305726 = fieldWeight in 71, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=71)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a
