Search (7 results, page 1 of 1)

Maghsoodi, N.; Homayounpour, M.M.: Improving Farsi multiclass text classification using a thesaurus and two-stage feature selection (2011) 0.03
```
0.02709466 = product of:
  0.1354733 = sum of:
    0.1354733 = weight(_text_:thesaurus in 4775) [ClassicSimilarity], result of:
      0.1354733 = score(doc=4775,freq=10.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.5708255 = fieldWeight in 4775, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4775)
  0.2 = coord(1/5)
```
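The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical, and queryNorm, fieldNorm, and coord are copied from the output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float) -> float:
    """Score of a single matching term, following the layout of the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=10.0) = 3.1622777
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    # Only one of the five query clauses matched, hence coord(1/5).
    return coord * query_weight * field_weight


# Values copied from the explain output for doc 4775 (term "thesaurus").
score = classic_term_score(
    freq=10.0,
    doc_freq=1182,
    max_docs=44218,
    query_norm=0.051357865,
    field_norm=0.0390625,
    coord=1 / 5,
)
print(f"{score:.8f}")
```

The result should agree with the listed 0.02709466 up to floating-point rounding, since Lucene computes these factors in single precision.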
Abstract

The progressive increase of information content has recently made it necessary to create a system for automatic classification of documents. In this article, a system is presented for the categorization of multiclass Farsi documents that requires fewer training examples and can help to compensate for the shortcomings of the standard training dataset. The new idea proposed in the present article is based on extending the feature vector by adding some words extracted from a thesaurus and then filtering the new feature vector by applying secondary feature selection to discard inappropriate features. That is, a phase of secondary feature selection is applied to choose the more appropriate features among those added from the thesaurus, which enhances the effect of the thesaurus on the efficiency of the classifier. To evaluate the proposed system, a corpus is gathered from the Farsi Wikipedia website and some articles in the Hamshahri newspaper, the Roshd periodical, and the Soroush magazine. In addition to studying the role of a thesaurus and applying secondary feature selection, the effects of the number of categories, the size of the training dataset, and the average number of words in the test data are also examined. As the results indicate, classification efficiency improves by applying this approach, especially when the available data are not sufficient for some text categories.

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

```
0.013916564 = product of:
  0.06958282 = sum of:
    0.06958282 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
      0.06958282 = score(doc=2748,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.38690117 = fieldWeight in 2748, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.078125 = fieldNorm(doc=2748)
  0.2 = coord(1/5)
```

Date: 1. 2.2016 18:25:22

Groß, T.; Faden, M.: Automatische Indexierung elektronischer Dokumente an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften : Bericht über die Jahrestagung der Internationalen Buchwissenschaftlichen Gesellschaft (2010) 0.01
```
0.013708933 = product of:
  0.06854466 = sum of:
    0.06854466 = weight(_text_:thesaurus in 4051) [ClassicSimilarity], result of:
      0.06854466 = score(doc=4051,freq=4.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2888174 = fieldWeight in 4051, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
  0.2 = coord(1/5)
```
Abstract

With the implementation and results evaluation of the automatic indexing method "Decisiv Categorization" from the company Recommind, begun in early 2010, the information-structuring problem outlined here is to be solved in two steps. In the short to medium term, intellectual indexing is to be supported by a semi-automatic procedure. In the medium to long term, the machine procedure, building on appropriate training, is to be made capable both of indexing in-house documents fully automatically and of assigning subject headings to, or classifying, digital information resources from outside the ZBW, so that they can be found in a shared search space. Following this introduction, the first approaches to machine subject indexing at the ZBW (2001-2004) are presented, along with their results and problems. Next, the framework conditions (project mandate and goal) for resuming the project in 2009 are described, followed by an account of how the Recommind technology works and how it is used for the subject indexing of online documents with a thesaurus. The paper then focuses on the options for evaluating automatic indexing approaches, as well as the current results and key findings from their use at the ZBW. The conclusion describes the inferences drawn from the results achieved and gives an outlook on the next steps.

Object

Standard-Thesaurus Wirtschaft

Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 472) [ClassicSimilarity], result of:
      0.06058549 = score(doc=472,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 472, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=472)
  0.2 = coord(1/5)
```
Abstract

The following paper deals with an automatic text classification method which does not require training documents. For this method, the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). As a consequence, it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles.

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01

```
0.008349938 = product of:
  0.04174969 = sum of:
    0.04174969 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
      0.04174969 = score(doc=690,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.23214069 = fieldWeight in 690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=690)
  0.2 = coord(1/5)
```

Date: 23. 3.2013 13:22:36

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

```
0.008349938 = product of:
  0.04174969 = sum of:
    0.04174969 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
      0.04174969 = score(doc=2158,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.23214069 = fieldWeight in 2158, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=2158)
  0.2 = coord(1/5)
```

Date: 4. 8.2015 19:22:04

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01

```
0.006958282 = product of:
  0.03479141 = sum of:
    0.03479141 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
      0.03479141 = score(doc=1107,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.19345059 = fieldWeight in 1107, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1107)
  0.2 = coord(1/5)
```

Date: 28.10.2013 19:22:57
