Search (68 results, page 3 of 4)

  • theme_ss:"Automatisches Klassifizieren"
  1. Borko, H.: Research in computer based classification systems (1985) 0.00
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
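    A minimal sketch of the two steps this abstract describes (correlate co-occurring index terms, then group intersubstitutable terms into categories). The document-term matrix is invented toy data, the similarity is plain Pearson correlation, and a single eigendecomposition stands in for Borko's full factor-analysis procedure, so this illustrates the idea rather than reproducing his method:

      import numpy as np

      # Toy binary document-term matrix: rows = documents, columns = index terms.
      terms = ["computer", "program", "library", "catalog"]
      X = np.array([
          [1, 1, 0, 0],
          [1, 1, 0, 1],
          [0, 0, 1, 1],
          [0, 1, 1, 1],
      ], dtype=float)

      # Step 1: term-term similarity from co-occurrence across documents,
      # here a standard correlation formula (Pearson).
      R = np.corrcoef(X.T)

      # Step 2: derive categories. Borko used factor analysis; as a stand-in,
      # assign each term to the factor (eigenvector) it loads on most heavily.
      eigvals, eigvecs = np.linalg.eigh(R)
      loadings = eigvecs[:, eigvals.argsort()[::-1][:2]]  # top two factors
      for term, cat in zip(terms, np.abs(loadings).argmax(axis=1)):
          print(f"{term} -> category {cat}")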
  2. Shafer, K.E.: Evaluating Scorpion Results (2001) 0.00
    Source
    Journal of library administration. 34(2001) nos.3/4, S.237-244
  3. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.00
    Date
    22. 8.2009 19:51:28
  4. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.00
    Source
    Journal of information science. 21(1995) no.4, S.289-299
  5. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.00
    Content
    1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.
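    The match-weight-aggregate pipeline outlined in this working paper can be sketched roughly as follows. The term list, class labels, and weights below are invented for illustration; the real system matched against the Ei thesaurus, whose entries are not reproduced here:

      from collections import defaultdict

      # Hypothetical thesaurus term -> (class, weight) mapping.
      TERM_MAP = {
          "neural network": ("723 Computer Software", 2.0),
          "classification": ("903 Information Science", 1.5),
          "thesaurus":      ("903 Information Science", 1.0),
      }

      def classify(text: str) -> list[tuple[str, float]]:
          """Match thesaurus terms in the text and sum weights per class."""
          text = text.lower()
          scores: dict[str, float] = defaultdict(float)
          for term, (cls, weight) in TERM_MAP.items():
              hits = text.count(term)           # matching step
              if hits:
                  scores[cls] += hits * weight  # weighting step
          # preparation for display: classes ranked by aggregate score
          return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

      print(classify("A thesaurus-based classification of neural network papers."))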
  6. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  7. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.00
    Abstract
    This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of four classes: question, response, announcement, or administrative. A total of 1,372 messages were analyzed. The automatic classification of a message was based on string matching between the message text and predefined string sets for each of the message types. The system's ability to classify a message accurately was compared against manually assigned codes. A Cohen's kappa of .55 suggested statistical agreement between the automatically and manually assigned codes.
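    The kappa statistic used as the evaluation criterion here is straightforward to compute; a minimal sketch with invented codes rather than May's HUMANIST data:

      from collections import Counter

      def cohens_kappa(a: list[str], b: list[str]) -> float:
          """Agreement between two sets of codes, corrected for chance."""
          n = len(a)
          observed = sum(x == y for x, y in zip(a, b)) / n
          ca, cb = Counter(a), Counter(b)
          expected = sum(ca[c] * cb[c] for c in ca) / n ** 2
          return (observed - expected) / (1 - expected)

      # Toy codes using the four message types from the abstract.
      auto   = ["question", "response", "response", "announcement", "administrative"]
      manual = ["question", "response", "administrative", "announcement", "administrative"]
      print(round(cohens_kappa(auto, manual), 2))  # 0.74 on this toy data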
  8. Panyr, J.: Automatische thematische Textklassifikation und ihre Interpretation in der Dokumentengrobrecherche (1980) 0.00
    Source
    Wissensstrukturen und Ordnungsmuster. Proc. der 4. Fachtagung der Gesellschaft für Klassifikation, Salzburg, 16.-19.4.1980. Red.: W. Dahlberg
  9. Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.00
    Source
    B.I.T.online. 1(1998) H.4, S.279-290
  10. Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005) 0.00
    Abstract
    DIZ defines the knowledge network as its unique selling point and devotes considerable staff resources to updating the dossiers and assuring their quality. After switching to a fully digitized workflow in April 2001, DIZ identified four starting points for optimizing the effort on the input side (editing) while marketing the knowledge network better on the output side (research): 1. (Semi-)automatic classification of press texts (suggestion system); 2. Visualization of the knowledge network (topic mapping); 3. (Fully) automatic classification and optimization of the knowledge network; 4. New retrieval options (clustering, concept search). Projects 1 and 2, "Automatic Classification and Visualization", started first and were accelerated by two developments: Bayerischer Rundfunk (BR), originally co-founder and 50% shareholder of DIZ München GmbH, decided for strategic reasons to withdraw from the cooperation at the end of 2003; and the media crisis, triggered by the massive decline in advertising revenue, forced the Süddeutscher Verlag to make substantial savings and to look for new sources of revenue. Together these meant that staffing in press documentation fell from originally around 20 (SZ only, excluding the BR share) to around 13 by 1 January 2004, while at the same time the effort spent maintaining the knowledge network came under increased pressure to justify itself. This yielded three quantitative and qualitative goals for projects 1 and 2: increased productivity in editing, improved consistency in editing, and better marketing and more intensive use of the dossiers in research. All three goals were achieved, with productivity in editing rising most notably. Projects 1 and 2, "Automatic Classification and Visualization", were successfully completed at the beginning of 2004. Follow-up projects 3 and 4 have been running since mid-2004 and are due to be completed by mid-2005. In what follows, section 2 describes the product selection and working method of the automatic classification. Section 3 describes the use of the knowledge-network visualization in editing and research. Section 4 summarizes the results of projects 1 and 2 and looks ahead to the goals of projects 3 and 4.
  11. Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.00
    Source
    Information processing and management. 44(2008) no.4, S.1397-1409
  12. Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.00
    Date
    12. 9.2004 9:56:22
  13. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.00
    Date
    22. 1.2010 14:41:24
  14. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.00
    Content
    1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references
  15. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.00
    Source
    Knowledge organization. 34(2007) no.4, S.247-263
  16. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.00
    Source
    Information processing and management. 44(2008) no.4, S.1410-1430
  17. Xu, Y.; Bernard, A.: Knowledge organization through statistical computation : a new approach (2009) 0.00
    Source
    Knowledge organization. 36(2009) no.4, S.227-239
  18. Aphinyanaphongs, Y.; Fu, L.D.; Li, Z.; Peskin, E.R.; Efstathiadis, E.; Aliferis, C.F.; Statnikov, A.: A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization (2014) 0.00
    Abstract
    An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for only a few supervised classification and feature selection methods, with limited ways of optimizing them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization.
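    A benchmark of this kind reduces, in miniature, to a loop over classifier/feature-selection pipelines scored by cross-validation. The sketch below uses two scikit-learn classifiers, one chi-squared feature selector, and one metric as stand-ins for the 28 classifiers, 19 selectors, and 4 metrics the study actually compared:

      from sklearn.datasets import fetch_20newsgroups
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      # Small stand-in corpus; the study used 229 data sets/tasks.
      data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

      for name, clf in [("naive Bayes", MultinomialNB()),
                        ("logistic regression", LogisticRegression(max_iter=1000))]:
          pipe = make_pipeline(TfidfVectorizer(),
                               SelectKBest(chi2, k=1000),  # one feature selector
                               clf)
          scores = cross_val_score(pipe, data.data, data.target, cv=5)
          print(f"{name}: mean accuracy {scores.mean():.3f}")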
  19. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.00
    Abstract
    In traditional text classification, classes are mutually exclusive, i.e. a text or text fragment cannot be classified into more than one class. In multi-label classification, on the other hand, an individual text may belong to several classes simultaneously. This type of classification is required by a large number of current applications, such as big data classification and image and video annotation. Supervised learning is the most widely used type of machine learning for classification. It requires large quantities of labeled data and the intervention of a human tagger to create the training sets. When the data sets become very large or heavily noisy, this operation can be tedious, error-prone and time consuming. In this case semi-supervised learning, which requires only a few labels, is a better choice. In this paper, we study and evaluate several methods to address the problem of multi-label classification using semi-supervised learning and data from social networks. First, we propose a linguistic pre-processing involving tokenisation, recognition of named entities and hashtag segmentation in order to decrease the noise in this type of massive and unstructured real data, and then we perform word sense disambiguation using WordNet. Second, several experiments related to multi-label classification and semi-supervised learning are carried out on these data sets and compared to each other. These evaluations compare the results of the approaches considered. The paper proposes a method combining semi-supervised methods with a graph method for the extraction of subjects in social networks using a multi-label classification approach. Experiments show that the proposed model increases classification precision by 4 percentage points compared to a baseline. (A minimal sketch of this kind of semi-supervised multi-label setup follows this record.)
    Date
    4. 2.2018 13:10:17
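    As promised above, a minimal sketch of the kind of semi-supervised multi-label setup entry 19 describes, using binary relevance (one self-training classifier per label) on an invented toy corpus. The paper's linguistic preprocessing (tokenisation, NER, hashtag segmentation, WordNet disambiguation) and its graph-based subject extraction are not reproduced:

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.semi_supervised import SelfTrainingClassifier

      # Toy posts: two labeled, two unlabeled (-1 marks a missing label).
      texts = [
          "new phone camera review",       # labeled: tech
          "team wins the championship",    # labeled: sports
          "phone battery benchmark",       # unlabeled
          "transfer rumours this summer",  # unlabeled
      ]
      labels = {"tech": [1, 0, -1, -1], "sports": [0, 1, -1, -1]}

      # Binary relevance: fit one semi-supervised classifier per label,
      # so a post may end up in several classes simultaneously.
      X = TfidfVectorizer().fit_transform(texts)
      for label, y in labels.items():
          clf = SelfTrainingClassifier(LogisticRegression())
          clf.fit(X, np.array(y))  # -1 entries are treated as unlabeled
          print(label, clf.predict(X))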
  20. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.00
    Source
    Journal of the American Society for Information Science and technology. 52(2001) no.4, S.283-296

Languages

  • e 54
  • d 14

Types

  • a 54
  • el 13
  • r 4
  • x 2
  • m 1