Search (46 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.03
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01
    Abstract
    Classifying objects (e.g. fauna, flora, texts) is a process that rests on human intelligence. Computer science, and in particular the field of artificial intelligence (AI), investigates, among other things, to what extent processes that require human intelligence can be automated. It has turned out that solving everyday problems is a greater challenge than solving specialist problems such as building a chess computer; "Rybka", for example, has been the reigning computer chess world champion since June 2007. To what extent everyday problems can be solved with AI methods remains, for the general case, an open question. In solving everyday problems, the processing of natural language, such as language understanding, plays an essential role. Realizing "common sense" as a machine (in the Cyc knowledge base, in the form of facts and rules) has been Lenat's goal since 1984. Regarding Cyc, the showcase AI project, there are Cyc optimists and Cyc pessimists. Understanding natural language (e.g. work titles, abstracts, prefaces, tables of contents) is also required when bibliographic title records or online publications are classified intellectually, so that these text objects can be classified correctly. Since 2007, the Deutsche Nationalbibliothek has classified nearly all of its publications intellectually with the Dewey Decimal Classification (DDC).
    Date
    22. 1.2010 14:41:24
  3. Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.01
  4. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
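    The passage-based idea in the abstract of this entry can be pictured with a small, purely illustrative sketch (not the authors' PETC): a text is split into sentence-window "passages", each window is scored by a trained TF-IDF + linear-SVM classifier (scikit-learn assumed), and the text is labelled by its best-scoring window. Training data, labels, and the window size are invented.

```python
# Toy illustration of passage-based text classification (not the PETC system
# from the paper): split a text into sentence windows, score each window with
# a trained classifier, and label the text by its best-scoring window.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data: short texts labelled with a disease aspect.
train_texts = [
    "The infection is treated with a course of oral antibiotics.",
    "Common symptoms include fever, cough and shortness of breath.",
    "Vaccination and hand hygiene help prevent transmission.",
    "Diagnosis is confirmed by a blood test and a chest X-ray.",
]
train_labels = ["treatment", "symptoms", "prevention", "diagnosis"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)

def passages(text, window=2):
    """Crude passage extractor: overlapping windows of `window` sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [". ".join(sentences[i:i + window])
            for i in range(max(1, len(sentences) - window + 1))]

def classify_by_best_passage(text):
    """Return (label, passage) for the passage with the largest SVM margin."""
    cand = passages(text)
    scores = clf.decision_function(cand)      # shape: (n_passages, n_classes)
    best = int(scores.max(axis=1).argmax())   # passage with the strongest class
    return clf.classes_[scores[best].argmax()], cand[best]

doc = ("The disease spreads through droplets. Patients often report fatigue "
       "and joint pain. Physicians usually prescribe anti-inflammatory drugs.")
print(classify_by_best_passage(doc))
```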
  5. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    Date
    5. 5.2003 14:17:22
  6. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.01
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
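    For reference only, a sequential sketch of group-average (UPGMA) agglomerative clustering on TF-IDF document vectors is given below; it does not reproduce the paper's actual contribution, the distributed-memory parallel version with message passing, and the four documents are invented.

```python
# Sequential group-average (UPGMA) hierarchical agglomerative clustering of a
# few toy documents. The O(n^2) pairwise-distance computation done here is
# what the paper distributes over p processors to reach roughly O(n^2/p) time.
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "parallel clustering of large document collections",
    "hierarchical agglomerative clustering of text documents",
    "message passing on distributed memory machines",
    "load balancing for parallel message passing systems",
]

X = TfidfVectorizer().fit_transform(docs).toarray()

# method="average" selects group-average linkage; distances are cosine based.
Z = linkage(X, method="average", metric="cosine")

# Cut the dendrogram into two flat clusters; prints one cluster id per document.
print(fcluster(Z, t=2, criterion="maxclust"))
```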
  7. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.01
  8. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.01
  9. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    Date
    22. 8.2009 12:54:24
  10. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    Date
    1. 2.2016 18:25:22
  11. Malo, P.; Sinha, A.; Wallenius, J.; Korhonen, P.: Concept-based document classification using Wikipedia and value function (2011) 0.01
  12. Bollmann, P.; Konrad, E.; Schneider, H.-J.; Zuse, H.: Anwendung automatischer Klassifikationsverfahren mit dem System FAKYR (1978) 0.00
  13. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.00
  14. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.00
    Content
    1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.
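    The matching and weighting steps named in this outline can be pictured with a small, purely hypothetical sketch: thesaurus terms mapped to classification codes are string-matched against a document and the weighted hits are summed per code. The term list, codes, and weights are invented and only stand in for the Ei thesaurus preprocessing the report describes.

```python
# Hypothetical match-and-weight classifier: count occurrences of thesaurus
# terms in a document and sum a weight per classification code; multi-word
# terms get a higher weight. This only illustrates the idea of steps 4.1-4.2.
import re
from collections import defaultdict

# Invented mapping of thesaurus terms to classification codes.
THESAURUS = {
    "neural network": "723.4",
    "machine learning": "723.4",
    "information retrieval": "903.3",
    "thesaurus": "903.2",
}

def classify(text, multiword_boost=2.0):
    """Return candidate codes ranked by summed, weighted term matches."""
    scores = defaultdict(float)
    lowered = text.lower()
    for term, code in THESAURUS.items():
        hits = len(re.findall(re.escape(term), lowered))
        if hits:
            scores[code] += hits * (multiword_boost if " " in term else 1.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(classify("A machine learning approach to information retrieval "
               "using a domain thesaurus and a neural network."))
```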
  15. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.00
  16. Bianchini, C.; Bargioni, S.: Automated classification using linked open data : a case study on faceted classification and Wikidata (2021) 0.00
    Source
    Cataloging and classification quarterly. 59(2021) no.8, p.835-852
  17. Reiner, U.: VZG-Projekt Colibri : Bewertung von automatisch DDC-klassifizierten Titeldatensätzen der Deutschen Nationalbibliothek (DNB) (2009) 0.00
    Abstract
    The VZG project Colibri/DDC has been concerned with automatic methods for the Dewey Decimal Classification (DDC) since 2003. The goal of the project is uniform DDC indexing of bibliographic title records and support for DDC experts and DDC laypersons, for example in the analysis and synthesis of DDC notations, their quality control, and DDC-based searching. The present report concentrates on the first larger automatic DDC classification run and the first automatic and intellectual evaluation with the classification component vc_dcl1. It is based on the 25,653 title records (12 weekly/monthly deliveries) of series A, B, and H of the Deutsche Nationalbibliografie, made available by the Deutsche Nationalbibliothek (DNB) in November 2007. After the automatic DDC classification and the automatic evaluation are explained in chapter 2, chapter 3 discusses the DNB report "Colibri_Auswertung_DDC_Endbericht_Sommer_2008": matters are clarified and questions are raised whose answers will set the course for the further classification tests. Chapter 4 adds considerations, going beyond chapter 3, on how to continue the automatic DDC classification. The report is meant to deepen the understanding of the automatic methods.
  18. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.00
    Abstract
    We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
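    A toy sketch of the kind of comparison described above, assuming scikit-learn: bag-of-words versus uni+bigram features, each combined with naive Bayes and a linear SVM and scored by cross-validation. The six questions and two topics are invented stand-ins; the paper evaluates on millions of Yahoo! Answers questions and a full category hierarchy.

```python
# Compare bag-of-words vs. uni+bigram features with naive Bayes and a linear
# SVM on a handful of invented questions (stand-in for the CQA evaluation).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

questions = [
    "How do I fix a flat bicycle tire?",
    "What is the best way to learn Python?",
    "Which laptop is good for programming?",
    "How can I patch a punctured bike tube?",
    "Is Java or Python better for beginners?",
    "What bicycle should I buy for commuting?",
]
topics = ["cycling", "programming", "programming",
          "cycling", "programming", "cycling"]

features = {
    "bag-of-words": lambda: CountVectorizer(),
    "uni+bigrams": lambda: CountVectorizer(ngram_range=(1, 2)),
}
models = {"naive Bayes": lambda: MultinomialNB(),
          "linear SVM": lambda: LinearSVC()}

for fname, make_vec in features.items():
    for mname, make_clf in models.items():
        pipe = make_pipeline(make_vec(), make_clf())
        acc = cross_val_score(pipe, questions, topics, cv=3).mean()
        print(f"{fname:12s} + {mname:11s}: accuracy {acc:.2f}")
```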
  19. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.00
    Pages
    S.1-22
  20. Dubin, D.: Dimensions and discriminability (1998) 0.00
    Date
    22. 9.1997 19:16:05

Languages

  • e 38
  • d 8

Types

  • a 38
  • el 8
  • r 2
  • m 1