Search (38 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.08

0.08276204 = sum of:
  0.061706625 = product of:
    0.2468265 = sum of:
      0.2468265 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.2468265 = score(doc=562,freq=2.0), product of:
          0.43917897 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.05180212 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.25 = coord(1/4)
  0.021055417 = product of:
    0.042110834 = sum of:
      0.042110834 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.042110834 = score(doc=562,freq=2.0), product of:
          0.1814022 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05180212 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
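
The score breakdown for the first result (doc 562) can be checked by hand. The sketch below recomputes it from the values printed in the explanation. The formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and score as queryWeight times fieldWeight, scaled by coord) are inferred from the numbers shown and are an assumption about how this index was scored. The function names are illustrative only.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the explanation
QUERY_NORM = 0.05180212   # queryNorm, as printed in the explanation


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency in the form the printed values suggest."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "3a" in doc 562: freq=2, docFreq=24, fieldNorm=0.046875, coord(1/4)
part_3a = term_score(2.0, 24, 0.046875) * (1 / 4)
# Term "22" in doc 562: freq=2, docFreq=3622, fieldNorm=0.046875, coord(1/2)
part_22 = term_score(2.0, 3622, 0.046875) * (1 / 2)

total = part_3a + part_22
print(f"{total:.8f}")
```

The result should land very close to the 0.08276204 reported for this document. Small differences in the last digits are expected because the search engine works in single precision.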

Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.05
```
0.04762715 = sum of:
  0.021027196 = product of:
    0.084108785 = sum of:
      0.084108785 = weight(_text_:authors in 4921) [ClassicSimilarity], result of:
        0.084108785 = score(doc=4921,freq=4.0), product of:
          0.23615624 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.05180212 = queryNorm
          0.35615736 = fieldWeight in 4921, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4921)
    0.25 = coord(1/4)
  0.026599953 = product of:
    0.053199906 = sum of:
      0.053199906 = weight(_text_:n in 4921) [ClassicSimilarity], result of:
        0.053199906 = score(doc=4921,freq=2.0), product of:
          0.22335295 = queryWeight, product of:
            4.3116565 = idf(docFreq=1611, maxDocs=44218)
            0.05180212 = queryNorm
          0.23818761 = fieldWeight in 4921, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.3116565 = idf(docFreq=1611, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4921)
    0.5 = coord(1/2)
```
Abstract

Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.04

0.044146135 = product of:
  0.08829227 = sum of:
    0.08829227 = sum of:
      0.053199906 = weight(_text_:n in 2765) [ClassicSimilarity], result of:
        0.053199906 = score(doc=2765,freq=2.0), product of:
          0.22335295 = queryWeight, product of:
            4.3116565 = idf(docFreq=1611, maxDocs=44218)
            0.05180212 = queryNorm
          0.23818761 = fieldWeight in 2765, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.3116565 = idf(docFreq=1611, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2765)
      0.03509236 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
        0.03509236 = score(doc=2765,freq=2.0), product of:
          0.1814022 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05180212 = queryNorm
          0.19345059 = fieldWeight in 2765, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2765)
  0.5 = coord(1/2)

Date: 22. 3.2009 19:14:43

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.03
```
0.025931723 = sum of:
  0.011894777 = product of:
    0.04757911 = sum of:
      0.04757911 = weight(_text_:authors in 2741) [ClassicSimilarity], result of:
        0.04757911 = score(doc=2741,freq=2.0), product of:
          0.23615624 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.05180212 = queryNorm
          0.20147301 = fieldWeight in 2741, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.03125 = fieldNorm(doc=2741)
    0.25 = coord(1/4)
  0.014036945 = product of:
    0.02807389 = sum of:
      0.02807389 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
        0.02807389 = score(doc=2741,freq=2.0), product of:
          0.1814022 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05180212 = queryNorm
          0.15476047 = fieldWeight in 2741, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=2741)
    0.5 = coord(1/2)
```
Abstract

This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.

1. Introduction: The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.

Date: 12. 9.2004 9:56:22

Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.02
```
0.023036236 = product of:
  0.046072472 = sum of:
    0.046072472 = product of:
      0.092144944 = sum of:
        0.092144944 = weight(_text_:n in 448) [ClassicSimilarity], result of:
          0.092144944 = score(doc=448,freq=6.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.41255307 = fieldWeight in 448, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=448)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n**2/p) time on p processors rather than the worst-case O(n**3/p) time. Furthermore, the O(n**2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.02
```
0.02257081 = product of:
  0.04514162 = sum of:
    0.04514162 = product of:
      0.09028324 = sum of:
        0.09028324 = weight(_text_:n in 2555) [ClassicSimilarity], result of:
          0.09028324 = score(doc=2555,freq=4.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.40421778 = fieldWeight in 2555, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
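
The single-path idea in this abstract can be illustrated with a small sketch: instead of comparing a page against every category, the classifier starts at the root of a category tree and follows only the best-matching child at each level. Everything below is a hypothetical illustration, not the authors' system. The `Category` structure, the keyword-overlap measure, and the example tree are all assumptions. A logarithmic number of comparisons also presumes a reasonably balanced tree with a bounded number of children per node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical tree node: a category described by a keyword set."""
    name: str
    keywords: set[str]
    children: list[Category] = field(default_factory=list)


def overlap(page_terms: set[str], category: Category) -> int:
    """Toy similarity: number of terms shared by page and category."""
    return len(page_terms & category.keywords)


def classify_single_path(root: Category, page_terms: set[str]) -> list[str]:
    """Descend from the root, following only the best-matching child per level."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(page_terms, child))
        path.append(node.name)
    return path


# Hypothetical two-level category tree.
tree = Category("Web", set(), [
    Category("Science", {"physics", "biology", "research"}, [
        Category("Physics", {"physics", "quantum"}),
        Category("Biology", {"biology", "cell"}),
    ]),
    Category("Sports", {"football", "tennis", "match"}, [
        Category("Football", {"football", "goal"}),
        Category("Tennis", {"tennis", "serve"}),
    ]),
])

path = classify_single_path(tree, {"quantum", "physics", "research"})
print(" > ".join(path))
```

Only the children along the chosen path are compared, so the categories under "Sports" are never inspected for this page.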

Fuhr, N.: Klassifikationsverfahren bei der automatischen Indexierung (1983) 0.02

0.021279963 = product of:
  0.042559925 = sum of:
    0.042559925 = product of:
      0.08511985 = sum of:
        0.08511985 = weight(_text_:n in 7697) [ClassicSimilarity], result of:
          0.08511985 = score(doc=7697,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.38110018 = fieldWeight in 7697, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0625 = fieldNorm(doc=7697)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02

0.021055417 = product of:
  0.042110834 = sum of:
    0.042110834 = product of:
      0.08422167 = sum of:
        0.08422167 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.08422167 = score(doc=1046,freq=2.0), product of:
            0.1814022 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05180212 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 5. 5.2003 14:17:22

Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.02

0.018619968 = product of:
  0.037239935 = sum of:
    0.037239935 = product of:
      0.07447987 = sum of:
        0.07447987 = weight(_text_:n in 7694) [ClassicSimilarity], result of:
          0.07447987 = score(doc=7694,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.33346266 = fieldWeight in 7694, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7694)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02

0.01754618 = product of:
  0.03509236 = sum of:
    0.03509236 = product of:
      0.07018472 = sum of:
        0.07018472 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.07018472 = score(doc=611,freq=2.0), product of:
            0.1814022 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05180212 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 8.2009 12:54:24

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02

0.01754618 = product of:
  0.03509236 = sum of:
    0.03509236 = product of:
      0.07018472 = sum of:
        0.07018472 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.07018472 = score(doc=2748,freq=2.0), product of:
            0.1814022 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05180212 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 2.2016 18:25:22

Drori, O.; Alon, N.: Using document classification for displaying search results (2003) 0.02

0.015959973 = product of:
  0.031919945 = sum of:
    0.031919945 = product of:
      0.06383989 = sum of:
        0.06383989 = weight(_text_:n in 1565) [ClassicSimilarity], result of:
          0.06383989 = score(doc=1565,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.28582513 = fieldWeight in 1565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=1565)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Finn, A.; Kushmerick, N.: Learning to classify documents according to genre (2006) 0.02

0.015959973 = product of:
  0.031919945 = sum of:
    0.031919945 = product of:
      0.06383989 = sum of:
        0.06383989 = weight(_text_:n in 6010) [ClassicSimilarity], result of:
          0.06383989 = score(doc=6010,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.28582513 = fieldWeight in 6010, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=6010)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.02
```
0.015959973 = product of:
  0.031919945 = sum of:
    0.031919945 = product of:
      0.06383989 = sum of:
        0.06383989 = weight(_text_:n in 3015) [ClassicSimilarity], result of:
          0.06383989 = score(doc=3015,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.28582513 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

Bianchini, C.; Bargioni, S.: Automated classification using linked open data : a case study on faceted classification and Wikidata (2021) 0.01
```
0.014719036 = product of:
  0.029438073 = sum of:
    0.029438073 = product of:
      0.11775229 = sum of:
        0.11775229 = weight(_text_:authors in 724) [ClassicSimilarity], result of:
          0.11775229 = score(doc=724,freq=4.0), product of:
            0.23615624 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.05180212 = queryNorm
            0.49862027 = fieldWeight in 724, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.0546875 = fieldNorm(doc=724)
      0.25 = coord(1/4)
  0.5 = coord(1/2)
```
Abstract

The Wikidata gadget, CCLitBox, for the automated classification of literary authors and works by a faceted classification and using Linked Open Data (LOD) is presented. The tool reproduces the classification algorithm of class O Literature of the Colon Classification and uses data freely available in Wikidata to create Colon Classification class numbers. CCLitBox is totally free and enables any user to classify literary authors and their works; it is easily accessible to everybody; it uses LOD from Wikidata but missing data for classification can be freely added if necessary; it is readymade for any cooperative and networked project.

Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: ¬An investigation into the stability of contextual document clustering (2008) 0.01

0.0132999765 = product of:
  0.026599953 = sum of:
    0.026599953 = product of:
      0.053199906 = sum of:
        0.053199906 = weight(_text_:n in 1356) [ClassicSimilarity], result of:
          0.053199906 = score(doc=1356,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.23818761 = fieldWeight in 1356, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1356)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.01

0.0132999765 = product of:
  0.026599953 = sum of:
    0.026599953 = product of:
      0.053199906 = sum of:
        0.053199906 = weight(_text_:n in 2804) [ClassicSimilarity], result of:
          0.053199906 = score(doc=2804,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.23818761 = fieldWeight in 2804, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2804)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Maghsoodi, N.; Homayounpour, M.M.: Improving Farsi multiclass text classification using a thesaurus and two-stage feature selection (2011) 0.01

0.0132999765 = product of:
  0.026599953 = sum of:
    0.026599953 = product of:
      0.053199906 = sum of:
        0.053199906 = weight(_text_:n in 4775) [ClassicSimilarity], result of:
          0.053199906 = score(doc=4775,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.23818761 = fieldWeight in 4775, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4775)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: ¬An evaluation of classification models for question topic categorization (2012) 0.01
```
0.0132999765 = product of:
  0.026599953 = sum of:
    0.026599953 = product of:
      0.053199906 = sum of:
        0.053199906 = weight(_text_:n in 237) [ClassicSimilarity], result of:
          0.053199906 = score(doc=237,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.23818761 = fieldWeight in 237, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=237)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.

Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
```
0.0132999765 = product of:
  0.026599953 = sum of:
    0.026599953 = product of:
      0.053199906 = sum of:
        0.053199906 = weight(_text_:n in 238) [ClassicSimilarity], result of:
          0.053199906 = score(doc=238,freq=2.0), product of:
            0.22335295 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.05180212 = queryNorm
            0.23818761 = fieldWeight in 238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=238)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.
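
As a rough illustration of the k-NN step mentioned in this abstract, the sketch below classifies a message by the labels of its k most similar training messages, using bag-of-words vectors and cosine similarity. It is a toy under stated assumptions, not the study's pipeline. The training messages, the labels "action" and "information", and the similarity-weighted vote are all hypothetical. Only k = 2 is taken from the abstract.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(message: str, training: list[tuple[str, str]], k: int = 2) -> str:
    """Label a message by a similarity-weighted vote of its k nearest neighbours."""
    query = vectorize(message)
    scored = [(cosine(query, vectorize(text)), label) for text, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes: Counter = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical labelled messages; the categories are invented for illustration.
training = [
    ("please approve the budget request", "action"),
    ("approve this purchase request today", "action"),
    ("minutes of the weekly meeting attached", "information"),
    ("newsletter and meeting notes attached", "information"),
]

label = knn_classify("please approve the travel request", training, k=2)
print(label)
```

Weighting each vote by similarity avoids arbitrary ties when the two nearest neighbours disagree, which matters for an even k such as 2.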
