Search (31 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23

0.23485142 = product of:
  0.31313524 = sum of:
    0.0735765 = product of:
      0.2207295 = sum of:
        0.2207295 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.2207295 = score(doc=562,freq=2.0), product of:
            0.3927445 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046325076 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.2207295 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.2207295 = score(doc=562,freq=2.0), product of:
        0.3927445 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046325076 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.018829225 = product of:
      0.03765845 = sum of:
        0.03765845 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.03765845 = score(doc=562,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Chae, G.; Park, J.; Park, J.; Yeo, W.S.; Shi, C.: Linking and clustering artworks using social tags : revitalizing crowd-sourced information on cultural collections (2016) 0.03
```
0.028773637 = product of:
  0.11509455 = sum of:
    0.11509455 = weight(_text_:social in 2852) [ClassicSimilarity], result of:
      0.11509455 = score(doc=2852,freq=16.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.6230592 = fieldWeight in 2852, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2852)
  0.25 = coord(1/4)
```
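The score explanation above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`); the function names are hypothetical, and `queryNorm`, `fieldNorm`, and the coord factor are simply copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity is assumed to define it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=16.0) = 4.0
    query_weight = idf * query_norm        # idf * queryNorm
    field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the explanation for doc 2852, term "social".
term_score = classic_term_score(
    freq=16.0,
    doc_freq=2228,
    max_docs=44218,
    query_norm=0.046325076,   # copied from the listing, not recomputed
    field_norm=0.0390625,     # copied from the listing, not recomputed
)

# coord(1/4): only one of the four query clauses matched this document.
final_score = term_score * (1 / 4)

print(f"term score  = {term_score:.6f}")
print(f"final score = {final_score:.6f}")
```

The printed values should agree with the listing (0.11509455 and 0.028773637) up to floating-point rounding, since Lucene computes in single precision while this sketch uses Python doubles.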
Abstract

Social tagging is one of the most popular methods for collecting crowd-sourced information in galleries, libraries, archives, and museums (GLAMs). However, when the number of social tags grows rapidly, using them becomes problematic and, as a result, they are often left as simply big data that cannot be used for practical purposes. To revitalize the use of this crowd-sourced information, we propose using social tags to link and cluster artworks based on an experimental study using an online collection at the Gyeonggi Museum of Modern Art (GMoMA). We view social tagging as a folksonomy, where artworks are classified by keywords of the crowd's various interpretations and one artwork can belong to several different categories simultaneously. To leverage this strength of social tags, we used a clustering method called "link communities" to detect overlapping communities in a network of artworks constructed by computing similarities between all artwork pairs. We used this framework to identify semantic relationships and clusters of similar artworks. By comparing the clustering results with curators' manual classification results, we demonstrated the potential of social tagging data for automatically clustering artworks in a way that reflects the dynamic perspectives of crowds.

Theme

Social tagging
Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.03
```
0.026329588 = product of:
  0.10531835 = sum of:
    0.10531835 = sum of:
      0.073936306 = weight(_text_:aspects in 1107) [ClassicSimilarity], result of:
        0.073936306 = score(doc=1107,freq=4.0), product of:
          0.20938325 = queryWeight, product of:
            4.5198684 = idf(docFreq=1308, maxDocs=44218)
            0.046325076 = queryNorm
          0.35311472 = fieldWeight in 1107, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.5198684 = idf(docFreq=1308, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
      0.031382043 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
        0.031382043 = score(doc=1107,freq=2.0), product of:
          0.16222252 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046325076 = queryNorm
          0.19345059 = fieldWeight in 1107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
  0.25 = coord(1/4)
```
Abstract

Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.

Date

28.10.2013 19:22:57
Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.02
```
0.016276827 = product of:
  0.06510731 = sum of:
    0.06510731 = weight(_text_:social in 2650) [ClassicSimilarity], result of:
      0.06510731 = score(doc=2650,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.3524555 = fieldWeight in 2650, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0625 = fieldNorm(doc=2650)
  0.25 = coord(1/4)
```
Abstract

The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems.
Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.01
```
0.014096146 = product of:
  0.056384586 = sum of:
    0.056384586 = weight(_text_:social in 4095) [ClassicSimilarity], result of:
      0.056384586 = score(doc=4095,freq=6.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.30523545 = fieldWeight in 4095, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.03125 = fieldNorm(doc=4095)
  0.25 = coord(1/4)
```
Abstract

In traditional text classification, classes are mutually exclusive, i.e. it is not possible to have one text or text fragment classified into more than one class. On the other hand, in multi-label classification an individual text may belong to several classes simultaneously. This type of classification is required by a large number of current applications such as big data classification and image and video annotation. Supervised learning is the most used type of machine learning in the classification task. It requires large quantities of labeled data and the intervention of a human tagger in the creation of the training sets. When the data sets become very large or heavily noisy, this operation can be tedious, prone to error and time consuming. In this case, semi-supervised learning, which requires only a few labels, is a better choice. In this paper, we study and evaluate several methods to address the problem of multi-label classification using semi-supervised learning and data from social networks. First, we propose a linguistic pre-processing involving tokenisation, recognition of named entities and hashtag segmentation in order to decrease the noise in this type of massive and unstructured real data and then we perform a word sense disambiguation using WordNet. Second, several experiments related to multi-label classification and semi-supervised learning are carried out on these data sets and compared to each other. These evaluations compare the results of the approaches considered. This paper proposes a method for combining semi-supervised methods with a graph method for the extraction of subjects in social networks using a multi-label classification approach. Experiments show that the proposed model increases the precision of the classification by 4 percentage points when compared to a baseline.
Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.01
```
0.01220762 = product of:
  0.04883048 = sum of:
    0.04883048 = weight(_text_:social in 2563) [ClassicSimilarity], result of:
      0.04883048 = score(doc=2563,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.26434162 = fieldWeight in 2563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046875 = fieldNorm(doc=2563)
  0.25 = coord(1/4)
```
Abstract

Topic discovery is an important means for marketing, e-Business and social science studies. It can also be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 5172) [ClassicSimilarity], result of:
      0.040692065 = score(doc=5172,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 5172, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5172)
  0.25 = coord(1/4)
```
Abstract

In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers which learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers and a remaining set was used as a test set. The dictionary approach is outperformed by 18% for RAINBOW and by 17% for BSVM, while the standard deviation of the results is reduced by 28% and 34% respectively over the dictionary approach.
Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 967) [ClassicSimilarity], result of:
      0.040692065 = score(doc=967,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 967, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
  0.25 = coord(1/4)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 2161) [ClassicSimilarity], result of:
      0.040692065 = score(doc=2161,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 2161, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2161)
  0.25 = coord(1/4)
```
Abstract

Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.
Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 3627) [ClassicSimilarity], result of:
      0.040692065 = score(doc=3627,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
  0.25 = coord(1/4)
```
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 5041) [ClassicSimilarity], result of:
      0.040692065 = score(doc=5041,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 5041, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
  0.25 = coord(1/4)
```
Abstract

Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend is social community question answering websites, which are the second choice for students who try to get answers from peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.009414612 = product of:
  0.03765845 = sum of:
    0.03765845 = product of:
      0.0753169 = sum of:
        0.0753169 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.0753169 = score(doc=1046,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 5. 5.2003 14:17:22

Kragelj, M.; Borstnar, M.K.: Automatic classification of older electronic texts into the Universal Decimal Classification-UDC (2021) 0.01
```
0.008138414 = product of:
  0.032553654 = sum of:
    0.032553654 = weight(_text_:social in 175) [ClassicSimilarity], result of:
      0.032553654 = score(doc=175,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.17622775 = fieldWeight in 175, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.03125 = fieldNorm(doc=175)
  0.25 = coord(1/4)
```
Abstract

Purpose: The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods. Design/methodology/approach: The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model. Findings: Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts. Research limitations/implications: The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians. Practical implications: The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases. Social implications: The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable. Originality/value: These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01

0.007845511 = product of:
  0.031382043 = sum of:
    0.031382043 = product of:
      0.062764086 = sum of:
        0.062764086 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.062764086 = score(doc=611,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 8.2009 12:54:24

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

0.007845511 = product of:
  0.031382043 = sum of:
    0.031382043 = product of:
      0.062764086 = sum of:
        0.062764086 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.062764086 = score(doc=2748,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 2.2016 18:25:22

Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
```
0.0065351077 = product of:
  0.026140431 = sum of:
    0.026140431 = product of:
      0.052280862 = sum of:
        0.052280862 = weight(_text_:aspects in 237) [ClassicSimilarity], result of:
          0.052280862 = score(doc=237,freq=2.0), product of:
            0.20938325 = queryWeight, product of:
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046325076 = queryNorm
            0.2496898 = fieldWeight in 237, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.0390625 = fieldNorm(doc=237)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.

Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01

0.005491857 = product of:
  0.021967428 = sum of:
    0.021967428 = product of:
      0.043934856 = sum of:
        0.043934856 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
          0.043934856 = score(doc=141,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.2708308 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Pages: pp. 1-22

Dubin, D.: Dimensions and discriminability (1998) 0.01

0.005491857 = product of:
  0.021967428 = sum of:
    0.021967428 = product of:
      0.043934856 = sum of:
        0.043934856 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.043934856 = score(doc=2338,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 9.1997 19:16:05

Automatic classification research at OCLC (2002) 0.01

0.005491857 = product of:
  0.021967428 = sum of:
    0.021967428 = product of:
      0.043934856 = sum of:
        0.043934856 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.043934856 = score(doc=1563,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 5. 5.2003 9:22:09

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01

0.005491857 = product of:
  0.021967428 = sum of:
    0.021967428 = product of:
      0.043934856 = sum of:
        0.043934856 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.043934856 = score(doc=1673,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 8.1996 22:08:06
