-
Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004)
0.07
0.07272229 = sum of:
  0.05422108 = product of:
    0.21688432 = sum of:
      0.21688432 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.21688432 = score(doc=562,freq=2.0), product of:
          0.38590276 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.045518078 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.25 = coord(1/4)
  0.018501213 = product of:
    0.037002426 = sum of:
      0.037002426 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.037002426 = score(doc=562,freq=2.0), product of:
          0.15939656 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045518078 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
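The trace above is Lucene's "explain" output for the ClassicSimilarity (TF-IDF) scorer: tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and coord() scales each clause by the fraction of query terms it matches. A minimal Python sketch that reproduces the figures above (the function names are illustrative, not part of the database; queryNorm is copied from the trace, since it depends on the query as a whole):

    import math

    def tf(freq):
        # ClassicSimilarity term frequency: square root of the raw count
        return math.sqrt(freq)

    def idf(doc_freq, max_docs):
        # ClassicSimilarity inverse document frequency
        return 1.0 + math.log(max_docs / (doc_freq + 1))

    QUERY_NORM = 0.045518078  # from the trace; normalizes across the whole query

    def clause_score(freq, doc_freq, field_norm, max_docs=44218):
        # Score of one term clause: queryWeight * fieldWeight
        query_weight = idf(doc_freq, max_docs) * QUERY_NORM
        field_weight = tf(freq) * idf(doc_freq, max_docs) * field_norm
        return query_weight * field_weight

    # Document 562 (Hotho/Bloehdorn): two clauses, each damped by its coord()
    s_3a = clause_score(freq=2.0, doc_freq=24, field_norm=0.046875)    # ~0.21688
    s_22 = clause_score(freq=2.0, doc_freq=3622, field_norm=0.046875)  # ~0.03700
    print(s_3a * 0.25 + s_22 * 0.5)  # ~0.0727, matching the 0.07272229 above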
- Content
- Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
- Date
- 8. 1.2013 10:22:32
-
Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013)
0.03
0.031199675 = product of:
  0.06239935 = sum of:
    0.06239935 = sum of:
      0.031563994 = weight(_text_:b in 1107) [ClassicSimilarity], result of:
        0.031563994 = score(doc=1107,freq=2.0), product of:
          0.16126883 = queryWeight, product of:
            3.542962 = idf(docFreq=3476, maxDocs=44218)
            0.045518078 = queryNorm
          0.19572285 = fieldWeight in 1107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.542962 = idf(docFreq=3476, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
      0.030835358 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
        0.030835358 = score(doc=1107,freq=2.0), product of:
          0.15939656 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045518078 = queryNorm
          0.19345059 = fieldWeight in 1107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
  0.5 = coord(1/2)
- Abstract
- Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers, and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
- Date
- 28.10.2013 19:22:57
-
Subramanian, S.; Shafer, K.E.: Clustering (2001)
0.02
0.018501213 = product of:
  0.037002426 = sum of:
    0.037002426 = product of:
      0.07400485 = sum of:
        0.07400485 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.07400485 = score(doc=1046,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
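Across the entries matching term "22", only the fieldNorm varies: it is Lucene's length normalization (by default 1/sqrt(number of terms in the field), byte-encoded, hence the coarse values), so the same match in a shorter field scores higher. Continuing the illustrative sketch from the first entry:

    print(clause_score(2.0, 3622, field_norm=0.09375))   # ~0.07400 (doc 1046, short field)
    print(clause_score(2.0, 3622, field_norm=0.046875))  # ~0.03700 (doc 562, longer field)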
- Date
- 5. 5.2003 14:17:22
-
Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004)
0.02
0.015781997 = product of:
  0.031563994 = sum of:
    0.031563994 = product of:
      0.06312799 = sum of:
        0.06312799 = weight(_text_:b in 4132) [ClassicSimilarity], result of:
          0.06312799 = score(doc=4132,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.3914457 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
-
HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016)
0.02
0.015417679 = product of:
  0.030835358 = sum of:
    0.030835358 = product of:
      0.061670717 = sum of:
        0.061670717 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.061670717 = score(doc=2748,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 1. 2.2016 18:25:22
-
Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000)
0.01
0.012625597 = product of:
  0.025251195 = sum of:
    0.025251195 = product of:
      0.05050239 = sum of:
        0.05050239 = weight(_text_:b in 1667) [ClassicSimilarity], result of:
          0.05050239 = score(doc=1667,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.31315655 = fieldWeight in 1667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0625 = fieldNorm(doc=1667)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Content
- 1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.
-
Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: ¬An evaluation of classification models for question topic categorization (2012)
0.01
0.011159557 = product of:
  0.022319114 = sum of:
    0.022319114 = product of:
      0.044638228 = sum of:
        0.044638228 = weight(_text_:b in 237) [ClassicSimilarity], result of:
          0.044638228 = score(doc=237,freq=4.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.2767939 = fieldWeight in 237, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0390625 = fieldNorm(doc=237)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
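This is the only entry in the list with freq=4.0; ClassicSimilarity damps repetition sublinearly, so doubling the raw count from 2 to 4 raises tf only from sqrt(2) ~ 1.414 to sqrt(4) = 2. With the same illustrative sketch:

    print(tf(2.0), tf(4.0))  # 1.4142135 2.0, as in the traces
    print(clause_score(freq=4.0, doc_freq=3476, field_norm=0.0390625))  # ~0.04464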
- Abstract
- We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
-
Dubin, D.: Dimensions and discriminability (1998)
0.01
0.010792375 = product of:
  0.02158475 = sum of:
    0.02158475 = product of:
      0.0431695 = sum of:
        0.0431695 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.0431695 = score(doc=2338,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 22. 9.1997 19:16:05
-
Automatic classification research at OCLC (2002)
0.01
0.010792375 = product of:
  0.02158475 = sum of:
    0.02158475 = product of:
      0.0431695 = sum of:
        0.0431695 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
          0.0431695 = score(doc=1563,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.2708308 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 5. 5.2003 9:22:09
-
Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998)
0.01
0.010792375 = product of:
  0.02158475 = sum of:
    0.02158475 = product of:
      0.0431695 = sum of:
        0.0431695 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.0431695 = score(doc=1673,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 1. 8.1996 22:08:06
-
Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006)
0.01
0.010792375 = product of:
  0.02158475 = sum of:
    0.02158475 = product of:
      0.0431695 = sum of:
        0.0431695 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.0431695 = score(doc=5273,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 22. 7.2006 16:24:52
-
Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007)
0.01
0.010792375 = product of:
  0.02158475 = sum of:
    0.02158475 = product of:
      0.0431695 = sum of:
        0.0431695 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.0431695 = score(doc=2560,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 22. 9.2008 18:31:54
-
Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008)
0.01
0.009469198 = product of:
  0.018938396 = sum of:
    0.018938396 = product of:
      0.037876792 = sum of:
        0.037876792 = weight(_text_:b in 2100) [ClassicSimilarity], result of:
          0.037876792 = score(doc=2100,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.23486741 = fieldWeight in 2100, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.046875 = fieldNorm(doc=2100)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Abstract
- This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
-
Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004)
0.01
0.009469198 = product of:
  0.018938396 = sum of:
    0.018938396 = product of:
      0.037876792 = sum of:
        0.037876792 = weight(_text_:b in 2555) [ClassicSimilarity], result of:
          0.037876792 = score(doc=2555,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.23486741 = fieldWeight in 2555, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
-
Liu, R.-L.: Context recognition for hierarchical text classification (2009)
0.01
0.009250606 = product of:
  0.018501213 = sum of:
    0.018501213 = product of:
      0.037002426 = sum of:
        0.037002426 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.037002426 = score(doc=2760,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 22. 3.2009 19:11:54
-
Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013)
0.01
0.009250606 = product of:
  0.018501213 = sum of:
    0.018501213 = product of:
      0.037002426 = sum of:
        0.037002426 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.037002426 = score(doc=690,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 23. 3.2013 13:22:36
-
Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015)
0.01
0.009250606 = product of:
  0.018501213 = sum of:
    0.018501213 = product of:
      0.037002426 = sum of:
        0.037002426 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.037002426 = score(doc=2158,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Date
- 4. 8.2015 19:22:04
-
Ribeiro-Neto, B.; Laender, A.H.F.; Lima, L.R.S. de: ¬An experimental study in automatically categorizing medical documents (2001)
0.01
0.007890998 = product of:
  0.015781997 = sum of:
    0.015781997 = product of:
      0.031563994 = sum of:
        0.031563994 = weight(_text_:b in 5702) [ClassicSimilarity], result of:
          0.031563994 = score(doc=5702,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.19572285 = fieldWeight in 5702, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5702)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
-
Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006)
0.01
0.007890998 = product of:
  0.015781997 = sum of:
    0.015781997 = product of:
      0.031563994 = sum of:
        0.031563994 = weight(_text_:b in 4921) [ClassicSimilarity], result of:
          0.031563994 = score(doc=4921,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.19572285 = fieldWeight in 4921, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4921)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
-
Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: ¬An expert-in-the-loop method for domain-specific document categorization based on small training data (2023)
0.01
0.007890998 = product of:
  0.015781997 = sum of:
    0.015781997 = product of:
      0.031563994 = sum of:
        0.031563994 = weight(_text_:b in 967) [ClassicSimilarity], result of:
          0.031563994 = score(doc=967,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.19572285 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Abstract
- Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in-depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio-ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.