Search (146 results, page 1 of 8)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12
    Score breakdown (Lucene ClassicSimilarity, doc 562): 0.1208 = coord(2/3) × [coord(1/3) × 0.2422 (term "3a": tf=√2, idf=8.478) + 0.0592 (term "classification": tf=√6, idf=3.185) + 0.0413 (term "22": tf=√2, idf=3.502)], with queryNorm=0.0508 and fieldNorm=0.0469.
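    The breakdown above is standard Lucene ClassicSimilarity arithmetic: each matching term contributes queryWeight (idf × queryNorm) times fieldWeight (√tf × idf × fieldNorm), and coord() scales sums by the fraction of query clauses that matched. A minimal Python sketch, with the constants copied from the breakdown above, reproduces the headline score:

```python
import math

def term_weight(freq, idf, query_norm, field_norm):
    # queryWeight (idf * queryNorm) times fieldWeight (sqrt(tf) * idf * fieldNorm)
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

QUERY_NORM = 0.05083213  # queryNorm from the breakdown above
FIELD_NORM = 0.046875    # fieldNorm(doc=562)

w_3a  = term_weight(2.0, 8.478011, QUERY_NORM, FIELD_NORM)   # ~0.2422
w_cls = term_weight(6.0, 3.1847067, QUERY_NORM, FIELD_NORM)  # ~0.0592
w_22  = term_weight(2.0, 3.5018296, QUERY_NORM, FIELD_NORM)  # ~0.0413

# coord(matched/total) penalises query clauses that did not match.
score = (w_3a * (1 / 3) + (w_cls + w_22)) * (2 / 3)
print(round(score, 6))  # 0.120836, i.e. the 0.12 shown for this hit
```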
    
    Abstract
    Document representations for text classification are typically based on the classical bag-of-words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach, showing consistent improvements in the results.
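    A minimal sketch of the general idea, not the authors' implementation: concept pseudo-tokens drawn from a background-knowledge map (hypothetical here) are appended to each document, so that boosting sees concept features alongside the bag-of-words:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical background knowledge: surface terms -> higher-level concepts.
CONCEPTS = {"goalie": "concept_sports", "midfielder": "concept_sports",
            "broker": "concept_finance", "dividend": "concept_finance"}

def add_concepts(text):
    # Append a pseudo-token for every concept triggered by a known term,
    # giving the learner features on a higher semantic level than words.
    tokens = text.lower().split()
    return " ".join(tokens + [CONCEPTS[t] for t in tokens if t in CONCEPTS])

docs = ["the goalie blocked the shot", "the broker paid a dividend",
        "a midfielder scored twice", "dividend yields fell sharply"]
labels = ["sport", "finance", "sport", "finance"]

vec = CountVectorizer()
X = vec.fit_transform(add_concepts(d) for d in docs)

# Boosted shallow trees stand in for the paper's weak learners.
clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)
print(clf.predict(vec.transform([add_concepts("the goalie saved a shot")])))
```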
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.11
    
  3. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.10
    
    Abstract
    Authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a means for selecting resources in engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed, quality-controlled subject gateway are also discussed.
  4. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.08
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success in browsing was shown to be correlated with, and dependent on, classification correctness. Research limitations/implications - Further research should address the problem of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searches for words from class captions with synonym support (easily provided for Ei since the classes are mapped to thesaurus terms); and, when searching for class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need for improvements to classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
    Object
    Engineering Index Classification
  5. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.07
    
    Abstract
    Given the huge amount of information on the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
    Series
    Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V. ; 24 (Studies in classification, data analysis, and knowledge organization)
  6. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.07
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results, focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources.
  7. Dubin, D.: Dimensions and discriminability (1998) 0.07
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
  8. Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.05
    
    Abstract
    This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven general methodologies for building robot-generated subject indices, presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and an even more surprisingly low overlap between the resources collected by the two different approaches, despite starting the harvesting process from essentially the same services. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the approaches was the coverage of the resulting database.
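    The overlap figures reported here boil down to set comparisons over the harvested resources; a toy illustration (URLs hypothetical):

```python
# Resources collected by the two harvesting approaches (hypothetical URLs).
a = {"http://example.org/1", "http://example.org/2", "http://example.org/3"}
b = {"http://example.org/3", "http://example.org/4"}

# Jaccard coefficient: shared resources relative to all distinct resources.
print(len(a & b) / len(a | b))  # 0.25, a "low overlap" in the paper's sense
```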
  9. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.05
    
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field concerned with developing tools, methods, and models to automate this process. This article describes the currently popular approach to text classification and the major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for addressing the challenges are examined.
    Date
    22. 9.2008 18:31:54
  10. Borko, H.: Research in computer based classification systems (1985) 0.05
    
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification: the first is to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second is to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
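    The two steps attributed to Borko (a term-term correlation matrix from co-occurrence data, then factor analysis to derive categories) can be sketched briefly; factor analysis is approximated below by the leading eigenvectors of the correlation matrix, and the document-term matrix is hypothetical:

```python
import numpy as np

# Hypothetical document-term incidence matrix: rows = documents, cols = index terms.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 0]], dtype=float)

# Step 1: term-term similarity via a standard correlation coefficient.
R = np.corrcoef(X, rowvar=False)

# Step 2: derive categories from the correlation structure; the leading
# eigenvectors of R play the role of factors here.
eigvals, eigvecs = np.linalg.eigh(R)
loadings = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # two "factors"

# Assign each term to the factor on which it loads most strongly.
print(np.argmax(np.abs(loadings), axis=1))  # terms grouped into 2 classes
```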
    Footnote
    Original in: Classification research: Proceedings of the Second International Study Conference held at Hotel Prins Hamlet, Elsinore, Denmark, 14th-18th Sept. 1964. Ed.: Pauline Atherton. Copenhagen: Munksgaard 1965. S.220-238.
  11. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.05
    
    Abstract
    In text categorization tasks, classification that makes use of a class hierarchy often yields better results than classification without one. Because a large number of documents are divided into several subgroups in a hierarchy, a hierarchical classification method can be applied appropriately. However, there has been no systematic method for building a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
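    A toy sketch of the per-node design such a system implies (not the authors' algorithm; hierarchy and training data are hypothetical): every internal node owns a classifier over its children, and incoming documents are routed top-down until a leaf category is reached:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical hierarchy: each internal node gets its own training data.
TRAIN = {
    "root":    (["goal match team", "cell quark energy"], ["sports", "science"]),
    "science": (["quark boson energy", "cell protein gene"], ["physics", "biology"]),
}

node_clf = {node: make_pipeline(TfidfVectorizer(), LinearSVC()).fit(docs, labels)
            for node, (docs, labels) in TRAIN.items()}

def classify(doc, node="root"):
    # Route the document downward; nodes without a classifier are leaves.
    while node in node_clf:
        node = node_clf[node].predict([doc])[0]
    return node

print(classify("gene expression in a cell"))  # likely "biology" on this toy data
```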
    Date
    22. 7.2006 16:24:52
  12. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.04
    
    Date
    1. 2.2016 18:25:22
  13. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.04
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
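    A rough sketch of the gating idea (scoring deliberately simplified; hierarchy and vocabularies hypothetical): a document may enter a category only if it matches both the category's own content and the COD inherited from the category's ancestors:

```python
# Hypothetical hierarchy and category vocabularies.
PARENT = {"football": "sports", "sports": None}
TERMS = {"sports": {"match", "team", "league"},
         "football": {"goal", "striker", "penalty"}}

def context_of_discussion(cat):
    # The COD: the union of the vocabularies of all ancestor categories.
    cod, node = set(), PARENT.get(cat)
    while node:
        cod |= TERMS[node]
        node = PARENT.get(node)
    return cod

def matches(doc_tokens, cat, min_overlap=1):
    cod = context_of_discussion(cat)
    # Accept only if the document matches the category itself AND the
    # contextual background inherited from its ancestors.
    return (len(doc_tokens & TERMS[cat]) >= min_overlap and
            (not cod or len(doc_tokens & cod) >= min_overlap))

doc = {"the", "striker", "scored", "a", "goal", "for", "team"}
print(matches(doc, "football"))  # True: matches the category and its COD
```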
    Date
    22. 3.2009 19:11:54
  14. Automatic classification research at OCLC (2002) 0.04
    
    Abstract
    OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged, and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities whose outcomes have been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change.
    Date
    5. 5.2003 9:22:09
  15. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.04
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
  16. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.04
    
    Date
    22. 1.2010 14:41:24
    Footnote
    Lecture given on 03.06.2009 at the 98th Bibliothekartag 2009 in Erfurt; to appear in: Dialog mit Bibliotheken. See also: http://www.gbv.de/vgm/info/biblio/01VZG/06Publikationen/2009/index.
  17. Huang, Y.-L.: ¬A theoretic and empirical research of cluster indexing for Mandarine Chinese full text document (1998) 0.04
    
    Abstract
    Since most popular commercial systems for full-text retrieval are designed around full-text scanning and a Boolean-logic query mode, these systems use an oversimplified relationship between the indexing form and the content of a document. Reports the use of Singular Value Decomposition (SVD) to develop a Cluster Indexing Model (CIM) based on a Vector Space Model (VSM) in order to explore the index theory of cluster indexing for Chinese full-text documents. From a series of experiments, it was found that the indexing performance of CIM is better than that of the traditional VSM and has almost the equivalent effectiveness for the authority control of index terms.
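    A minimal latent-semantic sketch of the SVD machinery involved, with plain latent semantic analysis standing in for the paper's CIM (corpus and dimensionality hypothetical):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["library classification schemes", "automatic text classification",
        "chinese full text retrieval", "boolean query full text scanning"]

X = TfidfVectorizer().fit_transform(docs)          # vector space model (VSM)
Z = TruncatedSVD(n_components=2).fit_transform(X)  # SVD -> latent dimensions

# Documents close in the reduced space share latent index clusters,
# even when they have few surface terms in common.
print(cosine_similarity(Z).round(2))
```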
  18. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.03
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
  19. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.03
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
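    A schematic sketch of the passage-extraction step (window scoring deliberately simplified; aspect vocabularies hypothetical): slide a window over the text and keep the passage that best matches the target aspect, which is what the downstream classifier then sees:

```python
# Hypothetical disease-aspect seed vocabularies.
ASPECTS = {"treatment": {"drug", "dose", "therapy", "surgery"},
           "symptoms": {"fever", "pain", "cough", "fatigue"}}

def best_passage(text, aspect, size=8):
    # Slide a fixed-size window and keep the passage overlapping the
    # aspect vocabulary most: the part the classifier should see.
    tokens = text.lower().split()
    windows = [tokens[i:i + size]
               for i in range(max(1, len(tokens) - size + 1))]
    return " ".join(max(windows, key=lambda w: len(set(w) & ASPECTS[aspect])))

text = ("patients reported fever and chest pain for a week "
        "the recommended therapy is a low dose of the new drug")
print(best_passage(text, "treatment"))  # the therapy/dose passage, not the symptoms
```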
    Date
    28.10.2013 19:22:57
  20. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.03
    
    Date
    22. 8.2009 19:51:28
    Footnote
    See also the presentations at: http://www.bibliothek.uni-regensburg.de/Systematik/pdf/Anw2008_PPT1.pdf and http://blog.bib.uni-mannheim.de/Classification/wp-content/uploads/2007/10/hu-berlin-2007-2.pdf. Full texts at:

Languages

  • e 135
  • d 8
  • a 1
  • chi 1

Types

  • a 128
  • el 20
  • m 2
  • s 2
  • r 1
  • x 1