Search (71 results, page 1 of 4)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.27
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.03
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
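    The co-word step of case three can be pictured in a few lines of Python; the descriptor sets below are invented for illustration and are not the study's data:

     from collections import Counter
     from itertools import combinations

     papers = [  # descriptor sets per paper (invented for illustration)
         {"clustering", "machine learning", "information retrieval"},
         {"automatic classification", "machine learning"},
         {"clustering", "automatic indexing", "information retrieval"},
     ]

     cooc = Counter()
     for keywords in papers:
         for pair in combinations(sorted(keywords), 2):
             cooc[pair] += 1   # each co-assignment strengthens a thematic link

     # The most frequent pairs approximate the thematic links the study maps.
     for pair, n in cooc.most_common(3):
         print(pair, n)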
  3. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.03
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
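    A rough sketch of the passage-extraction idea described above (not the authors' PETC code): split a text into overlapping word windows, let a linear SVM score each passage, and classify the passage the classifier is most confident about. The toy training data and window size are assumptions:

     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.svm import LinearSVC

     # Toy training data for two disease aspects (illustrative only).
     train_texts = ["fever cough and muscle pain are common symptoms",
                    "vaccination and hand washing prevent infection"]
     train_labels = ["symptoms", "prevention"]

     vec = TfidfVectorizer()
     clf = LinearSVC().fit(vec.fit_transform(train_texts), train_labels)

     def classify_best_passage(text, window=8):
         words = text.split()
         step = max(window // 2, 1)
         passages = [" ".join(words[i:i + window])
                     for i in range(0, max(len(words) - step, 1), step)]
         P = vec.transform(passages)
         margins = abs(clf.decision_function(P))  # binary task: distance to hyperplane
         return clf.predict(P[margins.argmax()])[0]

     print(classify_best_passage(
         "the etiology is viral but patients mainly report fever cough and muscle pain"))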
  4. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.03
    
    Abstract
    This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; (c) and the way they are used in combination with each other. Further observations concern the way the participant assesses quality of web-based resources, and his information behavior as a software engineer.
    Source
    Information processing and management. 44(2008) no.4, S.1410-1430
  5. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.02
    
    Date
    22. 8.2009 19:51:28
    Theme
    Case Based Reasoning
  6. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.02
    
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n**2/p) time on p processors rather than the worst-case O(n**3/p) time. Furthermore, the O(n**2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
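    For orientation, a single-node sketch of group-average agglomerative clustering; the paper's contribution is distributing exactly this computation across p processors with message passing:

     import numpy as np

     def group_average_hac(vectors, k):
         clusters = [[i] for i in range(len(vectors))]
         while len(clusters) > k:
             best, pair = -1.0, None
             for a in range(len(clusters)):
                 for b in range(a + 1, len(clusters)):
                     # group-average: mean pairwise cosine between the two groups
                     sims = [float(vectors[i] @ vectors[j])
                             for i in clusters[a] for j in clusters[b]]
                     if sum(sims) / len(sims) > best:
                         best, pair = sum(sims) / len(sims), (a, b)
             a, b = pair
             clusters[a] += clusters.pop(b)   # merge the closest pair
         return clusters

     rng = np.random.default_rng(0)
     docs = rng.random((8, 4))
     docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit length: dot = cosine
     print(group_average_hac(docs, k=3))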
  7. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
    
    Abstract
    Given the huge amount of information in the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
    Series
    Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V. ; 24
    Studies in classification, data analysis, and knowledge organization
  8. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.02
    
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
    Source
    Information processing and management. 40(2004) no.2, S.239-255
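    The iterative principal-eigenvector step can be sketched with plain power iteration; the adjacency matrix below is an invented four-page example, not the ATD base-set construction:

     import numpy as np

     A = np.array([[0, 1, 1, 0],    # A[i][j] = 1 if page i links to page j
                   [0, 0, 1, 0],
                   [1, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=float)

     def principal_eigenvector(M, iters=100, tol=1e-9):
         v = np.ones(M.shape[0])
         for _ in range(iters):
             w = M @ v
             w /= np.linalg.norm(w)       # renormalize each step
             if np.linalg.norm(w - v) < tol:
                 break
             v = w
         return v

     # Authority-style scores: dominant eigenvector of A^T A.
     print(principal_eigenvector(A.T @ A))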
  9. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
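    A minimal sketch of the context-of-discussion (COD) idea, assuming a toy hierarchy and hand-picked category terms (CRHTC itself derives these from the category contents):

     hierarchy = {"diseases": None, "infections": "diseases", "influenza": "infections"}
     terms = {"diseases": {"disease", "patient"},
              "infections": {"infection", "virus"},
              "influenza": {"influenza", "flu"}}

     def cod(category):
         # contextual background = union of all ancestor term sets
         ctx, node = set(), hierarchy[category]
         while node is not None:
             ctx |= terms[node]
             node = hierarchy[node]
         return ctx

     def matches(doc_words, category, min_context_hits=1):
         ctx = cod(category)
         own = bool(terms[category] & doc_words)          # category's own terms
         in_context = not ctx or len(ctx & doc_words) >= min_context_hits
         return own and in_context

     doc = {"flu", "virus", "patient"}
     print([c for c in hierarchy if matches(doc, c)])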
  10. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.01
    
    Abstract
    With the increase of information on the Web, it is difficult to find desired information quickly out of the documents retrieved by a search engine. One way to solve this problem is to classify web documents according to various criteria. Most document classification has been focused on a subject or a topic of a document. A genre or a style is another view of a document different from a subject or a topic. The genre is also a criterion to classify documents. In this paper, we suggest multiple sets of features to classify genres of web documents. The basic set of features, which have been proposed in the previous studies, is acquired from the textual properties of documents, such as the number of sentences, the number of a certain word, etc. However, web documents are different from textual documents in that they contain URL and HTML tags within the pages. We introduce new sets of features specific to web documents, which are extracted from URL and HTML tags. The present work is an attempt to evaluate the performance of the proposed sets of features, and to discuss their characteristics. Finally, we conclude which is an appropriate set of features in automatic genre classification of web documents.
    Source
    Information processing and management. 41(2005) no.5, S.1263-1276
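    A sketch of what URL- and HTML-tag-based feature sets might look like; the individual features below are illustrative guesses, not the paper's actual feature lists:

     import re
     from html.parser import HTMLParser

     class TagCounter(HTMLParser):
         def __init__(self):
             super().__init__()
             self.counts = {}
         def handle_starttag(self, tag, attrs):
             self.counts[tag] = self.counts.get(tag, 0) + 1

     def genre_features(url, html):
         parser = TagCounter()
         parser.feed(html)
         text = re.sub(r"<[^>]+>", " ", html)              # crude tag stripping
         return {"url_depth": url.count("/") - 2,          # path depth
                 "url_has_tilde": int("~" in url),         # personal-page hint
                 "n_links": parser.counts.get("a", 0),
                 "n_forms": parser.counts.get("form", 0),
                 "n_sentences": text.count(".")}           # crude textual feature

     print(genre_features("http://example.org/~user/pubs/",
                          "<html><a href='x'>see</a><p>Hello.</p></html>"))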
  11. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
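    The decision-tree survey can be pictured as a walk through yes/no questions that terminates in a register label; the questions and labels below are invented, not the authors' coding scheme:

     TREE = {"q": "Is the page primarily interactive (comments, forum posts)?",
             "yes": "Interactive discussion",
             "no": {"q": "Does it mainly narrate events?",
                    "yes": "Narrative",
                    "no": "Informational description"}}

     def code_register(tree, answer):
         # answer(question) -> bool stands in for the end user's response
         while isinstance(tree, dict):
             tree = tree["yes"] if answer(tree["q"]) else tree["no"]
         return tree

     print(code_register(TREE, lambda q: "narrate" in q))   # -> Narrative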
  12. Kwok, K.L.: ¬The use of titles and cited titles as document representations for automatic classification (1975) 0.01
    
    Source
    Information processing and management. 11(1975), S.201-206
  13. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01
    
    Source
    Information processing and management. 37(2001) no.3, S.459-484
  14. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.01
    
    Abstract
    Examines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future developments of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar.
  15. Major, R.L.; Ragsdale, C.T.: ¬An aggregation approach to the classification problem using multiple prediction experts (2000) 0.01
    
    Source
    Information processing and management. 36(2000) no.4, S.683-696
  16. Krellenstein, M.: Document classification at Northern Light (1999) 0.01
    
    Footnote
    Paper presented at: Search engines and beyond: developing efficient knowledge management systems; 1999 Search Engine Meeting, Boston, MA, April 19-20, 1999
  17. Bianchini, C.; Bargioni, S.: Automated classification using linked open data : a case study on faceted classification and Wikidata (2021) 0.01
    
  18. Savic, D.: Designing an expert system for classifying office documents (1994) 0.01
    
    Abstract
    Can records management benefit from artificial intelligence technology, in particular from expert systems? Gives an answer to this question by showing an example of a small-scale prototype project in automatic classification of office documents. Project methodology and basic elements of an expert-system approach are elaborated to give guidelines to potential users of this promising technology.
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
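    A minimal rule-based sketch of the expert-system approach described above, with invented if-then rules over document attributes (the prototype's actual rule base is not given here):

     RULES = [  # (condition, class) pairs; the first matching rule fires
         (lambda d: "invoice" in d["subject"].lower(), "Accounting"),
         (lambda d: d["sender_dept"] == "HR", "Personnel"),
         (lambda d: d["has_signature_block"], "Correspondence"),
     ]

     def classify(doc, default="Unclassified"):
         for condition, label in RULES:
             if condition(doc):
                 return label
         return default

     doc = {"subject": "Invoice 4711", "sender_dept": "Sales", "has_signature_block": True}
     print(classify(doc))   # -> Accounting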
  19. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.01
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
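    A toy sketch of the partial-match idea: pool the terms of previously classified records into one cluster per classification number, then rank clusters by term overlap with the new record's title. The mini training set is invented:

     from collections import defaultdict

     training = [("QA76", "automatic classification algorithms"),
                 ("QA76", "machine learning text categorization"),
                 ("Z699", "library information retrieval systems")]

     clusters = defaultdict(set)
     for lcc, title in training:
         clusters[lcc] |= set(title.split())   # one term pool per class number

     def classify(title):
         query = set(title.split())
         # coordination-level match: rank clusters by shared-term count
         return max(clusters, key=lambda lcc: len(clusters[lcc] & query))

     print(classify("text classification with machine learning"))   # -> QA76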
  20. Pfeffer, M.: Automatische Vergabe von RVK-Notationen anhand von bibliografischen Daten mittels fallbasiertem Schließen (2007) 0.01
    
    Theme
    Case Based Reasoning

Languages

  • e 63
  • d 6
  • a 1

Types

  • a 64
  • el 6
  • m 1
  • s 1
  • x 1