Search (55 results, page 2 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  • year_i:[2000 TO 2010}
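
  The two filters above are Solr-style field queries; the year facet uses a half-open range, so [2000 TO 2010} includes 2000 but excludes 2010. A sketch of what the underlying request for this page presumably looks like, assuming a stock Solr endpoint (the URL, core name, and page size are guesses, not taken from this page):

      import requests

      # Hypothetical reconstruction of the Solr request behind this results
      # page; the endpoint and core name are made up.
      params = {
          "q": "*:*",
          "fq": ['theme_ss:"Automatisches Klassifizieren"',  # active facet filters
                 "year_i:[2000 TO 2010}"],                    # half-open range
          "rows": 20,            # assumed page size (55 hits over 3 pages)
          "start": 20,           # page 2
          "debugQuery": "true",  # yields the score explanations shown below
      }
      resp = requests.get("http://localhost:8983/solr/catalog/select", params=params)
      print(resp.json()["response"]["numFound"])  # expect 55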
  1. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.00
    0.0038096926 = product of:
      0.049526002 = sum of:
        0.049526002 = weight(_text_:web in 1048) [ClassicSimilarity], result of:
          0.049526002 = score(doc=1048,freq=14.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.47698978 = fieldWeight in 1048, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1048)
      0.07692308 = coord(1/13)
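
    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) scoring broken into its factors: fieldWeight = tf x idf x fieldNorm, queryWeight = idf x queryNorm, and the final score multiplies both with the coord factor. A minimal Python check against the reported numbers (variable names are ours; the formulas are as documented for ClassicSimilarity):

        import math

        freq, doc_freq, max_docs = 14.0, 4597, 44218    # from the explain output
        query_norm = 0.031815533                        # depends on the whole query
        field_norm = 0.0390625                          # length norm for doc 1048
        coord = 1 / 13                                  # 1 of 13 query clauses matched

        tf  = math.sqrt(freq)                           # 3.7416575
        idf = 1 + math.log(max_docs / (doc_freq + 1))   # 3.2635105

        query_weight = idf * query_norm                 # 0.10383032
        field_weight = tf * idf * field_norm            # 0.47698978
        print(coord * query_weight * field_weight)      # 0.0038096926, the hit's score

    The same decomposition applies to every explain block on this page; only freq, fieldNorm, and the matched term's idf change from hit to hit.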
    
    Abstract
    With the increase of information on the Web, it is difficult to quickly find the desired information among the documents retrieved by a search engine. One way to address this problem is to classify web documents according to various criteria. Most document classification has focused on the subject or topic of a document. A genre or style is another view of a document, distinct from its subject or topic, and it is likewise a criterion for classifying documents. In this paper, we suggest multiple sets of features for the automatic genre classification of web documents. The basic set of features, proposed in previous studies, is acquired from the textual properties of documents, such as the number of sentences or the frequency of particular words. However, web documents differ from plain text documents in that they contain URLs and HTML tags. We therefore introduce new sets of features specific to web documents, extracted from URLs and HTML tags. The present work evaluates the performance of the proposed sets of features and discusses their characteristics. Finally, we conclude which set of features is appropriate for the automatic genre classification of web documents.
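
    As an illustration of the paper's point that URLs and HTML tags carry genre signal beyond plain text, a hypothetical Python sketch of such features (the concrete feature set is the paper's; these names are ours):

        import re
        from urllib.parse import urlparse

        def genre_features(url: str, html: str) -> dict:
            """Toy URL/HTML-tag features in the spirit of the paper."""
            text = re.sub(r"<[^>]+>", " ", html)        # crude tag stripping
            sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
            return {
                "url_depth":     urlparse(url).path.count("/"),
                "url_has_tilde": int("~" in url),       # often a personal home page
                "n_links":       len(re.findall(r"<a\s", html, re.I)),
                "n_images":      len(re.findall(r"<img\s", html, re.I)),
                "n_sentences":   len(sentences),        # classic textual features
                "n_words":       len(text.split()),
            }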
  2. Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.00
    0.0034558282 = product of:
      0.044925764 = sum of:
        0.044925764 = weight(_text_:web in 507) [ClassicSimilarity], result of:
          0.044925764 = score(doc=507,freq=2.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.43268442 = fieldWeight in 507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.09375 = fieldNorm(doc=507)
      0.07692308 = coord(1/13)
    
  3. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.00
    0.0034558282 = product of:
      0.044925764 = sum of:
        0.044925764 = weight(_text_:web in 5897) [ClassicSimilarity], result of:
          0.044925764 = score(doc=5897,freq=8.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.43268442 = fieldWeight in 5897, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5897)
      0.07692308 = coord(1/13)
    
    Abstract
    The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
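
    A minimal sketch of the string-to-string matching step described above, with a toy stand-in for the Ei term list (the real list, its preprocessing, and any weighting are the study's, not reproduced here):

        import re

        term_list = {                        # term -> Ei class; toy stand-in
            "heat exchanger": "641.2",
            "fluid dynamics": "631.1",
        }

        def classify(text: str) -> dict:
            """Count term-list matches in the text; rank candidate classes."""
            text = text.lower()
            scores: dict = {}
            for term, cls in term_list.items():
                hits = len(re.findall(re.escape(term), text))
                if hits:
                    scores[cls] = scores.get(cls, 0) + hits
            return dict(sorted(scores.items(), key=lambda kv: -kv[1]))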
  4. Brückner, T.; Dambeck, H.: Sortierautomaten : Grundlagen der Textklassifizierung (2003) 0.00
    0.003404462 = product of:
      0.044258006 = sum of:
        0.044258006 = weight(_text_:software in 2398) [ClassicSimilarity], result of:
          0.044258006 = score(doc=2398,freq=2.0), product of:
            0.12621705 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.031815533 = queryNorm
            0.35064998 = fieldWeight in 2398, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0625 = fieldNorm(doc=2398)
      0.07692308 = coord(1/13)
    
    Abstract
    Invoice, cancellation, or change of address? Incoming letters and e-mails are increasingly sorted by software rather than laboriously by hand. The text classifiers work astonishingly accurately. They also hunt for similar texts and thus provide a quick overview. Their tools are linguistics, statistics, and logic.
  5. Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.00
    0.0029928354 = product of:
      0.038906857 = sum of:
        0.038906857 = weight(_text_:web in 3383) [ClassicSimilarity], result of:
          0.038906857 = score(doc=3383,freq=6.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.37471575 = fieldWeight in 3383, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3383)
      0.07692308 = coord(1/13)
    
    Abstract
    In recent years, we have witnessed continual growth in the use of ontologies as a mechanism to enable machine reasoning. This paper describes an automatic classifier that uses ontologies to classify Web pages with respect to the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) schemes. Firstly, we explain how these ontologies can be built in a modular fashion and mapped onto DDC and LCC. Secondly, we propose a formal definition of a DDC-LCC and an ontology-classification-scheme mapping. Thirdly, we explain how the classifier uses these ontologies to assist classification. Finally, an experiment evaluating the accuracy of the classifier is presented. The experiment shows that our approach results in improved classification accuracy. This improvement, however, comes at the cost of a low coverage ratio due to the incompleteness of the ontologies used.
    Content
    Paper presented at: The Third International Conference on Web Information Systems Engineering (WISE'02), Dec. 12-14, 2002, Singapore, p. 182.
  6. Hung, C.-M.; Chien, L.-F.: Web-based text classification in the absence of manually labeled training documents (2007) 0.00
    0.0029928354 = product of:
      0.038906857 = sum of:
        0.038906857 = weight(_text_:web in 87) [ClassicSimilarity], result of:
          0.038906857 = score(doc=87,freq=6.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.37471575 = fieldWeight in 87, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=87)
      0.07692308 = coord(1/13)
    
    Abstract
    Most text classification techniques assume that manually labeled documents (corpora) can be obtained easily while learning text classifiers. However, labeled training documents are sometimes unavailable, or inadequate even when they are available. The goal of this article is to present a self-learned approach to extracting high-quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user-defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters-21578 news data set. The experiments show that very promising results can be achieved using only automatically extracted documents from the Web.
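
    A rough sketch of the self-learned idea: seed keywords for each user-defined category pseudo-label documents gathered from the Web, and the result trains an ordinary classifier. web_docs is a hypothetical input, and the authors' extraction pipeline is considerably more involved than this:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB

        seeds = {"sport": ["football", "match"], "finance": ["stock", "market"]}

        def pseudo_label(docs):
            """Assign each doc to its best-matching seed category, if any."""
            pairs = []
            for d in docs:
                hits = {c: sum(d.lower().count(k) for k in kws)
                        for c, kws in seeds.items()}
                best = max(hits, key=hits.get)
                if hits[best] > 0:           # keep only confidently matched docs
                    pairs.append((d, best))
            return pairs

        def train(web_docs):
            docs, labels = zip(*pseudo_label(web_docs))
            vec = TfidfVectorizer()
            return vec, MultinomialNB().fit(vec.fit_transform(docs), labels)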
  7. Wille, J.: Automatisches Klassifizieren bibliographischer Beschreibungsdaten : Vorgehensweise und Ergebnisse (2006) 0.00
    0.0029789044 = product of:
      0.038725756 = sum of:
        0.038725756 = weight(_text_:software in 6090) [ClassicSimilarity], result of:
          0.038725756 = score(doc=6090,freq=2.0), product of:
            0.12621705 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.031815533 = queryNorm
            0.30681872 = fieldWeight in 6090, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6090)
      0.07692308 = coord(1/13)
    
    Footnote
    http://www.fbi.fh-koeln.de/institut/papers/abschlussarbeiten/abschlussarbeiten_ausgabe.php. See also: http://eprints.rclis.org/archive/00006659/01/wille_-_automatisches_klassifizieren_bibliographischer_beschreibungsdaten_(diplomarbeit).pdf. For the software, see: http://blackwinter.de/da/.
  8. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.00
    0.002879857 = product of:
      0.03743814 = sum of:
        0.03743814 = weight(_text_:web in 4132) [ClassicSimilarity], result of:
          0.03743814 = score(doc=4132,freq=2.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.36057037 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.07692308 = coord(1/13)
    
  9. Golub, K.: Automated subject classification of textual web documents (2006) 0.00
    0.002879857 = product of:
      0.03743814 = sum of:
        0.03743814 = weight(_text_:web in 5600) [ClassicSimilarity], result of:
          0.03743814 = score(doc=5600,freq=8.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.36057037 = fieldWeight in 5600, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.07692308 = coord(1/13)
    
    Abstract
    Purpose - To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval, and library science), and to point to problems with these approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with the automated classification of full-text web documents is discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application, and characteristics of web pages. Findings - Identifies major similarities and differences among the three approaches: document pre-processing and the utilization of web-specific document characteristics are common to all of them; the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities. Originality/value - To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
  10. Walther, R.: Möglichkeiten und Grenzen automatischer Klassifikationen von Web-Dokumenten (2001) 0.00
    0.002850913 = product of:
      0.037061866 = sum of:
        0.037061866 = weight(_text_:web in 1562) [ClassicSimilarity], result of:
          0.037061866 = score(doc=1562,freq=4.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.35694647 = fieldWeight in 1562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1562)
      0.07692308 = coord(1/13)
    
    Abstract
    Automatic classification of web and other text documents makes it possible to provide organized access to internal and external information. Research on automatic classification has intensified in recent years, resulting in a variety of methods that are used in practice today, individually or in combination. This licentiate thesis examines several methods for automatic classification in detail, alongside general principles, and discusses their possibilities and limits. It also presents the results of a survey of vendors of software solutions for the automatic classification of text documents. The findings serve myax internet AG as a basis for developing a classification product of its own.
  11. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.00
    0.0023968702 = product of:
      0.031159312 = sum of:
        0.031159312 = weight(_text_:world in 909) [ClassicSimilarity], result of:
          0.031159312 = score(doc=909,freq=2.0), product of:
            0.122288436 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.031815533 = queryNorm
            0.25480178 = fieldWeight in 909, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.046875 = fieldNorm(doc=909)
      0.07692308 = coord(1/13)
    
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also have some obstacles to overcome when applied to text classification. The target text collection generally has a very high dimension, so the training process might take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of an associative classifier produces a huge number of classification rules, which makes prediction for a new document inefficient. We resolve this by introducing a new, efficient method for storing and pruning classification rules, which can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of associative classification in both training and prediction when applied to a real-world problem.
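
    The mutual-information criterion mentioned above has a compact plug-in estimate on document counts; a minimal sketch (the whitespace tokenization and absence of smoothing are our simplifications):

        import math
        from collections import Counter

        def mutual_information(docs, labels, word):
            """MI (in nats) between a word's presence and the class variable."""
            n = len(docs)
            present = [word in d.split() for d in docs]
            joint = Counter(zip(present, labels))
            p_w, p_c = Counter(present), Counter(labels)
            return sum((nwc / n) * math.log((nwc / n) / ((p_w[w] / n) * (p_c[c] / n)))
                       for (w, c), nwc in joint.items())

        # Rank the vocabulary by MI and keep the top-k words as features.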
  12. Ozmutlu, S.; Cosar, G.C.: Analyzing the results of automatic new topic identification (2008) 0.00
    0.0023968702 = product of:
      0.031159312 = sum of:
        0.031159312 = weight(_text_:world in 2604) [ClassicSimilarity], result of:
          0.031159312 = score(doc=2604,freq=2.0), product of:
            0.122288436 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.031815533 = queryNorm
            0.25480178 = fieldWeight in 2604, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.046875 = fieldNorm(doc=2604)
      0.07692308 = coord(1/13)
    
    Footnote
    Contribution to a special issue, "Technology around the world"
  13. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.00
    0.0020363664 = product of:
      0.026472762 = sum of:
        0.026472762 = weight(_text_:web in 831) [ClassicSimilarity], result of:
          0.026472762 = score(doc=831,freq=4.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.25496176 = fieldWeight in 831, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
      0.07692308 = coord(1/13)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, for languages such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language-modeling-based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy, support vector machine, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text, and a segmentation-based approach was compared with a non-segmentation-based one. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques given the same set of features; and it was found that classification with word-level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually it stops improving, and can in fact even decrease, beyond a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification and web search.
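
    In the paper's segmentation-free spirit, a class-conditional character n-gram language model can classify text without any word boundaries; a toy sketch with add-one smoothing (the authors' models are more refined):

        import math
        from collections import Counter

        class CharLMClassifier:
            def __init__(self, n: int = 2):
                self.n = n
                self.models = {}     # class -> (ngram counts, context counts)

            def fit(self, docs, labels):
                for d, c in zip(docs, labels):
                    grams, ctx = self.models.setdefault(c, (Counter(), Counter()))
                    for i in range(len(d) - self.n + 1):
                        grams[d[i:i + self.n]] += 1
                        ctx[d[i:i + self.n - 1]] += 1

            def _logp(self, doc, c):
                """Add-one-smoothed log-likelihood of doc under class c's model."""
                grams, ctx = self.models[c]
                return sum(math.log((grams[doc[i:i + self.n]] + 1)
                                    / (ctx[doc[i:i + self.n - 1]] + len(grams) + 1))
                           for i in range(len(doc) - self.n + 1))

            def predict(self, doc):
                return max(self.models, key=lambda c: self._logp(doc, c))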
  14. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.00
    0.0020363664 = product of:
      0.026472762 = sum of:
        0.026472762 = weight(_text_:web in 3614) [ClassicSimilarity], result of:
          0.026472762 = score(doc=3614,freq=4.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.25496176 = fieldWeight in 3614, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.07692308 = coord(1/13)
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing and, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success was shown to be correlated with, and dependent on, classification correctness. Research limitations/implications - Further research should address the problem of disparate evaluations of one and the same web page. The reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searches for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesaurus terms); and, when searching for class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need for improvements to classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
  15. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.00
    0.001997392 = product of:
      0.025966093 = sum of:
        0.025966093 = weight(_text_:world in 3172) [ClassicSimilarity], result of:
          0.025966093 = score(doc=3172,freq=2.0), product of:
            0.122288436 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.031815533 = queryNorm
            0.21233483 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.07692308 = coord(1/13)
    
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems such as the DDC impose great obstacles on state-of-the-art text categorization (TC) technologies, including a deep hierarchy, data sparseness, and skewed distributions. We first analyze statistically the document and category distributions over the DDC and discuss the obstacles that bibliographic corpora and library classification schemes impose on TC technology. To overcome these obstacles, we propose an innovative algorithm that reshapes the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve classification effectiveness to a level acceptable for real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, providing a practical solution to the automatic bibliographic classification problem.
  16. Oberhauser, O.: Automatisches Klassifizieren : Verfahren zur Erschließung elektronischer Dokumente (2004) 0.00
    0.0019952233 = product of:
      0.025937904 = sum of:
        0.025937904 = weight(_text_:web in 2487) [ClassicSimilarity], result of:
          0.025937904 = score(doc=2487,freq=6.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.24981049 = fieldWeight in 2487, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=2487)
      0.07692308 = coord(1/13)
    
    Abstract
    Automatic classification of text documents means the machine assignment of one or more notations of a given classification system to natural-language texts by means of a suitable algorithm. This work, a comprehensive literature study, establishes the current state of knowledge on the possible uses of automatic classification for the subject indexing of electronic documents, in particular web resources. This concerns the methodological aspect on the one hand and, on the other, the experience gained in relevant projects and applications. Methodologically, statistical approaches based on machine learning are considered state of the art today: from already classified example documents they build a model, a "classifier", that can be used to classify new documents. The four "large" projects on the automatic classification of web resources carried out in the 1990s at the universities of Lund, Wolverhampton, and Oldenburg and at OCLC (Dublin, OH), which are analyzed in detail in this work, still used simpler or older methodological approaches, however. These projects represent an important gain in experience, in particular because of their use of established library classification systems, even if they have not so far led to permanent, qualitatively satisfactory services for the indexing of electronic resources. The analysis of the other relevant applications and projects shows that the most active efforts to put systems for the automatic classificatory indexing of electronic documents into ongoing operational use are currently found in the fields of patent and media documentation. Semi-automatic systems dominate here, however, supporting human indexers with classification suggestions, since the classification quality achievable at present is usually not yet sufficient for full automation. Further interesting applications and projects can be found in the area of web portals, search engines, and (commercial) information services, whereas in the library sector hardly any noteworthy interest in the automatic classification of books or bibliographic records can be registered. The study closes with a discussion of the most important projects and applications as well as of several questions and topics relevant in connection with automatic classification.
  17. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.00
    0.0017279141 = product of:
      0.022462882 = sum of:
        0.022462882 = weight(_text_:web in 995) [ClassicSimilarity], result of:
          0.022462882 = score(doc=995,freq=2.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.21634221 = fieldWeight in 995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=995)
      0.07692308 = coord(1/13)
    
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model, based on Bayesian networks, that is able to handle both structure and content. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended to deal with different types of content information (here, text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad hoc retrieval for XML documents, and a multimedia corpus of Web pages.
  18. Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.00
    0.0017279141 = product of:
      0.022462882 = sum of:
        0.022462882 = weight(_text_:web in 1168) [ClassicSimilarity], result of:
          0.022462882 = score(doc=1168,freq=2.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.21634221 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.07692308 = coord(1/13)
    
    Abstract
    The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
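
    A small sketch of the clustering step, using LDA as the topic model via scikit-learn (the OAIster-specific preprocessing and the matching of clusters to an in-house classification scheme are not reproduced here):

        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        def cluster_records(records, n_topics=10):
            """Cluster metadata records by dominant topic; label topics by top words."""
            vec = CountVectorizer(stop_words="english")
            X = vec.fit_transform(records)
            lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
            doc_topics = lda.fit_transform(X)
            words = vec.get_feature_names_out()
            labels = [[words[i] for i in topic.argsort()[-5:][::-1]]
                      for topic in lda.components_]
            return doc_topics.argmax(axis=1), labels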
  19. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00
    0.0013263279 = product of:
      0.017242262 = sum of:
        0.017242262 = product of:
          0.051726785 = sum of:
            0.051726785 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.051726785 = score(doc=1046,freq=2.0), product of:
                0.11141258 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.031815533 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.33333334 = coord(1/3)
      0.07692308 = coord(1/13)
    
    Date
    5. 5.2003 14:17:22
  20. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web-Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.00
    0.0011519426 = product of:
      0.014975254 = sum of:
        0.014975254 = weight(_text_:web in 4197) [ClassicSimilarity], result of:
          0.014975254 = score(doc=4197,freq=2.0), product of:
            0.10383032 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031815533 = queryNorm
            0.14422815 = fieldWeight in 4197, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=4197)
      0.07692308 = coord(1/13)
    

Languages

  • e 45
  • d 10

Types

  • a 43
  • el 9
  • x 4
  • m 2
  • s 1