Search (152 results, page 8 of 8)

Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.00
```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 1665) [ClassicSimilarity], result of:
      0.009207015 = score(doc=1665,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 1665, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
  0.25 = coord(1/4)
```
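The explain output above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical, and `query_norm` and `coord` are copied from the listing because they depend on the full query, which is not shown here.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float) -> float:
    """Score of a single matching term, following the explain tree.

    query_norm and coord are taken from the listing (assumption: they
    cannot be recomputed without the other query terms).
    """
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm        # queryWeight
    field_weight = tf * idf * field_norm   # fieldWeight
    return query_weight * field_weight * coord


# Values for doc 1665, copied from the explain block above.
score = classic_term_score(freq=2.0, doc_freq=18385, max_docs=44218,
                           query_norm=0.047278564, field_norm=0.0390625,
                           coord=0.25)
print(score)
```

The result agrees with the top line of the explain block up to single-precision rounding. Because only one of four query terms matches, `coord(1/4)` scales the score down, which is why every hit on this page displays as 0.00.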
Abstract

Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology: rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.00
```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 5997) [ClassicSimilarity], result of:
      0.009207015 = score(doc=5997,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 5997, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
  0.25 = coord(1/4)
```
Abstract

Given the huge amount of information on the Internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.

Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.00

```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 1048) [ClassicSimilarity], result of:
      0.009207015 = score(doc=1048,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 1048, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1048)
  0.25 = coord(1/4)
```

Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.00
```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 2532) [ClassicSimilarity], result of:
      0.009207015 = score(doc=2532,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 2532, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2532)
  0.25 = coord(1/4)
```
Abstract

In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting are made. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.

Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.00

```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 3172) [ClassicSimilarity], result of:
      0.009207015 = score(doc=3172,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 3172, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3172)
  0.25 = coord(1/4)
```

Source: Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2269-2286

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.00

```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 2300) [ClassicSimilarity], result of:
      0.009207015 = score(doc=2300,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 2300, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2300)
  0.25 = coord(1/4)
```

Mu, T.; Goulermas, J.Y.; Korkontzelos, I.; Ananiadou, S.: Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities (2016) 0.00

```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 2496) [ClassicSimilarity], result of:
      0.009207015 = score(doc=2496,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 2496, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2496)
  0.25 = coord(1/4)
```

Source: Journal of the Association for Information Science and Technology. 67(2016) no.1, S.106-133

Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.00

```
0.0023017537 = product of:
  0.009207015 = sum of:
    0.009207015 = weight(_text_:for in 744) [ClassicSimilarity], result of:
      0.009207015 = score(doc=744,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.103720546 = fieldWeight in 744, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.0390625 = fieldNorm(doc=744)
  0.25 = coord(1/4)
```

Source: Journal of the Association for Information Science and Technology. 73(2022) no.11, S.1513-1528

Borko, H.: Research in computer based classification systems (1985) 0.00
```
0.00227862 = product of:
  0.00911448 = sum of:
    0.00911448 = weight(_text_:for in 3647) [ClassicSimilarity], result of:
      0.00911448 = score(doc=3647,freq=4.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.102678105 = fieldWeight in 3647, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3647)
  0.25 = coord(1/4)
```
Abstract

The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was "Is the classification reliable?"; in other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts.
The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major area of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
```
0.0018414031 = product of:
  0.0073656123 = sum of:
    0.0073656123 = weight(_text_:for in 2596) [ClassicSimilarity], result of:
      0.0073656123 = score(doc=2596,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.08297644 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.25 = coord(1/4)
```
Content

Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access

Session One: Updates and a twelve month perspective
Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers

Session Two: Today's search engines and beyond
Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology

Session Three: Panel discussion: Human v automated categorization and editing
Ev Brenner (New York, NY), Chairman
James Callan (University of Massachusetts, MA)
Marc Krellenstein (Northern Light Technology, Cambridge, MA)
Dan Miller (Ask Jeeves, Berkeley, CA)

Session Four: Updates and a twelve month perspective
Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
Ellen Voorhees (NIST, Gaithersburg, MD): TREC update

Session Five: Search engines now and beyond
Intelligent agents
John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
Text summarization
Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
Cross language searching
Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
Video search and retrieval
Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
Speech recognition
Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
Visualization
James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
Text mining
David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.00
```
0.0018414031 = product of:
  0.0073656123 = sum of:
    0.0073656123 = weight(_text_:for in 4197) [ClassicSimilarity], result of:
      0.0073656123 = score(doc=4197,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.08297644 = fieldWeight in 4197, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.03125 = fieldNorm(doc=4197)
  0.25 = coord(1/4)
```
Abstract

The unmanageable and constantly growing supply of information on the Internet no longer allows people to grasp its content or to search for information in a targeted way. One route to improved information discovery is the categorization or classification of information based on its thematic content. This thematic classification can be carried out by manual (intellectual) methods as well as by automated procedures. Yet neither approach on its own has so far adequately met the expectations placed on it. This work therefore examines the obvious approach of combining the two methods in a meaningful way. The first part of this work, the investigation section, begins by explaining the problem of information overload in our society and shows that categorizing or classifying this information appears sensible, especially on the Internet. The principal options for assigning topics to documents in order to improve knowledge management and knowledge discovery are described. Among other things, various classification schemes, topic maps, and semantic networks are presented. The focus of the investigation section is the description of automated methods for topic assignment. In addition to an overview of the most common classification algorithms, commercially available systems as well as research approaches and freely available modules for automatic classification are presented. Systems that at least partially support the aforementioned combination of manual and automatic methods are also considered. The problems that arise in connection with classifying documents on the Internet are also identified.
The findings gained in the investigation section feed into the development of a module for user-assisted automatic document classification within the xFIND system (extended Framework for Information Discovery). This framework, designed at Graz University of Technology, forms the basis for a large number of new ideas for improving information retrieval. The solution developed in the design section first provides for using documents, servers, or server areas that are already present in the system and manually classified as the basis for automatic classification. Once automatic classification has been carried out, authors and administrators can then adjust the results in a next step as part of the user support. Collective user behaviour can also exert influence through the possibility of voting, that is, by approving or rejecting the classification results. The knowledge of subject experts and users thus ultimately contributes to improving automatic classification. The design section describes the basic concepts, structure, and functioning of the developed module and presents a series of suggestions and ideas for further developing user-assisted automatic document classification.
Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.00
```
0.0018414031 = product of:
  0.0073656123 = sum of:
    0.0073656123 = weight(_text_:for in 5051) [ClassicSimilarity], result of:
      0.0073656123 = score(doc=5051,freq=2.0), product of:
        0.08876751 = queryWeight, product of:
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.047278564 = queryNorm
        0.08297644 = fieldWeight in 5051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.8775425 = idf(docFreq=18385, maxDocs=44218)
          0.03125 = fieldNorm(doc=5051)
  0.25 = coord(1/4)
```
Abstract

Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. Generally speaking, it is one of the most important methods to organize and make use of the gigantic amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words where the words, in other words terms, are cut from their finer context, i.e. their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of words that can be inferred from the finer context of their location in a sentence and their relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents and even classes are obviously important, since methods that capture semantics generally reach better classification performance. Several surveys have been published to analyze diverse approaches for the traditional text classification methods. Most of these surveys cover application of different semantic term relatedness methods in text classification up to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over the traditional text classification. In order to fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores the past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning based approaches, word/character sequence enhanced approaches and linguistic enriched approaches.
Furthermore, this survey highlights the advantages of semantic text classification algorithms over the traditional text classification algorithms.
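
The bag-of-words limitation described in this abstract can be illustrated with a small sketch. It is not taken from the survey; the helper name `bag_of_words` and the two example sentences are hypothetical, and tokenization is reduced to lowercasing and whitespace splitting for brevity.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a text to term counts, discarding word order (hypothetical helper)."""
    return Counter(text.lower().split())


# Two sentences with different meanings but identical term counts.
a = bag_of_words("the library classifies the document")
b = bag_of_words("the document classifies the library")

# Word order is the only difference, and the representation drops it.
print(a == b)
```

Both sentences map to the same counts, so a classifier that sees only these vectors cannot tell them apart. This is the loss of finer context that semantic approaches aim to recover.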
