Search (45 results, page 2 of 3)

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01

0.010605331 = product of:
  0.021210661 = sum of:
    0.021210661 = product of:
      0.042421322 = sum of:
        0.042421322 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.042421322 = score(doc=2760,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2009 19:11:54

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.01

0.010605331 = product of:
  0.021210661 = sum of:
    0.021210661 = product of:
      0.042421322 = sum of:
        0.042421322 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.042421322 = score(doc=3051,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 8.2009 19:51:28

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01

0.010605331 = product of:
  0.021210661 = sum of:
    0.021210661 = product of:
      0.042421322 = sum of:
        0.042421322 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.042421322 = score(doc=690,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 23. 3.2013 13:22:36

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

0.010605331 = product of:
  0.021210661 = sum of:
    0.021210661 = product of:
      0.042421322 = sum of:
        0.042421322 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.042421322 = score(doc=2158,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 4. 8.2015 19:22:04

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
```
0.009625921 = product of:
  0.019251842 = sum of:
    0.019251842 = product of:
      0.038503684 = sum of:
        0.038503684 = weight(_text_:systems in 3300) [ClassicSimilarity], result of:
          0.038503684 = score(doc=3300,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.24009174 = fieldWeight in 3300, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including, Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated showing they are complementary to one another.
Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.01
```
0.009529176 = product of:
  0.019058352 = sum of:
    0.019058352 = product of:
      0.038116705 = sum of:
        0.038116705 = weight(_text_:systems in 2219) [ClassicSimilarity], result of:
          0.038116705 = score(doc=2219,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.23767869 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing of this increasing amount of information coming and out going mail, in particular its classification, is time consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus id also placed on the application of artificial intelligence
Sebastiani, F.: Classification of text, automatic (2006) 0.01
```
0.009529176 = product of:
  0.019058352 = sum of:
    0.019058352 = product of:
      0.038116705 = sum of:
        0.038116705 = weight(_text_:systems in 5003) [ClassicSimilarity], result of:
          0.038116705 = score(doc=5003,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.23767869 = fieldWeight in 5003, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5003)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01

0.008837775 = product of:
  0.01767555 = sum of:
    0.01767555 = product of:
      0.0353511 = sum of:
        0.0353511 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.0353511 = score(doc=2765,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2009 19:14:43

Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01

0.008837775 = product of:
  0.01767555 = sum of:
    0.01767555 = product of:
      0.0353511 = sum of:
        0.0353511 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.0353511 = score(doc=1107,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 28.10.2013 19:22:57

Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.01

0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 3383) [ClassicSimilarity], result of:
          0.03267146 = score(doc=3383,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 3383, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=3383)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Content: Beitrag bei: The Third International Conference on Web Information Systems Engineering (WISE'00) Dec., 12-14, 2002, Singapore, S.182.

Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.01

0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 5897) [ClassicSimilarity], result of:
          0.03267146 = score(doc=5897,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 5897, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=5897)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Content: Beitrag eines Themenheftes "Knowledge organization systems and services"

Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 2452) [ClassicSimilarity], result of:
          0.03267146 = score(doc=2452,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 2452, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=2452)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, generally known as supervised learning. However, the supervised learning approaches have some problems. The most notable problem is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are easily collected and plentiful, labeled documents are difficultly generated because a labeling task must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method launches text classification tasks with only unlabeled documents and the title word of each category for learning, and then it automatically learns text classifier by using bootstrapping and feature projection techniques. The results of experiments showed that the proposed method achieved reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
Puzicha, J.: Informationen finden! : Intelligente Suchmaschinentechnologie & automatische Kategorisierung (2007) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 2817) [ClassicSimilarity], result of:
          0.03267146 = score(doc=2817,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 2817, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=2817)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Wie in diesem Text erläutert wurde, ist die Effektivität von Such- und Klassifizierungssystemen durch folgendes bestimmt: 1) den Arbeitsauftrag, 2) die Genauigkeit des Systems, 3) den zu erreichenden Automatisierungsgrad, 4) die Einfachheit der Integration in bereits vorhandene Systeme. Diese Kriterien gehen davon aus, dass jedes System, unabhängig von der Technologie, in der Lage ist, Grundvoraussetzungen des Produkts in Bezug auf Funktionalität, Skalierbarkeit und Input-Methode zu erfüllen. Diese Produkteigenschaften sind in der Recommind Produktliteratur genauer erläutert. Von diesen Fähigkeiten ausgehend sollte die vorhergehende Diskussion jedoch einige klare Trends aufgezeigt haben. Es ist nicht überraschend, dass jüngere Entwicklungen im Maschine Learning und anderen Bereichen der Informatik einen theoretischen Ausgangspunkt für die Entwicklung von Suchmaschinen- und Klassifizierungstechnologie haben. Besonders jüngste Fortschritte bei den statistischen Methoden (PLSA) und anderen mathematischen Werkzeugen (SVMs) haben eine Ergebnisqualität auf Durchbruchsniveau erreicht. Dazu kommt noch die Flexibilität in der Anwendung durch Selbsttraining und Kategorienerkennen von PLSA-Systemen, wie auch eine neue Generation von vorher unerreichten Produktivitätsverbesserungen.

Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.01

0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 4797) [ClassicSimilarity], result of:
          0.03267146 = score(doc=4797,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 4797, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=4797)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Journal of intelligent information systems. 29(2007) no.2, S.211-230

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.01
```
0.007700737 = product of:
  0.015401474 = sum of:
    0.015401474 = product of:
      0.030802948 = sum of:
        0.030802948 = weight(_text_:systems in 2596) [ClassicSimilarity], result of:
          0.030802948 = score(doc=2596,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.19207339 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access Session One: Updates and a twelve month perspective Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers Session Two: Today's search engines and beyond Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER) Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology Session Three: Panel discussion: Human v automated categorization and editing Ev Brenner (New York, NY)- Chairman James Callan (University of Massachusetts, MA) Marc Krellenstein (Northern Light Technology, Cambridge, MA) Dan Miller (Ask Jeeves, Berkeley, CA) Session Four: Updates and a twelve month perspective Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software Ellen Voorhees (NIST, Gaithersburg, MD) TREC update Session Five: Search engines now and beyond Intelligent Agents John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents Text summarization Therese Firmin, (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems Cross language searching Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval. Video search and retrieval Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio Speech recognition Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents Visualization James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion? Text mining David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support
Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.01
```
0.007700737 = product of:
  0.015401474 = sum of:
    0.015401474 = product of:
      0.030802948 = sum of:
        0.030802948 = weight(_text_:systems in 2301) [ClassicSimilarity], result of:
          0.030802948 = score(doc=2301,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.19207339 = fieldWeight in 2301, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC) express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations would be stored into an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.01

0.0070702205 = product of:
  0.014140441 = sum of:
    0.014140441 = product of:
      0.028280882 = sum of:
        0.028280882 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.028280882 = score(doc=2741,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 12. 9.2004 9:56:22

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01

0.0070702205 = product of:
  0.014140441 = sum of:
    0.014140441 = product of:
      0.028280882 = sum of:
        0.028280882 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.028280882 = score(doc=3284,freq=2.0), product of:
            0.1827397 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052184064 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 1.2010 14:41:24

Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.01
```
0.0068065543 = product of:
  0.013613109 = sum of:
    0.013613109 = product of:
      0.027226217 = sum of:
        0.027226217 = weight(_text_:systems in 3172) [ClassicSimilarity], result of:
          0.027226217 = score(doc=3172,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.1697705 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.01
```
0.0068065543 = product of:
  0.013613109 = sum of:
    0.013613109 = product of:
      0.027226217 = sum of:
        0.027226217 = weight(_text_:systems in 3614) [ClassicSimilarity], result of:
          0.027226217 = score(doc=3614,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.1697705 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing showed to be correlated and dependent on classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in which caption the search term is found. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.

Search (45 results, page 2 of 3)

Authors

Years

Languages

Types

Themes