Search (167 results, page 9 of 9)

Golub, K.: Automated subject classification of textual web documents (2006) 0.00
```
9.791424E-4 = product of:
  0.0019582848 = sum of:
    0.0019582848 = product of:
      0.0039165695 = sum of:
        0.0039165695 = weight(_text_:a in 5600) [ClassicSimilarity], result of:
          0.0039165695 = score(doc=5600,freq=4.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.090081796 = fieldWeight in 5600, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
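The explain tree above is a Lucene ClassicSimilarity (TF-IDF) breakdown. The arithmetic can be retraced with a minimal Python sketch. The function names are illustrative and are not Lucene APIs. The formulas assumed here are tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values shown. The values for queryNorm, fieldNorm and the two coord factors are copied from the output rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed form, matches the values listed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Values copied from the explanation for doc 5600.
score = term_score(freq=4.0, doc_freq=37942, max_docs=44218,
                   query_norm=0.037706986, field_norm=0.0390625)

# Two nested coord(1/2) factors scale the term score down to the final value.
final_score = score * 0.5 * 0.5
print(f"term score: {score:.7f}, final score: {final_score:.7e}")
```

Lucene computes these values in 32-bit floats, so the last digits may differ slightly from the double-precision result of this sketch. The later entries differ only in `freq` and `fieldNorm`, which is why their scores are lower even though the matched term and its idf are the same.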
Abstract

Purpose - To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.

Type: a

HaCohen-Kerner, Y.; Beck, H.; Yehudai, E.; Rosenstein, M.; Mughaz, D.: Cuisine : classification using stylistic feature sets and/or name-based feature sets (2010) 0.00
```
9.791424E-4 = product of:
  0.0019582848 = sum of:
    0.0019582848 = product of:
      0.0039165695 = sum of:
        0.0039165695 = weight(_text_:a in 3706) [ClassicSimilarity], result of:
          0.0039165695 = score(doc=3706,freq=4.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.090081796 = fieldWeight in 3706, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3706)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Document classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigated the use of six stylistic feature sets (including 42 features) and/or six name-based feature sets (including 234 features) for various combinations of the following classification tasks: ethnic groups of the authors and/or periods of time when the documents were written and/or places where the documents were written. The investigated corpus contains Jewish Law articles written in Hebrew-Aramaic, which present interesting problems for classification. Our system CUISINE (Classification UsIng Stylistic feature sets and/or NamE-based feature sets) achieves accuracy results between 90.71 to 98.99% for the seven classification experiments (ethnicity, time, place, ethnicity&time, ethnicity&place, time&place, ethnicity&time&place). For the first six tasks, the stylistic feature sets in general and the quantitative feature set in particular are enough for excellent classification results. In contrast, the name-based feature sets are rather poor for these tasks. However, for the most complex task (ethnicity&time&place), a hill-climbing model using all feature sets succeeds in significantly improving the classification results. Most of the stylistic features (34 of 42) are language-independent and domain-independent. These features might be useful to the community at large, at least for rather simple tasks.

Type: a

Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.00
```
9.693015E-4 = product of:
  0.001938603 = sum of:
    0.001938603 = product of:
      0.003877206 = sum of:
        0.003877206 = weight(_text_:a in 1568) [ClassicSimilarity], result of:
          0.003877206 = score(doc=1568,freq=2.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.089176424 = fieldWeight in 1568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1568)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.

Cui, H.; Heidorn, P.B.; Zhang, H.: An approach to automatic classification of text for information retrieval (2002) 0.00

```
9.693015E-4 = product of:
  0.001938603 = sum of:
    0.001938603 = product of:
      0.003877206 = sum of:
        0.003877206 = weight(_text_:a in 174) [ClassicSimilarity], result of:
          0.003877206 = score(doc=174,freq=2.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.089176424 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type: a

Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.00

```
8.308299E-4 = product of:
  0.0016616598 = sum of:
    0.0016616598 = product of:
      0.0033233196 = sum of:
        0.0033233196 = weight(_text_:a in 942) [ClassicSimilarity], result of:
          0.0033233196 = score(doc=942,freq=2.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.07643694 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Content: 1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references

Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.00

```
8.308299E-4 = product of:
  0.0016616598 = sum of:
    0.0016616598 = product of:
      0.0033233196 = sum of:
        0.0033233196 = weight(_text_:a in 2166) [ClassicSimilarity], result of:
          0.0033233196 = score(doc=2166,freq=2.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.07643694 = fieldWeight in 2166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type: a

Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.00
```
8.308299E-4 = product of:
  0.0016616598 = sum of:
    0.0016616598 = product of:
      0.0033233196 = sum of:
        0.0033233196 = weight(_text_:a in 1057) [ClassicSimilarity], result of:
          0.0033233196 = score(doc=1057,freq=2.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.07643694 = fieldWeight in 1057, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1057)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language dependent and language independent) of Unsupervised Document Similarity implemented by MU (using gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of tools, the rationale behind decisions made, and provide elementary evaluation. Tools are integrated in the main project result, EuDML website, and they deliver the needed functionality for exploratory searching and browsing the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state of the art machine learning and matching methods.
