Search (48 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    0.054339416 = product of:
      0.08150912 = sum of:
        0.06962967 = product of:
          0.208889 = sum of:
            0.208889 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.208889 = score(doc=562,freq=2.0), product of:
                0.37167668 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.04384008 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.011879452 = product of:
          0.035638355 = sum of:
            0.035638355 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.035638355 = score(doc=562,freq=2.0), product of:
                0.1535205 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04384008 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Content
     Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
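     The score breakdown above is Lucene ClassicSimilarity (TF-IDF) explain output. As a minimal sketch, the reported 0.054339416 for doc 562 can be reproduced from the listed components, assuming the standard ClassicSimilarity formulas (queryWeight = idf * queryNorm, fieldWeight = sqrt(tf) * idf * fieldNorm, per-clause coord factors):

     import math

     def term_score(freq, idf, query_norm, field_norm):
         """Per-term score as shown in the ClassicSimilarity breakdown."""
         query_weight = idf * query_norm                   # e.g. 8.478011 * 0.04384008
         field_weight = math.sqrt(freq) * idf * field_norm
         return query_weight * field_weight

     # Component values copied from the breakdown for doc 562.
     w_3a = term_score(2.0, 8.478011, 0.04384008, 0.046875)   # ~0.208889
     w_22 = term_score(2.0, 3.5018296, 0.04384008, 0.046875)  # ~0.035638

     # Each clause carries coord(1/3); the two matching clauses together get coord(2/3).
     score = (w_3a / 3 + w_22 / 3) * (2 / 3)
     print(round(score, 9))  # ~0.054339416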
  2. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.04
    0.039474797 = product of:
      0.059212193 = sum of:
        0.04522703 = weight(_text_:development in 1595) [ClassicSimilarity], result of:
          0.04522703 = score(doc=1595,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.28246516 = fieldWeight in 1595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
        0.013985164 = product of:
          0.041955493 = sum of:
            0.041955493 = weight(_text_:29 in 1595) [ClassicSimilarity], result of:
              0.041955493 = score(doc=1595,freq=2.0), product of:
                0.1542157 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04384008 = queryNorm
                0.27205724 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
     This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
    Date
    11. 5.2003 18:29:44
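     The divide-and-conquer idea in the abstract above (one classifier per node of the indexing hierarchy, with documents routed downward) can be sketched roughly as follows. The two-level hierarchy, the toy documents, and scikit-learn's MLPClassifier standing in for the backpropagation networks are all illustrative assumptions, not the authors' implementation:

     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.neural_network import MLPClassifier

     # Toy MEDLINE-like records with an invented two-level hierarchy.
     docs = [
         "beta blockers lower blood pressure",            # cardiology / hypertension
         "stent placement after myocardial infarction",   # cardiology / ischemia
         "antibiotics for bacterial pneumonia",           # infection  / pneumonia
         "antiviral treatment of seasonal influenza",     # infection  / influenza
     ]
     branch = ["cardiology", "cardiology", "infection", "infection"]
     leaf = ["hypertension", "ischemia", "pneumonia", "influenza"]

     vec = TfidfVectorizer()
     X = vec.fit_transform(docs)

     # Divide: a router decides the top-level branch; conquer: one expert per branch.
     router = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000, random_state=0).fit(X, branch)
     experts = {}
     for b in set(branch):
         idx = [i for i, lbl in enumerate(branch) if lbl == b]
         experts[b] = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000,
                                    random_state=0).fit(X[idx], [leaf[i] for i in idx])

     def classify(text):
         x = vec.transform([text])
         b = router.predict(x)[0]             # route down the hierarchy first ...
         return b, experts[b].predict(x)[0]   # ... then classify within the branch

     print(classify("myocardial infarction treated with a stent"))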
  3. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.03
    0.03376365 = product of:
      0.050645474 = sum of:
        0.038766023 = weight(_text_:development in 2158) [ClassicSimilarity], result of:
          0.038766023 = score(doc=2158,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.242113 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
        0.011879452 = product of:
          0.035638355 = sum of:
            0.035638355 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.035638355 = score(doc=2158,freq=2.0), product of:
                0.1535205 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04384008 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
  4. Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.03
    0.025844015 = product of:
      0.077532046 = sum of:
        0.077532046 = weight(_text_:development in 4084) [ClassicSimilarity], result of:
          0.077532046 = score(doc=4084,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.484226 = fieldWeight in 4084, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.09375 = fieldNorm(doc=4084)
      0.33333334 = coord(1/3)
    
    Source
     SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
  5. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.02
    0.022557026 = product of:
      0.033835538 = sum of:
        0.025844015 = weight(_text_:development in 2301) [ClassicSimilarity], result of:
          0.025844015 = score(doc=2301,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.16140866 = fieldWeight in 2301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
        0.0079915235 = product of:
          0.02397457 = sum of:
            0.02397457 = weight(_text_:29 in 2301) [ClassicSimilarity], result of:
              0.02397457 = score(doc=2301,freq=2.0), product of:
                0.1542157 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04384008 = queryNorm
                0.15546128 = fieldWeight in 2301, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2301)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
     Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control to help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations are stored in an intermediate format (in this case, XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats, or into simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make the features of the software available as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without any significant programming effort.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
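     The interpreter's XML schema is not reproduced here, so the sketch below only illustrates the general idea behind such a tool: split a pre-combined UDC notation at its connecting symbols (+, /, :) and emit each component as a structured element. The element names and the example number are invented for illustration:

     import re
     import xml.etree.ElementTree as ET

     # Common UDC connecting symbols and their (simplified) meanings.
     CONNECTORS = {"+": "addition", "/": "extension", ":": "relation"}

     def parse_udc(notation):
         """Very rough tokenizer for a flat, pre-combined UDC number."""
         root = ET.Element("udcNumber", value=notation)
         for part in re.split(r"([+/:])", notation):
             if part in CONNECTORS:
                 ET.SubElement(root, "connector", symbol=part, meaning=CONNECTORS[part])
             elif part:
                 ET.SubElement(root, "component", notation=part)
         return root

     # Illustrative complex number: one class related to another by ':'.
     print(ET.tostring(parse_udc("025.4:004.8"), encoding="unicode"))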
  6. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.02
    0.021536682 = product of:
      0.06461004 = sum of:
        0.06461004 = weight(_text_:development in 4132) [ClassicSimilarity], result of:
          0.06461004 = score(doc=4132,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.40352166 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.33333334 = coord(1/3)
    
    Source
     SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
  7. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.02
    0.021320224 = product of:
      0.06396067 = sum of:
        0.06396067 = weight(_text_:development in 7209) [ClassicSimilarity], result of:
          0.06396067 = score(doc=7209,freq=4.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.39946604 = fieldWeight in 7209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
      0.33333334 = coord(1/3)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
  8. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.02
    0.018274479 = product of:
      0.054823436 = sum of:
        0.054823436 = weight(_text_:development in 942) [ClassicSimilarity], result of:
          0.054823436 = score(doc=942,freq=4.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.34239948 = fieldWeight in 942, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.33333334 = coord(1/3)
    
    Content
    1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references
  9. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.02
    0.017229345 = product of:
      0.05168803 = sum of:
        0.05168803 = weight(_text_:development in 2564) [ClassicSimilarity], result of:
          0.05168803 = score(doc=2564,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.32281733 = fieldWeight in 2564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.33333334 = coord(1/3)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
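     As an illustration of the approach the abstract describes, the sketch below vectorizes a few toy documents with TF-IDF and trains a small, hand-rolled Kohonen map; the grid size, learning schedule, and documents are assumptions, not the authors' LISA setup:

     import numpy as np
     from sklearn.feature_extraction.text import TfidfVectorizer

     docs = [
         "information retrieval evaluation of search systems",
         "query expansion for document retrieval",
         "library cataloguing and subject classification practice",
         "controlled vocabularies for subject indexing",
     ]
     X = TfidfVectorizer().fit_transform(docs).toarray()

     # Minimal Kohonen SOM: a 3x3 grid of prototype vectors; each document pulls
     # its best-matching unit (and, with Gaussian falloff, its neighbours) closer.
     rng = np.random.default_rng(0)
     grid_w = grid_h = 3
     weights = rng.random((grid_w * grid_h, X.shape[1]))
     coords = np.array([(i % grid_w, i // grid_w) for i in range(grid_w * grid_h)], dtype=float)

     for epoch in range(200):
         lr = 0.5 * (1 - epoch / 200)              # decaying learning rate
         sigma = 1.5 * (1 - epoch / 200) + 0.3     # shrinking neighbourhood radius
         for x in X:
             bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
             d = np.linalg.norm(coords - coords[bmu], axis=1)
             h = np.exp(-(d ** 2) / (2 * sigma ** 2))
             weights += lr * h[:, None] * (x - weights)

     # Topological placement: related documents should land on nearby grid cells.
     for i, x in enumerate(X):
         bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
         print(f"doc {i} -> cell {tuple(coords[bmu].astype(int))}")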
  10. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.02
    0.015075676 = product of:
      0.04522703 = sum of:
        0.04522703 = weight(_text_:development in 1568) [ClassicSimilarity], result of:
          0.04522703 = score(doc=1568,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.28246516 = fieldWeight in 1568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1568)
      0.33333334 = coord(1/3)
    
    Abstract
    Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.
  11. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
    0.012922008 = product of:
      0.038766023 = sum of:
        0.038766023 = weight(_text_:development in 995) [ClassicSimilarity], result of:
          0.038766023 = score(doc=995,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.242113 = fieldWeight in 995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.046875 = fieldNorm(doc=995)
      0.33333334 = coord(1/3)
    
    Abstract
     Recently, a new community has started to emerge around the development of new information retrieval methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model able to handle both structure and content which is based on Bayesian networks. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended to deal with different types of content information (here, text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad hoc retrieval for XML documents, and a multimedia corpus of Web pages.
  12. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.01
    0.012922008 = product of:
      0.038766023 = sum of:
        0.038766023 = weight(_text_:development in 2166) [ClassicSimilarity], result of:
          0.038766023 = score(doc=2166,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.242113 = fieldWeight in 2166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
      0.33333334 = coord(1/3)
    
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography are classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri that focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
  13. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    0.012182986 = product of:
      0.036548957 = sum of:
        0.036548957 = weight(_text_:development in 1669) [ClassicSimilarity], result of:
          0.036548957 = score(doc=1669,freq=4.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.22826631 = fieldWeight in 1669, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
      0.33333334 = coord(1/3)
    
    Abstract
     After a short outline of the problems, possibilities and difficulties of systematic information retrieval on the Internet, and a description of development efforts in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval, and several important services on the net have taken this fact into consideration, the emphasis of this report lies on the general retrieval tools for the whole of the Internet. In order to evaluate the differences, possibilities and restrictions of the different services, it is necessary to begin by organizing the existing varieties in a typological/taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services respectively. It will, however, for various reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common to all attempts at indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services are mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
  14. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.01
    0.010768341 = product of:
      0.03230502 = sum of:
        0.03230502 = weight(_text_:development in 2532) [ClassicSimilarity], result of:
          0.03230502 = score(doc=2532,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.20176083 = fieldWeight in 2532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
      0.33333334 = coord(1/3)
    
    Abstract
     In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
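     A minimal sketch of the KNN route the authors recommend, using scikit-learn's KNeighborsClassifier over TF-IDF vectors; the toy catalogue records and the LCC-style labels are invented for illustration:

     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.neighbors import KNeighborsClassifier
     from sklearn.pipeline import make_pipeline

     records = [
         "introduction to organic chemistry reactions",
         "laboratory handbook of inorganic chemistry",
         "history of the roman empire and its provinces",
         "medieval european history and society",
     ]
     labels = ["QD", "QD", "DG", "D"]   # invented LCC-style class labels

     clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
     clf.fit(records, labels)

     # A new item gets the class most common among its k nearest neighbours.
     print(clf.predict(["a survey of chemistry laboratory methods"]))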
  15. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.01
    0.010655365 = product of:
      0.031966094 = sum of:
        0.031966094 = product of:
          0.09589828 = sum of:
            0.09589828 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
              0.09589828 = score(doc=5169,freq=2.0), product of:
                0.1542157 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04384008 = queryNorm
                0.6218451 = fieldWeight in 5169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.125 = fieldNorm(doc=5169)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Nachrichten für Dokumentation. 29(1978), S.92-96
  16. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.007919635 = product of:
      0.023758903 = sum of:
        0.023758903 = product of:
          0.07127671 = sum of:
            0.07127671 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.07127671 = score(doc=1046,freq=2.0), product of:
                0.1535205 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04384008 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    5. 5.2003 14:17:22
  17. Borko, H.: Research in computer based classification systems (1985) 0.01
    0.007537838 = product of:
      0.022613514 = sum of:
        0.022613514 = weight(_text_:development in 3647) [ClassicSimilarity], result of:
          0.022613514 = score(doc=3647,freq=2.0), product of:
            0.16011542 = queryWeight, product of:
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.04384008 = queryNorm
            0.14123258 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.652261 = idf(docFreq=3116, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
      0.33333334 = coord(1/3)
    
    Abstract
     The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
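     The two steps described above (term similarities from co-occurrence, then deriving categories) can be sketched with a toy term-document matrix; scikit-learn's FactorAnalysis stands in for Borko's factor-analytic step, and the documents are invented:

     import numpy as np
     from sklearn.feature_extraction.text import CountVectorizer
     from sklearn.decomposition import FactorAnalysis

     docs = [
         "neural network learning for classification",
         "backpropagation training of neural networks",
         "library cataloguing rules and subject headings",
         "subject indexing and cataloguing practice",
     ]

     vec = CountVectorizer()
     X = vec.fit_transform(docs).toarray().astype(float)   # document-term counts
     terms = vec.get_feature_names_out()

     # Step 1: term-term correlations from co-occurrence across documents.
     corr = np.corrcoef(X.T)
     i, j = list(terms).index("cataloguing"), list(terms).index("subject")
     print("corr(cataloguing, subject) =", round(float(corr[i, j]), 2))

     # Step 2: factor analysis of the term space; each factor is read as a candidate
     # classification category, described by its highest-loading terms.
     fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
     for k, loadings in enumerate(fa.components_):
         top = terms[np.argsort(loadings)[::-1][:3]]
         print(f"factor {k}:", ", ".join(top))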
  18. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    0.006599696 = product of:
      0.019799087 = sum of:
        0.019799087 = product of:
          0.059397258 = sum of:
            0.059397258 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.059397258 = score(doc=611,freq=2.0), product of:
                0.1535205 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04384008 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    22. 8.2009 12:54:24
  19. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.006599696 = product of:
      0.019799087 = sum of:
        0.019799087 = product of:
          0.059397258 = sum of:
            0.059397258 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.059397258 = score(doc=2748,freq=2.0), product of:
                0.1535205 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04384008 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    1. 2.2016 18:25:22
  20. Savic, D.: Designing an expert system for classifying office documents (1994) 0.01
    0.0053276825 = product of:
      0.015983047 = sum of:
        0.015983047 = product of:
          0.04794914 = sum of:
            0.04794914 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
              0.04794914 = score(doc=2655,freq=2.0), product of:
                0.1542157 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04384008 = queryNorm
                0.31092256 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2655)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Records management quarterly. 28(1994) no.3, S.20-29

Languages

  • e 41
  • d 7

Types

  • a 42
  • el 6
  • r 1
  • x 1