Search (159 results, page 1 of 8)

Automatic classification research at OCLC (2002) 0.09

0.087416716 = product of:
  0.17483343 = sum of:
    0.077454165 = weight(_text_:standards in 1563) [ClassicSimilarity], result of:
      0.077454165 = score(doc=1563,freq=2.0), product of:
        0.22470023 = queryWeight, product of:
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.050415643 = queryNorm
        0.34469998 = fieldWeight in 1563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1563)
    0.09737927 = sum of:
      0.049564905 = weight(_text_:organization in 1563) [ClassicSimilarity], result of:
        0.049564905 = score(doc=1563,freq=2.0), product of:
          0.17974974 = queryWeight, product of:
            3.5653565 = idf(docFreq=3399, maxDocs=44218)
            0.050415643 = queryNorm
          0.27574396 = fieldWeight in 1563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5653565 = idf(docFreq=3399, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1563)
      0.047814365 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
        0.047814365 = score(doc=1563,freq=2.0), product of:
          0.17654699 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050415643 = queryNorm
          0.2708308 = fieldWeight in 1563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1563)
  0.5 = coord(2/4)

Abstract: OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities whose outcomes have been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change.
Date: 5. 5.2003 9:22:09
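
The score breakdown above is Lucene ClassicSimilarity (TF-IDF) explain output. Each term weight is queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm with tf = sqrt(freq); the idf values printed are consistent with idf = 1 + ln(maxDocs / (docFreq + 1)), and coord(m/n) scales a Boolean clause by the fraction of its subclauses that matched. The Python sketch below recomputes the total for doc 1563 from the printed inputs. queryNorm and fieldNorm are copied from the output rather than derived, since they depend on the full query and on index-time length encoding that the listing does not show.

```python
import math

# Inputs copied from the explain output for doc 1563.
MAX_DOCS = 44218
QUERY_NORM = 0.050415643  # taken as given; it depends on the whole query


def idf(doc_freq, max_docs=MAX_DOCS):
    # Matches the printed idf values: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    """weight(term) = queryWeight * fieldWeight, as in the explain tree."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def coord(matched, total):
    # Fraction of a BooleanQuery's subclauses that matched the document.
    return matched / total


field_norm = 0.0546875  # fieldNorm(doc=1563), copied from the output
standards = term_score(2.0, 1393, field_norm)
organization = term_score(2.0, 3399, field_norm)
twenty_two = term_score(2.0, 3622, field_norm)

# Inner group: both subclauses matched, so coord(2/2) = 1 and the explain output omits it.
inner = organization + twenty_two
total = coord(2, 4) * (standards + inner)

print(f"standards={standards:.6f} inner={inner:.6f} total={total:.6f}")
```

Running it should reproduce the 0.0874 shown for doc 1563 (displayed rounded as 0.09). The same helpers cover the other entries; for example, doc 977 sums three matched clauses under coord(3/4), one of which carries a nested coord(1/2) around the organization clause.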

Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.06

0.061206747 = product of:
  0.081608996 = sum of:
    0.008582841 = weight(_text_:information in 977) [ClassicSimilarity], result of:
      0.008582841 = score(doc=977,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.09697737 = fieldWeight in 977, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=977)
    0.0553244 = weight(_text_:standards in 977) [ClassicSimilarity], result of:
      0.0553244 = score(doc=977,freq=2.0), product of:
        0.22470023 = queryWeight, product of:
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.050415643 = queryNorm
        0.24621427 = fieldWeight in 977, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.0390625 = fieldNorm(doc=977)
    0.017701752 = product of:
      0.035403505 = sum of:
        0.035403505 = weight(_text_:organization in 977) [ClassicSimilarity], result of:
          0.035403505 = score(doc=977,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.19695997 = fieldWeight in 977, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: The research study reported here explores the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses a Python virtual environment to install and configure an open-source AI environment (Annif), which is fed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. Annif was trained with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) assigned by trained LIS professionals. The training dataset is first processed with MarcEdit to export it in a format suitable for OpenRefine, and then undergoes many steps in OpenRefine to produce a bibliographic record set suitable for training Annif. After training, the framework was tested with a bibliographic dataset to measure indexing efficiency, and finally the automated indexing framework was integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
Source: DESIDOC journal of library and information technology. 43(2023) no.1, S.45-54
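
The abstract lists TF-IDF among the Annif backends compared. The sketch below is not Annif's code; it is a minimal illustration, under stated assumptions, of TF-IDF subject suggestion in that spirit: training texts are merged per subject heading, each subject becomes a weighted term vector, and a new text is ranked against the subjects by cosine similarity. The tokenizer, the idf variant, and the headings and records are all hypothetical toy choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; a real pipeline would use an analyser such as Snowball."""
    return re.findall(r"\w+", text.lower())


def build_index(training):
    """Merge training texts per subject and weight each subject's terms by tf * idf."""
    per_subject = {}
    for text, subject in training:
        per_subject.setdefault(subject, Counter()).update(tokenize(text))
    n = len(per_subject)
    # Document frequency counted over subjects (each merged subject text is one "document").
    df = Counter(term for counts in per_subject.values() for term in counts)
    idf = {term: 1.0 + math.log(n / d) for term, d in df.items()}
    vectors = {s: {t: tf * idf[t] for t, tf in counts.items()} for s, counts in per_subject.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def suggest(text, idf, vectors, limit=3):
    """Rank subjects by cosine similarity to the TF-IDF vector of `text`."""
    counts = Counter(t for t in tokenize(text) if t in idf)  # terms unseen in training are dropped
    query = {t: tf * idf[t] for t, tf in counts.items()}
    ranked = sorted(((s, cosine(query, v)) for s, v in vectors.items()), key=lambda p: p[1], reverse=True)
    return ranked[:limit]


# Hypothetical toy records standing in for bibliographic records with 650$a headings.
training = [
    ("neural networks learn from training data", "Machine learning"),
    ("supervised learning algorithms and neural models", "Machine learning"),
    ("cataloging rules for bibliographic records", "Cataloging"),
    ("MARC records and descriptive cataloging practice", "Cataloging"),
]
idf, vectors = build_index(training)
print(suggest("neural networks for learning", idf, vectors))
```

Merging the records per heading keeps the index at one vector per subject, so suggestion cost grows with the vocabulary size rather than with the number of training records.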

Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.05

0.051118307 = product of:
  0.10223661 = sum of:
    0.077454165 = weight(_text_:standards in 1568) [ClassicSimilarity], result of:
      0.077454165 = score(doc=1568,freq=2.0), product of:
        0.22470023 = queryWeight, product of:
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.050415643 = queryNorm
        0.34469998 = fieldWeight in 1568, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1568)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 1568) [ClassicSimilarity], result of:
          0.049564905 = score(doc=1568,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 1568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1568)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05

0.050282635 = product of:
  0.10056527 = sum of:
    0.0800734 = product of:
      0.2402202 = sum of:
        0.2402202 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.2402202 = score(doc=562,freq=2.0), product of:
            0.42742437 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.050415643 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.02049187 = product of:
      0.04098374 = sum of:
        0.04098374 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04098374 = score(doc=562,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Content: See: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.04
```
0.03963486 = product of:
  0.07926972 = sum of:
    0.011892734 = weight(_text_:information in 2741) [ClassicSimilarity], result of:
      0.011892734 = score(doc=2741,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.1343758 = fieldWeight in 2741, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2741)
    0.067376986 = sum of:
      0.040054493 = weight(_text_:organization in 2741) [ClassicSimilarity], result of:
        0.040054493 = score(doc=2741,freq=4.0), product of:
          0.17974974 = queryWeight, product of:
            3.5653565 = idf(docFreq=3399, maxDocs=44218)
            0.050415643 = queryNorm
          0.22283478 = fieldWeight in 2741, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5653565 = idf(docFreq=3399, maxDocs=44218)
            0.03125 = fieldNorm(doc=2741)
      0.027322493 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
        0.027322493 = score(doc=2741,freq=2.0), product of:
          0.17654699 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050415643 = queryNorm
          0.15476047 = fieldWeight in 2741, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=2741)
  0.5 = coord(2/4)
```
Abstract

This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.

1. Introduction

The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.

Date

12. 9.2004 9:56:22

Series

Advances in knowledge organization; vol.8

Source

Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
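
The abstract reports an average inter-subject similarity of 0.27 but does not say how similarity between two sortings was computed. As one common way to compare two partitions of the same pages, the sketch below uses pair-counting Jaccard: the share of co-clustered page pairs that two subjects agree on, averaged over all pairs of subjects. The measure choice and the page sortings are assumptions for illustration only, not the study's method or data.

```python
from itertools import combinations


def co_clustered_pairs(sorting):
    """Unordered page pairs that one subject placed in the same category."""
    return {frozenset(pair) for cluster in sorting for pair in combinations(cluster, 2)}


def pair_jaccard(sorting_a, sorting_b):
    """Share of co-clustered pairs two subjects agree on (1.0 if neither has any)."""
    a, b = co_clustered_pairs(sorting_a), co_clustered_pairs(sorting_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_inter_subject_similarity(sortings):
    """Mean pairwise similarity over all pairs of subjects' sortings."""
    scores = [pair_jaccard(x, y) for x, y in combinations(sortings, 2)]
    return sum(scores) / len(scores)


# Hypothetical sortings of six pages by three subjects.
subject_1 = [["p1", "p2", "p3"], ["p4", "p5"], ["p6"]]
subject_2 = [["p1", "p2"], ["p3", "p4"], ["p5", "p6"]]
subject_3 = [["p1", "p2", "p3", "p4"], ["p5", "p6"]]
print(pair_jaccard(subject_1, subject_2))
print(average_inter_subject_similarity([subject_1, subject_2, subject_3]))
```

Pair counting ignores category labels, which suits a setting where each subject invents their own category names.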

Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.03

0.03191998 = product of:
  0.06383996 = sum of:
    0.023785468 = weight(_text_:information in 2564) [ClassicSimilarity], result of:
      0.023785468 = score(doc=2564,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.2687516 = fieldWeight in 2564, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=2564)
    0.040054493 = product of:
      0.080108985 = sum of:
        0.080108985 = weight(_text_:organization in 2564) [ClassicSimilarity], result of:
          0.080108985 = score(doc=2564,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.44566956 = fieldWeight in 2564, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described for vectorizing reference documents from LISA that permits their topological organization using Kohonen's algorithm. As an example, a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
Source: Information processing and management. 38(2002) no.1, S.79-89
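
Kohonen's algorithm is the self-organizing map (SOM): each node on a grid holds a weight vector, each input pulls its best-matching node and that node's grid neighbours toward it, and both the learning rate and the neighbourhood shrink over time, so similar documents end up on nearby nodes. The pure-Python sketch below assumes linear decay schedules, a Gaussian neighbourhood, a 3 x 3 grid and hypothetical term-weight vectors; it does not reproduce the paper's vectorization of LISA records.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Grid position of the node whose weight vector is closest (Euclidean) to x."""
    return min(nodes, key=lambda pos: sum((w - v) ** 2 for w, v in zip(nodes[pos], x)))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors; returns {(r, c): weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):  # shuffled copy of the inputs
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
            br, bc = best_matching_unit(nodes, x)
            for (r, c), w in nodes.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return nodes


# Hypothetical term-weight vectors over the vocabulary
# ["retrieval", "indexing", "neural", "network"].
docs = {
    "doc_a": [1.0, 0.9, 0.0, 0.1],
    "doc_b": [0.9, 1.0, 0.1, 0.0],
    "doc_c": [0.0, 0.1, 1.0, 0.9],
    "doc_d": [0.1, 0.0, 0.9, 1.0],
}
som = train_som(list(docs.values()))
for name, vec in docs.items():
    print(name, best_matching_unit(som, vec))
```

Printing each document's best-matching node gives the kind of document map that graphical browsing would be built on; the seeded random generator keeps runs repeatable.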

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.03

0.026284594 = product of:
  0.05256919 = sum of:
    0.017165681 = weight(_text_:information in 2533) [ClassicSimilarity], result of:
      0.017165681 = score(doc=2533,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.19395474 = fieldWeight in 2533, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
    0.035403505 = product of:
      0.07080701 = sum of:
        0.07080701 = weight(_text_:organization in 2533) [ClassicSimilarity], result of:
          0.07080701 = score(doc=2533,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.39391994 = fieldWeight in 2533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Profiles several representative current efforts that apply established as well as more innovative methods of automated classification, organization or other categorisation of WWW resources.
Source: New review of information networking. 1996, no.2, S.15-40

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen [Automatic DDC classification of bibliographic title records] (2009) 0.03

0.025659401 = product of:
  0.051318802 = sum of:
    0.017165681 = weight(_text_:information in 611) [ClassicSimilarity], result of:
      0.017165681 = score(doc=611,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.19395474 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.03415312 = product of:
      0.06830624 = sum of:
        0.06830624 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.06830624 = score(doc=611,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Content: Slides for a talk at the 98th Deutscher Bibliothekartag (German Librarians' Day) in Erfurt: A new look at libraries; TK10: Indexing and searching information; Indexing content with new tools
Date: 22. 8.2009 12:54:24

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.02
```
0.024082582 = product of:
  0.048165165 = sum of:
    0.008582841 = weight(_text_:information in 3627) [ClassicSimilarity], result of:
      0.008582841 = score(doc=3627,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.09697737 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.039582323 = product of:
      0.07916465 = sum of:
        0.07916465 = weight(_text_:organization in 3627) [ClassicSimilarity], result of:
          0.07916465 = score(doc=3627,freq=10.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.44041592 = fieldWeight in 3627, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
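
Co-word analysis, used in the third case, starts from counting how often pairs of keywords are assigned to the same paper; clusters of strongly co-occurring terms are then read as themes. The sketch below shows only that counting step on hypothetical keyword sets; any normalization or clustering the authors applied on top of the counts is not described in the abstract and is not modelled here.

```python
from collections import Counter
from itertools import combinations


def co_word_counts(keyword_sets):
    """Count in how many papers each unordered keyword pair co-occurs."""
    counts = Counter()
    for keywords in keyword_sets:
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword assignments for three papers.
papers = [
    {"clustering", "automatic classification", "information retrieval"},
    {"clustering", "machine learning"},
    {"automatic classification", "clustering", "machine learning"},
]
for pair, n in co_word_counts(papers).most_common(3):
    print(pair, n)
```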

Content

Contribution to a special issue "New Trends for Knowledge Organization", guest editor: Renato Rocha Souza.

Source

Knowledge organization. 44(2017) no.3, S.215-233

Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.02
```
0.023545908 = product of:
  0.047091816 = sum of:
    0.01029941 = weight(_text_:information in 4558) [ClassicSimilarity], result of:
      0.01029941 = score(doc=4558,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.116372846 = fieldWeight in 4558, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4558)
    0.036792405 = product of:
      0.07358481 = sum of:
        0.07358481 = weight(_text_:organization in 4558) [ClassicSimilarity], result of:
          0.07358481 = score(doc=4558,freq=6.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.40937364 = fieldWeight in 4558, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=4558)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

While automated methods for information organization have been around for several decades now, exponential growth of the World Wide Web has put them into the forefront of research in different communities, within which several approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within larger text). Here the aim was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Then, if the same controlled vocabulary had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.

Source

Knowledge organization. 38(2011) no.3, S.230-244
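
The string-matching approach described above can be pictured as looking up each entry term of a controlled vocabulary in the document text and crediting the class that term designates. The sketch below is a minimal illustration under assumptions: the vocabulary and class names are hypothetical, matching is whole-word with an optional plural "s", and classes are scored by raw hit counts. The abstract does not detail Golub's actual weighting.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: each entry term designates exactly one class.
VOCABULARY = {
    "neural network": "Artificial intelligence",
    "machine learning": "Artificial intelligence",
    "cataloging": "Library science",
    "subject heading": "Library science",
}


def classify(text, vocabulary=VOCABULARY, top=2):
    """Count entry-term matches per class and return the best-scoring classes."""
    lowered = text.lower()
    scores = Counter()
    for term, cls in vocabulary.items():
        # Whole-word match with an optional plural "s" (a simplifying assumption).
        hits = len(re.findall(r"\b" + re.escape(term) + r"s?\b", lowered))
        if hits:
            scores[cls] += hits
    return scores.most_common(top)


print(classify("Training a neural network with machine learning to suggest subject headings."))
```

Because every entry term points to one class, the quality of the result rests almost entirely on the coverage of entry terms, which is the point the abstract makes about a sufficient number of entry terms.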

Dubin, D.: Dimensions and discriminability (1998) 0.02

0.022359734 = product of:
  0.04471947 = sum of:
    0.020812286 = weight(_text_:information in 2338) [ClassicSimilarity], result of:
      0.020812286 = score(doc=2338,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.23515764 = fieldWeight in 2338, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.023907183 = product of:
      0.047814365 = sum of:
        0.047814365 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.047814365 = score(doc=2338,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 22. 9.1997 19:16:05
Imprint: Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.02

0.020887807 = product of:
  0.041775614 = sum of:
    0.01699316 = weight(_text_:information in 7696) [ClassicSimilarity], result of:
      0.01699316 = score(doc=7696,freq=4.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.1920054 = fieldWeight in 7696, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7696)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 7696) [ClassicSimilarity], result of:
          0.049564905 = score(doc=7696,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information, indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem.
Source: Journal of chemical information and computer sciences. 34(1994) no.1, S.74-90

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02

0.020450171 = product of:
  0.040900342 = sum of:
    0.01699316 = weight(_text_:information in 1673) [ClassicSimilarity], result of:
      0.01699316 = score(doc=1673,freq=4.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.1920054 = fieldWeight in 1673, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.023907183 = product of:
      0.047814365 = sum of:
        0.047814365 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.047814365 = score(doc=1673,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
Date: 1. 8.1996 22:08:06

Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.02

0.020170141 = product of:
  0.040340282 = sum of:
    0.01029941 = weight(_text_:information in 1071) [ClassicSimilarity], result of:
      0.01029941 = score(doc=1071,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.116372846 = fieldWeight in 1071, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1071)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 1071) [ClassicSimilarity], result of:
          0.060081743 = score(doc=1071,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 1071, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper provides an overview of research on the automatic classification of documents in a library environment. The review covers literature from mainstream library and information science (LIS), including academic and professional LIS journals and other documents. It reveals three main types of research on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers, with or without training documents, and 3) automatic bibliographic classification. This research is predominantly directed toward organizing digital documents in an online environment; very little of it addresses the arrangement of physical documents.
Source: Knowledge organization. 40(2013) no.5, pp. 295-304
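
The explain tree above for document 1071 follows Lucene's ClassicSimilarity (TF-IDF) scoring: each term weight is queryWeight × fieldWeight, with queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, where tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)); each coord(m/n) then scales a sum by the fraction of clauses that matched. The Python sketch below recomputes the score of document 1071 from the statistics printed in the tree. It is an illustration under stated assumptions: the function names are made up (not Lucene's API), query boosts are assumed to be 1, and queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    # weight = queryWeight * fieldWeight
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # boost assumed to be 1
    field_weight = math.sqrt(freq) * idf * field_norm    # tf = sqrt(termFreq)
    return query_weight * field_weight

def coord(matched: int, total: int) -> float:
    # Fraction of a Boolean query's clauses that the document matched
    return matched / total

# Statistics copied from the explain tree for doc 1071
MAX_DOCS = 44218
QUERY_NORM = 0.050415643
FIELD_NORM = 0.046875

information = term_weight(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
organization = term_weight(4.0, 3399, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Nested clause: 1 of 2 sub-clauses matched; top level: 2 of 4 clauses matched
score = coord(2, 4) * (information + coord(1, 2) * organization)
print(f"{score:.6f}")  # prints 0.020170
```

The same functions reproduce the other trees in this listing once freq, docFreq and fieldNorm are swapped in; for example, freq=14 yields the tf of 3.7416575 shown for document 1107.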

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.02
```
0.01989231 = product of:
  0.03978462 = sum of:
    0.022708062 = weight(_text_:information in 1107) [ClassicSimilarity], result of:
      0.022708062 = score(doc=1107,freq=14.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.256578 = fieldWeight in 1107, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1107)
    0.01707656 = product of:
      0.03415312 = sum of:
        0.03415312 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.03415312 = score(doc=1107,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Retrieval of disease information is often based on several key aspects such as the etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers, and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages from medical texts for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques: two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is significant for evidence-based medicine, health education, and healthcare decision support. PETC can be used in application domains in which a text to be classified may have several parts about different categories.
Date: 28.10.2013 19:22:57
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.11, pp. 2265-2277
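
The abstract does not describe PETC's internals, so the following is only a generic sketch of passage extraction for text classification, not PETC itself: the text is split into sentences, each sliding window of sentences is scored by how many terms from a category's term profile it contains, and only the best-scoring window would be handed to a downstream classifier. The keyword profile, the window size, and the helper `extract_passage` are all hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> list[str]:
    # Lowercased alphabetic tokens
    return re.findall(r"[a-z]+", text.lower())

def extract_passage(text: str, profile: set[str], window: int = 2) -> str:
    """Return the window of consecutive sentences with the most profile-term hits.

    Hypothetical helper: `profile` is an assumed set of terms characterizing
    one category (e.g. one disease aspect such as treatment).
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    best_start, best_hits = 0, -1
    for start in range(max(1, len(sentences) - window + 1)):
        passage = " ".join(sentences[start:start + window])
        hits = sum(1 for t in tokens(passage) if t in profile)
        if hits > best_hits:  # strict '>' keeps the earliest window on ties
            best_start, best_hits = start, hits
    return " ".join(sentences[best_start:best_start + window])

text = ("Influenza is caused by a virus. Symptoms include fever and cough. "
        "Treatment relies on rest and antiviral drugs. Vaccination helps prevention.")
treatment_terms = {"treatment", "antiviral", "drugs", "rest"}
print(extract_passage(text, treatment_terms, window=1))
# prints: Treatment relies on rest and antiviral drugs.
```

Restricting the classifier's input to the selected passage is meant to reduce the noise that other aspects in the same text introduce, which is challenge (a) in the abstract.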

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.02
```
0.019165486 = product of:
  0.038330972 = sum of:
    0.017839102 = weight(_text_:information in 2760) [ClassicSimilarity], result of:
      0.017839102 = score(doc=2760,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.20156369 = fieldWeight in 2760, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2760)
    0.02049187 = product of:
      0.04098374 = sum of:
        0.04098374 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.04098374 = score(doc=2760,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate the contextual background of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require trial runs to set parameters manually, and is hence more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves better and more stable performance than several hierarchical and nonhierarchical text-classification methods.
Date: 22. 3.2009 19:11:54
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp. 803-813
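
The rule that a document may enter a category only if it also fits the context set by the category's ancestors can be illustrated with a generic top-down sketch. This is not CRHTC: each category is assumed to be described by a hypothetical keyword set, "matching" simply means sharing at least one keyword with the document, and a document descends into a child only after matching every ancestor on the path. As in the abstract, a document can end up in zero, one, or several categories.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    keywords: set[str]                      # hypothetical category description
    children: list["Category"] = field(default_factory=list)

def matches(doc_terms: set[str], category: Category) -> bool:
    # Simplistic match test (an assumption): at least one shared keyword
    return bool(doc_terms & category.keywords)

def classify(doc_terms: set[str], node: Category, path: tuple = ()) -> list[str]:
    """Return paths of all categories whose whole ancestor chain the document matches."""
    if not matches(doc_terms, node):
        return []                            # context mismatch: prune the subtree
    here = path + (node.name,)
    results = ["/".join(here)]
    for child in node.children:
        results.extend(classify(doc_terms, child, here))
    return results

def classify_all(doc_terms: set[str], roots: list[Category]) -> list[str]:
    return [p for root in roots for p in classify(doc_terms, root)]

tree = [
    Category("Medicine", {"disease", "patient", "clinical"}, [
        Category("Oncology", {"tumor", "cancer"}),
        Category("Cardiology", {"heart", "cardiac"}),
    ]),
    Category("Finance", {"market", "stock", "bank"}, [
        Category("Banking", {"bank", "loan"}),
    ]),
]

print(classify_all({"patient", "cancer", "heart"}, tree))
# prints: ['Medicine', 'Medicine/Oncology', 'Medicine/Cardiology']
```

Note that a document containing only "cancer" and "loan" lands in no category at all: "cancer" matches Oncology, but the document never matches Oncology's ancestor Medicine, so the context check prunes that branch.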

Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.02

0.018399216 = product of:
  0.036798432 = sum of:
    0.012015978 = weight(_text_:information in 2219) [ClassicSimilarity], result of:
      0.012015978 = score(doc=2219,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 2219, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2219)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 2219) [ClassicSimilarity], result of:
          0.049564905 = score(doc=2219,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Classification of office documents is one of the administrative functions carried out by almost every organization and institution that sends and receives correspondence. Processing the increasing volume of incoming and outgoing mail, and in particular classifying it, is time-consuming and expensive. More and more organizations are meeting this challenge by designing computer-based systems for automatic classification. Examines the present state of available knowledge and methodology that can be used for the automatic classification of office documents. In addition to a review of classic methods and techniques, the paper also focuses on the application of artificial intelligence.

Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.02

0.018399216 = product of:
  0.036798432 = sum of:
    0.012015978 = weight(_text_:information in 2113) [ClassicSimilarity], result of:
      0.012015978 = score(doc=2113,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 2113, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2113)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 2113) [ClassicSimilarity], result of:
          0.049564905 = score(doc=2113,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 2113, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2113)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Document clustering is an important tool for organizing and browsing document collections. In real applications, some limited knowledge about the cluster membership of a small number of documents is often available, such as pairs of documents known to belong to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the K-means objective, the sum of squared Euclidean distances. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method can achieve better performance than three existing methods.
Source: Information processing and management. 44(2008) no.4, pp. 1397-1409
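
The trace formulation rests on a standard identity. Stack the documents x_1, ..., x_n as the columns of X, and let Y be the n × k cluster indicator matrix with Y_ij = 1/√n_j when document i is in cluster j (n_j is the cluster size) and 0 otherwise. Then the K-means sum of squared distances equals tr(XᵀX) − tr(YᵀXᵀXY). Minimizing the distances is therefore the same as maximizing tr(YᵀXᵀXY), and relaxing Y to any matrix with orthonormal columns is what makes an eigen-decomposition solution possible. The Python sketch below only checks the identity numerically on toy data; it does not implement the authors' constraints or their eigen-decomposition step, and the function names are illustrative.

```python
def kmeans_sse(points: list[list[float]], labels: list[int], k: int) -> float:
    # K-means objective: squared Euclidean distance of each point to its cluster mean
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total

def trace_objective(points: list[list[float]], labels: list[int], k: int) -> float:
    # tr(X^T X) - tr(Y^T X^T X Y), using the Gram matrix G = X^T X
    n = len(points)
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    tr_all = sum(gram[i][i] for i in range(n))
    tr_clusters = 0.0
    for j in range(k):
        idx = [i for i in range(n) if labels[i] == j]
        if not idx:
            continue
        # Column j of Y is 1/sqrt(n_j) on idx, so y^T G y = sum of G[a][b] over idx / n_j
        tr_clusters += sum(gram[a][b] for a in idx for b in idx) / len(idx)
    return tr_all - tr_clusters

points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 1, 1]
print(kmeans_sse(points, labels, 2), trace_objective(points, labels, 2))
# prints: 1.0 1.0
```

Because tr(XᵀX) does not depend on the clustering, only the second trace term matters for optimization, which is why the paper can work with a trace maximization criterion.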

Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.02

0.01796158 = product of:
  0.03592316 = sum of:
    0.012015978 = weight(_text_:information in 141) [ClassicSimilarity], result of:
      0.012015978 = score(doc=141,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 141, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=141)
    0.023907183 = product of:
      0.047814365 = sum of:
        0.047814365 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
          0.047814365 = score(doc=141,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.2708308 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Pages: 1-22

Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.02

0.01796158 = product of:
  0.03592316 = sum of:
    0.012015978 = weight(_text_:information in 5273) [ClassicSimilarity], result of:
      0.012015978 = score(doc=5273,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 5273, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5273)
    0.023907183 = product of:
      0.047814365 = sum of:
        0.047814365 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.047814365 = score(doc=5273,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 22. 7.2006 16:24:52
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp. 431-442
