Search (175 results, page 1 of 9)

  • theme_ss:"Automatisches Klassifizieren"
  1. Dubin, D.: Dimensions and discriminability (1998) 0.15
    0.14626439 = product of:
      0.24377397 = sum of:
        0.02599618 = weight(_text_:of in 2338) [ClassicSimilarity], result of:
          0.02599618 = score(doc=2338,freq=16.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.34207192 = fieldWeight in 2338, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.08327723 = weight(_text_:subject in 2338) [ClassicSimilarity], result of:
          0.08327723 = score(doc=2338,freq=6.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.4791082 = fieldWeight in 2338, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.13450055 = sum of:
          0.08840958 = weight(_text_:headings in 2338) [ClassicSimilarity], result of:
            0.08840958 = score(doc=2338,freq=2.0), product of:
              0.23569997 = queryWeight, product of:
                4.849944 = idf(docFreq=940, maxDocs=44218)
                0.04859849 = queryNorm
              0.37509373 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.849944 = idf(docFreq=940, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
          0.04609097 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
            0.04609097 = score(doc=2338,freq=2.0), product of:
              0.17018363 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04859849 = queryNorm
              0.2708308 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
      0.6 = coord(3/5)
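The score breakdown above is Lucene's ClassicSimilarity explain output. As a sketch, the innermost term weight (the `_text_:of` clause, doc 2338) can be reproduced from its stated factors:

```python
import math

def classic_similarity_weight(freq, idf, query_norm, field_norm):
    """Reproduce one Lucene ClassicSimilarity term weight:
    score = queryWeight * fieldWeight, where
      queryWeight = idf * queryNorm
      fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    """
    tf = math.sqrt(freq)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Numbers taken from the "_text_:of in 2338" clause above.
w = classic_similarity_weight(freq=16.0, idf=1.5637573,
                              query_norm=0.04859849, field_norm=0.0546875)
print(round(w, 8))  # ≈ 0.02599618, matching the explain output
```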
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  2. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.12
    
    Abstract
    The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
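As a rough illustration of the string-to-string matching approach the abstract describes (the term list and class codes below are invented; the study used terms extracted from the Ei thesaurus and classification scheme):

```python
import re

# Hypothetical excerpt of a controlled-vocabulary term list mapped to
# classes; invented for illustration only.
TERM_LIST = {
    "finite element": "921.6",
    "heat transfer": "641.2",
    "signal processing": "716.1",
}

def classify(text):
    """Return classes whose vocabulary terms occur in the page text."""
    # Normalise: lowercase and collapse punctuation/hyphens to spaces.
    tokens = " ".join(re.findall(r"[a-z]+", text.lower()))
    return sorted({cls for term, cls in TERM_LIST.items() if term in tokens})

print(classify("A finite-element study of heat transfer in turbine blades"))
# → ['641.2', '921.6']
```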
    Source
    New review of hypermedia and multimedia. 12(2006) no.1, S.11-27
  3. Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.11
    
    Abstract
    This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.
    1. Introduction. The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries.
Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.
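The reported inter-subject similarity can be illustrated with a pair-counting measure over two subjects' clusterings (a common choice for comparing clusterings; the paper's exact measure may differ):

```python
from itertools import combinations

def co_clustered_pairs(clustering):
    """Set of item pairs placed in the same cluster."""
    pairs = set()
    for cluster in clustering:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs

def clustering_similarity(a, b):
    """Jaccard overlap of same-cluster pairs between two clusterings."""
    pa, pb = co_clustered_pairs(a), co_clustered_pairs(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)

# Two hypothetical subjects sorting the same five pages differently.
subject1 = [{"p1", "p2", "p3"}, {"p4", "p5"}]
subject2 = [{"p1", "p2"}, {"p3", "p4", "p5"}]
print(clustering_similarity(subject1, subject2))  # → 0.333...
```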
    Date
    12. 9.2004 9:56:22
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  4. Wu, M.; Liu, Y.-H.; Brownlee, R.; Zhang, X.: Evaluating utility and automatic classification of subject metadata from Research Data Australia (2021) 0.11
    
    Abstract
    In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise the subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches in order to enrich these records and to provide additional support for user search and data discovery. Our results show that automatic methods work well for well represented categories of subject metadata, and these categories tend to have features that can distinguish themselves from the other categories. Our findings raise implications for data catalogue providers; they should invest more effort to enhance the quality of data records by providing an adequate description of these records for under-represented subject categories.
  5. Godby, C.J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.10
    
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic
    Footnote
    Paper, IFLA Preconference "Subject Retrieval in a Networked Environment", Dublin, OH, August 2001.
  6. Godby, C.J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.10
    
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
    Source
    Subject retrieval in a networked environment: Proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001 and sponsored by the IFLA Classification and Indexing Section, the IFLA Information Technology Section and OCLC. Ed.: I.C. McIlwaine
  7. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.08
    
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
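A drastically simplified sketch of learning an LCSH-to-LCC mapping from training pairs (toy data invented for illustration; the paper trains a model on a large catalog and places resources in the full LCC tree):

```python
from collections import Counter, defaultdict

# Toy training pairs (LCSH set -> LCC class); invented for illustration.
TRAIN = [
    ({"Machine learning", "Computer algorithms"}, "QA76"),
    ({"Machine learning"}, "QA76"),
    ({"Library science", "Classification"}, "Z696"),
]

# Count, per heading, which LCC classes it co-occurred with in training.
votes = defaultdict(Counter)
for headings, lcc in TRAIN:
    for h in headings:
        votes[h][lcc] += 1

def predict_lcc(headings):
    """Majority vote over per-heading class counts -- a drastic
    simplification of the paper's learned LCSH-to-LCC model."""
    tally = Counter()
    for h in headings:
        tally.update(votes.get(h, Counter()))
    return tally.most_common(1)[0][0] if tally else None

print(predict_lcc({"Machine learning", "Classification"}))  # → QA76
```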
    Source
    Journal of the American Society for Information Science and technology. 55(2004) no.3, S.214-227
  8. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.08
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. 
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. 
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
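The query-to-category step of the demo can be caricatured as term overlap between the query words and the classification terminology associated with each LCC category (the prototype itself uses LSI over catalog records; the category ids and term lists below are invented):

```python
from collections import Counter

# Tiny invented stand-in for terminology extracted from catalog records:
# LCC category id -> terms drawn from records classified there.
CATEGORY_TERMS = {
    "RC254-282": ["cancer", "oncology", "tumor", "prostate"],
    "HG4501-6051": ["investment", "banking", "securities"],
    "ML410": ["gershwin", "composer", "music"],
}

def rank_categories(query, k=15):
    """Rank LCC categories by overlap between query terms and each
    category's associated terms (plain overlap here, LSI in Pharos)."""
    q = set(query.lower().split())
    scores = Counter({cat: len(q & set(terms))
                      for cat, terms in CATEGORY_TERMS.items()})
    return [cat for cat, s in scores.most_common(k) if s > 0]

print(rank_categories("prostate cancer"))  # → ['RC254-282']
```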
  9. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.07
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
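For orientation, an Annif project definition along the lines described (TF-IDF backend, Snowball analyser, LCSH vocabulary) might look roughly like this `projects.cfg` fragment; the exact keys follow the Annif documentation and may vary by version:

```ini
[lcsh-tfidf]
name=LCSH TF-IDF
language=en
backend=tfidf
analyzer=snowball(en)
vocab=lcsh
limit=10
```

Training and suggestion would then go through the Annif CLI (e.g. `annif train lcsh-tfidf <corpus>` and `annif suggest lcsh-tfidf < document.txt`), as described in the Annif documentation.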
    Source
    DESIDOC journal of library and information technology. 43(2023) no.1, S.45-54
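    The core of the workflow described above, a TF-IDF backend that ranks subject headings against text drawn from MARC records, can be illustrated with a hedged, stdlib-only sketch. The training records and headings below are invented examples, not LCSH data, and real Annif works through its own corpus formats and CLI rather than this code.

    ```python
    # Minimal sketch of TF-IDF-style subject suggestion, in the spirit of
    # Annif's TF-IDF backend; records and headings are invented examples.
    import math
    from collections import Counter, defaultdict

    def tokenize(text):
        return text.lower().split()

    def train(records):
        """records: list of (text, heading). Returns per-heading tf-idf vectors."""
        docs = defaultdict(Counter)
        for text, heading in records:
            docs[heading].update(tokenize(text))
        n = len(docs)
        df = Counter()
        for tf in docs.values():
            df.update(tf.keys())
        return {
            heading: {t: c * math.log(n / df[t]) for t, c in tf.items()}
            for heading, tf in docs.items()
        }

    def suggest(model, text, limit=3):
        """Rank headings by similarity between the query and heading vectors."""
        q = Counter(tokenize(text))
        scores = {}
        for heading, vec in model.items():
            dot = sum(q[t] * w for t, w in vec.items())
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            scores[heading] = dot / norm
        return sorted(scores, key=scores.get, reverse=True)[:limit]

    records = [
        ("neural networks deep learning classification", "Machine learning"),
        ("catalogue marc records subject indexing", "Cataloging"),
    ]
    model = train(records)
    print(suggest(model, "automatic subject indexing of marc records", limit=1))
    ```

    In the real pipeline the `(text, heading)` pairs would come from the OpenRefine-cleaned MARC 650$a export described in the abstract.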
  10. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.06873383 = product of:
      0.11455637 = sum of:
        0.077187285 = product of:
          0.23156185 = sum of:
            0.23156185 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.23156185 = score(doc=562,freq=2.0), product of:
                0.41201854 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.04859849 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.017615816 = weight(_text_:of in 562) [ClassicSimilarity], result of:
          0.017615816 = score(doc=562,freq=10.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.23179851 = fieldWeight in 562, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.019753272 = product of:
          0.039506543 = sum of:
            0.039506543 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.039506543 = score(doc=562,freq=2.0), product of:
                0.17018363 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04859849 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
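    The boosting idea in this abstract can be sketched with single-word presence stumps as weak learners over a plain bag-of-words; the concept features from background knowledge that the paper adds are omitted here, and the training data is invented.

    ```python
    # Hedged sketch of AdaBoost over word-presence stumps; invented toy data.
    import math

    def stump_predict(word, text):
        """Weak learner: +1 if the word occurs in the text, else -1."""
        return 1 if word in text.split() else -1

    def adaboost(samples, vocab, rounds=2):
        """samples: list of (text, label in {+1, -1}).
        Returns an ensemble as a list of (word, alpha) pairs."""
        w = [1.0 / len(samples)] * len(samples)

        def weighted_error(word, weights):
            return sum(wi for wi, (t, y) in zip(weights, samples)
                       if stump_predict(word, t) != y)

        ensemble = []
        for _ in range(rounds):
            best = min(vocab, key=lambda word: weighted_error(word, w))
            err = min(max(weighted_error(best, w), 1e-10), 1 - 1e-10)
            alpha = 0.5 * math.log((1 - err) / err)
            ensemble.append((best, alpha))
            # Re-weight: misclassified samples gain weight, correct ones lose it.
            w = [wi * math.exp(-alpha * y * stump_predict(best, t))
                 for wi, (t, y) in zip(w, samples)]
            z = sum(w)
            w = [wi / z for wi in w]
        return ensemble

    def predict(ensemble, text):
        score = sum(alpha * stump_predict(word, text) for word, alpha in ensemble)
        return 1 if score >= 0 else -1

    samples = [("football match", 1), ("football goal", 1),
               ("election vote", -1), ("parliament vote", -1)]
    vocab = sorted({w for t, _ in samples for w in t.split()})
    ens = adaboost(samples, vocab, rounds=2)
    print(predict(ens, "football season"))  # -> 1
    ```

    The paper's enhancement would extend `vocab` with concept identifiers extracted from background knowledge, so stumps can fire on concepts as well as surface words.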
  11. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.07
    0.06769939 = product of:
      0.11283231 = sum of:
        0.02177373 = weight(_text_:of in 472) [ClassicSimilarity], result of:
          0.02177373 = score(doc=472,freq=22.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.28651062 = fieldWeight in 472, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
        0.059483737 = weight(_text_:subject in 472) [ClassicSimilarity], result of:
          0.059483737 = score(doc=472,freq=6.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.34222013 = fieldWeight in 472, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
        0.03157485 = product of:
          0.0631497 = sum of:
            0.0631497 = weight(_text_:headings in 472) [ClassicSimilarity], result of:
              0.0631497 = score(doc=472,freq=2.0), product of:
                0.23569997 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.04859849 = queryNorm
                0.2679241 = fieldWeight in 472, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=472)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence, it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with a broad coverage for classification of German scientific articles.
    Content
This work is partially based on the Bachelor thesis of Maike Sommer. See also: http://sda2012.dke-research.de.
    Source
    Proceedings of the 2nd International Workshop on Semantic Digital Archives held in conjunction with the 16th Int. Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al
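    The word-level idea in this abstract, deriving the classification of a text from the DDC notations attached to its words, can be sketched as follows. The mini-thesaurus and notations are invented stand-ins, not actual SWD data.

    ```python
    # Hedged sketch: each thesaurus term carries DDC notations, and a text
    # inherits the notations of its words by simple voting. Toy data only.
    from collections import Counter

    thesaurus = {
        "bibliothek": ["020"],   # 020 = library & information sciences
        "katalog": ["020"],
        "physik": ["530"],       # 530 = physics
    }

    def classify(text, thesaurus, limit=1):
        votes = Counter()
        for word in text.lower().split():
            for notation in thesaurus.get(word, []):
                votes[notation] += 1
        return [n for n, _ in votes.most_common(limit)]

    print(classify("katalog einer bibliothek", thesaurus))  # -> ['020']
    ```

    The real method works against the full SWD/DDC mapping and ranks candidates more carefully, but the training-free character shown here is the point: no labeled documents are needed.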
  12. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.06
    0.063835 = product of:
      0.10639167 = sum of:
        0.027290303 = weight(_text_:of in 1054) [ClassicSimilarity], result of:
          0.027290303 = score(doc=1054,freq=24.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.3591007 = fieldWeight in 1054, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
        0.041211538 = weight(_text_:subject in 1054) [ClassicSimilarity], result of:
          0.041211538 = score(doc=1054,freq=2.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.23709705 = fieldWeight in 1054, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
        0.03788982 = product of:
          0.07577964 = sum of:
            0.07577964 = weight(_text_:headings in 1054) [ClassicSimilarity], result of:
              0.07577964 = score(doc=1054,freq=2.0), product of:
                0.23569997 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.04859849 = queryNorm
                0.3215089 = fieldWeight in 1054, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1054)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
  13. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.05
    0.050691903 = product of:
      0.0844865 = sum of:
        0.0185687 = weight(_text_:of in 3300) [ClassicSimilarity], result of:
          0.0185687 = score(doc=3300,freq=16.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.24433708 = fieldWeight in 3300, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
        0.034342952 = weight(_text_:subject in 3300) [ClassicSimilarity], result of:
          0.034342952 = score(doc=3300,freq=2.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.19758089 = fieldWeight in 3300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
        0.03157485 = product of:
          0.0631497 = sum of:
            0.0631497 = weight(_text_:headings in 3300) [ClassicSimilarity], result of:
              0.0631497 = score(doc=3300,freq=2.0), product of:
                0.23569997 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.04859849 = queryNorm
                0.2679241 = fieldWeight in 3300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3300)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and evaluated to show whether they are complementary to one another.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
  14. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.05
    0.049681935 = product of:
      0.12420484 = sum of:
        0.0270683 = weight(_text_:of in 2194) [ClassicSimilarity], result of:
          0.0270683 = score(doc=2194,freq=34.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.35617945 = fieldWeight in 2194, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
        0.097136535 = weight(_text_:subject in 2194) [ClassicSimilarity], result of:
          0.097136535 = score(doc=2194,freq=16.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.55884314 = fieldWeight in 2194, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
      0.4 = coord(2/5)
    
    Abstract
In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
    Object
    Web of science
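    The counting step this abstract describes, taking the most numerous subject categories among an article's references, reduces to a simple majority vote. A hedged sketch with invented category labels:

    ```python
    # Toy sketch: classify an article by the most frequent subject
    # categories among its references. Category labels are invented.
    from collections import Counter

    def classify_by_references(ref_categories, limit=1):
        """ref_categories: one list of subject categories per reference.
        Returns the most frequent categories across all references."""
        votes = Counter()
        for cats in ref_categories:
            votes.update(cats)
        return [c for c, _ in votes.most_common(limit)]

    refs = [["Biochemistry"], ["Biochemistry", "Cell Biology"], ["Physics"]]
    print(classify_by_references(refs))  # -> ['Biochemistry']
    ```

    In the study the category lists come from Web of Science records of the cited references rather than hand-written lists.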
  15. Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.05
    0.049517494 = product of:
      0.123793736 = sum of:
        0.10202001 = weight(_text_:list in 744) [ClassicSimilarity], result of:
          0.10202001 = score(doc=744,freq=4.0), product of:
            0.25191793 = queryWeight, product of:
              5.183657 = idf(docFreq=673, maxDocs=44218)
              0.04859849 = queryNorm
            0.4049732 = fieldWeight in 744, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.183657 = idf(docFreq=673, maxDocs=44218)
              0.0390625 = fieldNorm(doc=744)
        0.02177373 = weight(_text_:of in 744) [ClassicSimilarity], result of:
          0.02177373 = score(doc=744,freq=22.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.28651062 = fieldWeight in 744, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=744)
      0.4 = coord(2/5)
    
    Abstract
Classifying papers according to fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may not correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and the KeyWords Plus) and the help of experts to identify the existing subfields and the journals exclusive to each subfield. These "exclusive journals" are critical for obtaining, through a pattern detection procedure that uses machine learning techniques (from the software NVivo), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can contribute to supporting scientific policy-makers, funding and research institutions (via more accurate academic performance evaluations), to supporting editors in their task of redefining the scopes of journals, and to supporting popular databases in their processes of refining categories.
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.11, S.1513-1528
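    The matching step at the heart of this abstract, assigning a paper to the subfield whose frequent-term list its text overlaps most, can be sketched simply. The subfield names and term lists below are invented placeholders for the expert-derived lists from the "exclusive journals".

    ```python
    # Hedged sketch: pick the subfield whose frequent-term list has the
    # largest overlap with the paper's text. Terms are invented examples.
    def assign_subfield(text, subfield_terms):
        """subfield_terms: dict mapping subfield -> set of frequent terms."""
        words = set(text.lower().split())
        return max(subfield_terms, key=lambda s: len(words & subfield_terms[s]))

    terms = {
        "hep": {"quark", "collider", "boson"},
        "astro": {"galaxy", "redshift", "supernova"},
    }
    print(assign_subfield("a new boson search at the collider", terms))  # -> hep
    ```

    The paper's optimization procedures refine this raw overlap count; the sketch shows only the baseline pattern match.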
  16. Shafer, K.E.: Evaluating Scorpion results (1998) 0.04
    0.044106636 = product of:
      0.11026659 = sum of:
        0.013130054 = weight(_text_:of in 1569) [ClassicSimilarity], result of:
          0.013130054 = score(doc=1569,freq=2.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.17277241 = fieldWeight in 1569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
        0.097136535 = weight(_text_:subject in 1569) [ClassicSimilarity], result of:
          0.097136535 = score(doc=1569,freq=4.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.55884314 = fieldWeight in 1569, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
      0.4 = coord(2/5)
    
    Abstract
    Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.
  17. Mukhopadhyay, S.; Peng, S.; Raje, R.; Palakal, M.; Mostafa, J.: Multi-agent information classification using dynamic acquaintance lists (2003) 0.04
    0.044080377 = product of:
      0.11020094 = sum of:
        0.08656685 = weight(_text_:list in 1755) [ClassicSimilarity], result of:
          0.08656685 = score(doc=1755,freq=2.0), product of:
            0.25191793 = queryWeight, product of:
              5.183657 = idf(docFreq=673, maxDocs=44218)
              0.04859849 = queryNorm
            0.34363115 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.183657 = idf(docFreq=673, maxDocs=44218)
              0.046875 = fieldNorm(doc=1755)
        0.023634095 = weight(_text_:of in 1755) [ClassicSimilarity], result of:
          0.023634095 = score(doc=1755,freq=18.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.3109903 = fieldWeight in 1755, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1755)
      0.4 = coord(2/5)
    
    Abstract
There has been considerable interest in recent years in providing automated information services, such as information classification, by means of a society of collaborative agents. These agents augment each other's knowledge structures (e.g., the vocabularies) and assist each other in providing efficient information services to a human user. However, when the number of agents present in the society increases, exhaustive communication and collaboration among agents result in a large communication overhead and increased delays in response time. This paper introduces a method to achieve selective interaction with a relatively small number of potentially useful agents, based on simple agent modeling and acquaintance lists. The key idea presented here is that the acquaintance list of an agent, representing a small number of other agents to be collaborated with, is dynamically adjusted. The best acquaintances are automatically discovered using a learning algorithm, based on the past history of collaboration. Experimental results are presented to demonstrate that such dynamically learned acquaintance lists can lead to high quality of classification, while significantly reducing the delay in response time.
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.10, S.966-975
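    The dynamically adjusted acquaintance list described above can be sketched as a bounded ranking over cumulative collaboration usefulness. The scoring scheme and agent names here are invented; the paper's actual learning algorithm is more involved.

    ```python
    # Toy sketch: an agent keeps a bounded acquaintance list, promoting
    # peers whose past collaborations were useful. Invented scoring scheme.
    class Agent:
        def __init__(self, name, capacity=2):
            self.name = name
            self.capacity = capacity
            self.history = {}  # peer name -> cumulative usefulness score

        def record(self, peer, usefulness):
            """Update the collaboration history after interacting with a peer."""
            self.history[peer] = self.history.get(peer, 0.0) + usefulness

        def acquaintances(self):
            """The top-scoring peers form the current acquaintance list."""
            ranked = sorted(self.history, key=self.history.get, reverse=True)
            return ranked[: self.capacity]

    a = Agent("classifier-1")
    a.record("agent-b", 0.9)
    a.record("agent-c", 0.2)
    a.record("agent-b", 0.8)
    a.record("agent-d", 0.5)
    print(a.acquaintances())  # -> ['agent-b', 'agent-d']
    ```

    Selective interaction then means querying only the agents returned by `acquaintances()` instead of the whole society, which is what cuts the communication overhead.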
  18. Shafer, K.E.: Automatic Subject Assignment via the Scorpion System (2001) 0.04
    0.041882206 = product of:
      0.10470551 = sum of:
        0.022282438 = weight(_text_:of in 1043) [ClassicSimilarity], result of:
          0.022282438 = score(doc=1043,freq=4.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.2932045 = fieldWeight in 1043, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.09375 = fieldNorm(doc=1043)
        0.082423076 = weight(_text_:subject in 1043) [ClassicSimilarity], result of:
          0.082423076 = score(doc=1043,freq=2.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.4741941 = fieldWeight in 1043, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.09375 = fieldNorm(doc=1043)
      0.4 = coord(2/5)
    
    Footnote
Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part I
    Source
    Journal of library administration. 34(2001) nos.1/2, S.187-189
  19. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.04
    0.041076567 = product of:
      0.10269141 = sum of:
        0.0185687 = weight(_text_:of in 2300) [ClassicSimilarity], result of:
          0.0185687 = score(doc=2300,freq=16.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.24433708 = fieldWeight in 2300, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
        0.08412271 = weight(_text_:subject in 2300) [ClassicSimilarity], result of:
          0.08412271 = score(doc=2300,freq=12.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.48397237 = fieldWeight in 2300, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.4 = coord(2/5)
    
    Abstract
Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources, and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC), recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built based on input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
  20. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.04
    0.04001556 = product of:
      0.10003889 = sum of:
        0.017615816 = weight(_text_:of in 1566) [ClassicSimilarity], result of:
          0.017615816 = score(doc=1566,freq=10.0), product of:
            0.07599624 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04859849 = queryNorm
            0.23179851 = fieldWeight in 1566, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
        0.082423076 = weight(_text_:subject in 1566) [ClassicSimilarity], result of:
          0.082423076 = score(doc=1566,freq=8.0), product of:
            0.17381717 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.04859849 = queryNorm
            0.4741941 = fieldWeight in 1566, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.4 = coord(2/5)
    
    Abstract
    This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77 and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier
    Source
    Journal of information science. 29(2003) no.2, S.117-126
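    The kNN step in this abstract can be sketched with a shared-word similarity and a majority vote among the nearest training documents. The training texts and DDC-style category labels below are invented examples.

    ```python
    # Hedged sketch of a kNN text classifier: similarity = number of shared
    # words, majority vote among the k nearest neighbours. Toy data only.
    from collections import Counter

    def knn_classify(text, training, k=3):
        """training: list of (text, category) pairs."""
        words = set(text.lower().split())
        neighbours = sorted(
            training,
            key=lambda tc: len(words & set(tc[0].lower().split())),
            reverse=True,
        )
        votes = Counter(cat for _, cat in neighbours[:k])
        return votes.most_common(1)[0][0]

    training = [
        ("interest rates monetary policy", "332"),
        ("central bank interest rates", "332"),
        ("labour market wages", "331"),
    ]
    print(knn_classify("the central bank raised interest rates", training, k=2))
    ```

    The study's closed-setting 96% precision suggests the kNN classifier complements the representative-term dictionary rather than replacing it; a production system would combine both, as the abstract proposes.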

Languages

  • e 165
  • d 8
  • a 1
  • chi 1

Types

  • a 148
  • el 26
  • m 3
  • x 3
  • s 2
  • r 1