Search (64 results, page 1 of 4)

  • Active filter: theme_ss:"Automatisches Klassifizieren"
  1. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.10
    0.103261515 = product of:
      0.13768202 = sum of:
        0.060314562 = weight(_text_:digital in 2560) [ClassicSimilarity], result of:
          0.060314562 = score(doc=2560,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.30507088 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.053599782 = weight(_text_:library in 2560) [ClassicSimilarity], result of:
          0.053599782 = score(doc=2560,freq=8.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.40671125 = fieldWeight in 2560, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.047535364 = score(doc=2560,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The proliferation of digital resources and their integration into the traditional library setting has created a pressing need for automated tools that organize textual information based on library classification schemes. Automatic text classification is the research field concerned with developing tools, methods, and models for automating this task. This article describes the currently popular approaches to text classification and the major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for addressing those challenges are examined.
    Date
    22. 9.2008 18:31:54
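    Note on the score trees: the breakdown above is Lucene "explain" output under the ClassicSimilarity (TF-IDF) model. Assuming standard Lucene semantics (tf = sqrt(termFreq); idf = ln(maxDocs / (docFreq + 1)) + 1; fieldWeight = tf * idf * fieldNorm; queryWeight = idf * queryNorm; coord(m/n) scales a sum by the fraction of matched clauses), the following Python sketch reproduces this record's top-level score of 0.103261515 from the numbers in the tree:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          # score = queryWeight * fieldWeight, with tf = sqrt(freq)
          query_weight = idf(doc_freq, max_docs) * query_norm
          field_weight = math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm
          return query_weight * field_weight

      QUERY_NORM, MAX_DOCS, FIELD_NORM = 0.050121464, 44218, 0.0546875

      digital = term_score(2.0, 2326, MAX_DOCS, QUERY_NORM, FIELD_NORM)        # ~0.060314562
      library = term_score(8.0, 8668, MAX_DOCS, QUERY_NORM, FIELD_NORM)        # ~0.053599782
      date_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5  # inner coord(1/2), ~0.023767682

      print((digital + library + date_22) * 0.75)  # outer coord(3/4) -> ~0.103261515

    The same arithmetic applies to every score tree in this result list; only freq, docFreq, fieldNorm, and the coord fractions vary from record to record.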
  2. Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.07
    0.065093905 = product of:
      0.13018781 = sum of:
        0.068930924 = weight(_text_:digital in 5810) [ClassicSimilarity], result of:
          0.068930924 = score(doc=5810,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.34865242 = fieldWeight in 5810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0625 = fieldNorm(doc=5810)
        0.061256893 = weight(_text_:library in 5810) [ClassicSimilarity], result of:
          0.061256893 = score(doc=5810,freq=8.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.46481284 = fieldWeight in 5810, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0625 = fieldNorm(doc=5810)
      0.5 = coord(2/4)
    
    Abstract
    Major library classification schemes have long been the standard framework for organizing information sources in the traditional library environment, and text classification (TC) has become a popular and attractive tool for organizing digital information. This paper gives an overview of previous projects and studies on TC using major library classification schemes, and summarizes a discussion of the research challenges in TC.
  3. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.05
    0.053482536 = product of:
      0.10696507 = sum of:
        0.03790077 = weight(_text_:library in 7209) [ClassicSimilarity], result of:
          0.03790077 = score(doc=7209,freq=4.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.28758827 = fieldWeight in 7209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
        0.069064304 = product of:
          0.13812861 = sum of:
            0.13812861 = weight(_text_:project in 7209) [ClassicSimilarity], result of:
              0.13812861 = score(doc=7209,freq=8.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.6528997 = fieldWeight in 7209, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7209)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The Nordic WAIS/WWW project, sponsored by NORDINFO, is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. The article details current results, focusing on the WAIS side of the project. It describes research into automatic indexing and classification of WAIS sources, the development of an orientation tool for WAIS, and the development of a WAIS index of WWW resources.
  4. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.05
    0.053437054 = product of:
      0.0712494 = sum of:
        0.036556147 = weight(_text_:digital in 1253) [ClassicSimilarity], result of:
          0.036556147 = score(doc=1253,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.18490088 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.019893762 = weight(_text_:library in 1253) [ClassicSimilarity], result of:
          0.019893762 = score(doc=1253,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.15095241 = fieldWeight in 1253, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.014799493 = product of:
          0.029598987 = sum of:
            0.029598987 = weight(_text_:project in 1253) [ClassicSimilarity], result of:
              0.029598987 = score(doc=1253,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.13990708 = fieldWeight in 1253, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1253)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image features. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
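    The LCC matching described above rests on Latent Semantic Indexing. As a toy sketch only (the matrix below is invented; this is not the Pharos code), LSI reduces a term-by-document matrix of training records to a rank-k approximation via truncated SVD, and query-to-category matching then happens in that reduced space:

      import numpy as np

      # Invented toy term-by-document matrix (rows = terms, columns = catalog records).
      A = np.array([[1.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0],
                    [0.0, 0.0, 1.0]])

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2                                        # number of latent dimensions to keep
      A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation used for matching
      print(np.round(A_k, 2))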
  5. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.05
    0.052799337 = product of:
      0.10559867 = sum of:
        0.073112294 = weight(_text_:digital in 942) [ClassicSimilarity], result of:
          0.073112294 = score(doc=942,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.36980176 = fieldWeight in 942, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.032486375 = weight(_text_:library in 942) [ClassicSimilarity], result of:
          0.032486375 = score(doc=942,freq=4.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.24650425 = fieldWeight in 942, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.5 = coord(2/4)
    
    Content
    1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references
  6. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.05
    0.051808305 = product of:
      0.10361661 = sum of:
        0.043081827 = weight(_text_:digital in 2532) [ClassicSimilarity], result of:
          0.043081827 = score(doc=2532,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.21790776 = fieldWeight in 2532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
        0.060534786 = weight(_text_:library in 2532) [ClassicSimilarity], result of:
          0.060534786 = score(doc=2532,freq=20.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.45933354 = fieldWeight in 2532, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
      0.5 = coord(2/4)
    
    Abstract
    In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting are made. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
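    As a purely illustrative sketch of the TF-IDF + KNN approach the paper favours (not the authors' system; the training texts and LCC-style labels below are invented placeholders), a minimal classifier might look like:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline

      # Invented placeholder data: document text -> LCC-style class letter.
      train_docs = ["atlas of historical maps", "survey of machine learning methods"]
      train_labels = ["G", "Q"]

      clf = make_pipeline(
          TfidfVectorizer(stop_words="english"),                 # documents -> TF-IDF vectors
          KNeighborsClassifier(n_neighbors=1, metric="cosine"),  # vote among nearest neighbours
      )
      clf.fit(train_docs, train_labels)
      print(clf.predict(["introduction to learning algorithms"]))  # -> ['Q']

    In practice the number of neighbours (n_neighbors) and the feature weighting would be tuned on held-out catalogue records rather than fixed as here.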
  7. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    0.04998924 = product of:
      0.09997848 = sum of:
        0.079606175 = product of:
          0.23881851 = sum of:
            0.23881851 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.23881851 = score(doc=562,freq=2.0), product of:
                0.42493033 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050121464 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.0203723 = product of:
          0.0407446 = sum of:
            0.0407446 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.0407446 = score(doc=562,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  8. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.05
    0.048820432 = product of:
      0.097640865 = sum of:
        0.051698197 = weight(_text_:digital in 1071) [ClassicSimilarity], result of:
          0.051698197 = score(doc=1071,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.26148933 = fieldWeight in 1071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
        0.045942668 = weight(_text_:library in 1071) [ClassicSimilarity], result of:
          0.045942668 = score(doc=1071,freq=8.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.34860963 = fieldWeight in 1071, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
      0.5 = coord(2/4)
    
    Abstract
    This paper aims to provide an overview of automatic classification research, which focuses on issues related to the automatic classification of documents in a library environment. The review covers literature published in mainstream library and information science studies. The review was done on literature published in both academic and professional LIS journals and other documents. This review reveals that basically three types of research are being done on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers with or without training documents, and 3) automatic bibliographic classification. Predominantly, this research is directed towards solving the problems of organizing digital documents in an online environment. However, very little research is devoted to solving the problems of arranging physical documents.
  9. Shafer, K.E.: Evaluating Scorpion results (1998) 0.04
    0.0438086 = product of:
      0.0876172 = sum of:
        0.03828556 = weight(_text_:library in 1569) [ClassicSimilarity], result of:
          0.03828556 = score(doc=1569,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.29050803 = fieldWeight in 1569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
        0.049331643 = product of:
          0.098663285 = sum of:
            0.098663285 = weight(_text_:project in 1569) [ClassicSimilarity], result of:
              0.098663285 = score(doc=1569,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.4663569 = fieldWeight in 1569, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1569)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.
  10. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.04
    0.043343633 = product of:
      0.08668727 = sum of:
        0.045942668 = weight(_text_:library in 1046) [ClassicSimilarity], result of:
          0.045942668 = score(doc=1046,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.34860963 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
        0.0407446 = product of:
          0.0814892 = sum of:
            0.0814892 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.0814892 = score(doc=1046,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    5. 5.2003 14:17:22
    Source
    Journal of library administration. 34(2001) nos.3/4, S.221-228
  11. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.04
    0.040034845 = product of:
      0.08006969 = sum of:
        0.060926907 = weight(_text_:digital in 472) [ClassicSimilarity], result of:
          0.060926907 = score(doc=472,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.3081681 = fieldWeight in 472, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
        0.01914278 = weight(_text_:library in 472) [ClassicSimilarity], result of:
          0.01914278 = score(doc=472,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.14525402 = fieldWeight in 472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
      0.5 = coord(2/4)
    
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). As a consequence, it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text as given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with a broad coverage for the classification of German scientific articles.
    Source
    Proceedings of the 2nd International Workshop on Semantic Digital Archives held in conjunction with the 16th Int. Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al
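    Mean reciprocal rank, one of the evaluation measures named in the abstract above, averages 1/rank of the first relevant answer over all test items. A minimal sketch with invented example data (not the paper's records):

      def mean_reciprocal_rank(rankings, relevant):
          # rankings: one ranked result list per test item; relevant: matching sets of correct answers.
          total = 0.0
          for hits, rel in zip(rankings, relevant):
              total += next((1.0 / rank for rank, h in enumerate(hits, start=1) if h in rel), 0.0)
          return total / len(rankings)

      # First relevant notation at rank 1 and at rank 2 -> (1/1 + 1/2) / 2 = 0.75
      print(mean_reciprocal_rank([["004", "020"], ["330", "020"]],
                                 [{"004"}, {"020"}]))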
  12. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.04
    0.035819627 = product of:
      0.14327851 = sum of:
        0.14327851 = sum of:
          0.10253391 = weight(_text_:project in 2158) [ClassicSimilarity], result of:
            0.10253391 = score(doc=2158,freq=6.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.48465237 = fieldWeight in 2158, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
          0.0407446 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
            0.0407446 = score(doc=2158,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.23214069 = fieldWeight in 2158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
      0.25 = coord(1/4)
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
  13. Dubin, D.: Dimensions and discriminability (1998) 0.04
    0.03509323 = product of:
      0.07018646 = sum of:
        0.04641878 = weight(_text_:library in 2338) [ClassicSimilarity], result of:
          0.04641878 = score(doc=2338,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.3522223 = fieldWeight in 2338, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.047535364 = score(doc=2338,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  14. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.03
    0.02628516 = product of:
      0.05257032 = sum of:
        0.022971334 = weight(_text_:library in 2166) [ClassicSimilarity], result of:
          0.022971334 = score(doc=2166,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.17430481 = fieldWeight in 2166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
        0.029598987 = product of:
          0.059197973 = sum of:
            0.059197973 = weight(_text_:project in 2166) [ClassicSimilarity], result of:
              0.059197973 = score(doc=2166,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.27981415 = fieldWeight in 2166, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2166)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography have been classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri, which focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
  15. Automatic classification research at OCLC (2002) 0.03
    0.025283787 = product of:
      0.050567575 = sum of:
        0.026799891 = weight(_text_:library in 1563) [ClassicSimilarity], result of:
          0.026799891 = score(doc=1563,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.047535364 = score(doc=1563,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities whose outcome has been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change.
    Date
    5. 5.2003 9:22:09
  16. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.03
    0.025283787 = product of:
      0.050567575 = sum of:
        0.026799891 = weight(_text_:library in 1673) [ClassicSimilarity], result of:
          0.026799891 = score(doc=1673,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.047535364 = score(doc=1673,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
  17. Kragelj, M.; Borstnar, M.K.: Automatic classification of older electronic texts into the Universal Decimal Classification-UDC (2021) 0.02
    0.024889842 = product of:
      0.049779683 = sum of:
        0.034465462 = weight(_text_:digital in 175) [ClassicSimilarity], result of:
          0.034465462 = score(doc=175,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.17432621 = fieldWeight in 175, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.03125 = fieldNorm(doc=175)
        0.015314223 = weight(_text_:library in 175) [ClassicSimilarity], result of:
          0.015314223 = score(doc=175,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.11620321 = fieldWeight in 175, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.03125 = fieldNorm(doc=175)
      0.5 = coord(2/4)
    
    Abstract
    Purpose: The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.
    Design/methodology/approach: The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.
    Findings: Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.
    Research limitations/implications: The main limitations of this study were the unavailability of labelled older texts and the limited availability of librarians.
    Practical implications: The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.
    Social implications: The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.
    Originality/value: These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.
  18. Barthel, S.; Tönnies, S.; Balke, W.-T.: Large-scale experiments for mathematical document classification (2013) 0.02
    0.01865498 = product of:
      0.07461992 = sum of:
        0.07461992 = weight(_text_:digital in 1056) [ClassicSimilarity], result of:
          0.07461992 = score(doc=1056,freq=6.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.37742734 = fieldWeight in 1056, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
      0.25 = coord(1/4)
    
    Abstract
    The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, the assessment and refinement of web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested into the semantic enrichment of the provided documents, e.g. by annotating the documents with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task gets more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but from evaluations done in previous research, conclusions about a real-world scenario are difficult to draw. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, i.e. Zentralblatt MATH, with special focus on its practical applicability.
    Source
    15th International Conference on Asia-Pacific Digital Libraries ICADL 2013. Bangalore, India. [to appear, 2013]
  19. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.02
    0.015314223 = product of:
      0.061256893 = sum of:
        0.061256893 = weight(_text_:library in 2412) [ClassicSimilarity], result of:
          0.061256893 = score(doc=2412,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.46481284 = fieldWeight in 2412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.125 = fieldNorm(doc=2412)
      0.25 = coord(1/4)
    
    Source
    Drexel library quarterly. 14(1978), S.75-89
  20. Cui, H.; Heidorn, P.B.; Zhang, H.: ¬An approach to automatic classification of text for information retrieval (2002) 0.02
    0.015078641 = product of:
      0.060314562 = sum of:
        0.060314562 = weight(_text_:digital in 174) [ClassicSimilarity], result of:
          0.060314562 = score(doc=174,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.30507088 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.25 = coord(1/4)
    
    Source
    Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries : JCDL 2002 ; July 14 - 18, 2002, Portland, Oregon, USA. Ed. by Gary Marchionini

Languages

  • e 55
  • d 7
  • a 1
  • chi 1

Types

  • a 49
  • el 16
  • m 1
  • r 1
  • s 1
  • x 1