Search (173 results, page 1 of 9)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
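    Code sketch
    A minimal sketch of the setup this abstract describes: Bag-of-Words features extended with concept features drawn from background knowledge, then classified by boosting weak learners. The concept map and documents are toy stand-ins; the paper's actual concept extraction and boosting configuration are richer than this.
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.feature_extraction.text import CountVectorizer

      CONCEPTS = {"bank": "FINANCE", "loan": "FINANCE", "goal": "SPORTS"}  # toy concept map

      def add_concepts(texts):
          # Append one pseudo-token per matched concept so a single vectorizer
          # produces term features and concept features together.
          out = []
          for t in texts:
              tokens = t.lower().split()
              concepts = ["concept_" + CONCEPTS[w] for w in tokens if w in CONCEPTS]
              out.append(" ".join(tokens + concepts))
          return out

      docs = ["bank approved the loan", "striker scored a late goal"]
      labels = [0, 1]
      vec = CountVectorizer()
      X = vec.fit_transform(add_concepts(docs))                 # terms + concepts
      clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)  # boosting over decision stumps
      print(clf.predict(vec.transform(add_concepts(["loan for the bank"]))))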
  2. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.08
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.3, S.214-227
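    Code sketch
    A minimal sketch, with toy data, of learning a mapping from a work's set of LCSH headings to an LCC class: each heading becomes a binary feature and a standard classifier predicts the class. The paper's system is considerably more elaborate (it predicts nodes level by level down the LCC tree); the headings and labels below are illustrative only.
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.preprocessing import MultiLabelBinarizer

      train_lcsh = [{"Machine learning", "Text processing (Computer science)"},
                    {"Library science", "Classification, Library of Congress"}]
      train_lcc = ["Q", "Z"]   # top-level classes as stand-ins for full LCC numbers

      mlb = MultiLabelBinarizer()
      X = mlb.fit_transform(train_lcsh)        # one binary feature per heading
      model = MultinomialNB().fit(X, train_lcc)

      print(model.predict(mlb.transform([{"Machine learning"}])))  # -> ['Q']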
  3. Godby, C. J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.08
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
  4. Godby, C.J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.07
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
    Source
    Subject retrieval in a networked environment: Proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001 and sponsored by the IFLA Classification and Indexing Section, the IFLA Information Technology Section and OCLC. Ed.: I.C. McIlwaine
  5. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.06
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
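    Code sketch
    The core of the partial-match idea in this abstract, reduced to a few lines: pool the terms of previously classified records into one cluster per classification, then assign a new record to the class whose cluster best overlaps its terms. Larson compared sixty combinations of match methods, query types, and term representations; this shows only the simplest frequency overlap, with invented data.
      from collections import Counter, defaultdict

      clusters = defaultdict(Counter)   # classification -> pooled term frequencies
      training = [("Z699", "automatic classification library records"),
                  ("QA76", "machine learning algorithms")]
      for lcc, text in training:
          clusters[lcc].update(text.split())

      def classify(record_terms):
          # Crude partial match: sum the cluster frequencies of the record's terms.
          return max(clusters, key=lambda c: sum(clusters[c][t] for t in record_terms))

      print(classify("library classification experiments".split()))  # -> 'Z699'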
  6. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.05
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
  7. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI/ML-based subject indexing system for libraries (2023) 0.05
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
    Source
    DESIDOC journal of library and information technology. 43(2023) no.1, S.45-54
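    Code sketch
    The workflow in this abstract can be approximated with Annif's standard configuration and command line. Annif, Skosify, MarcEdit, and OpenRefine are the real tools named in the study, but the project name, file names, and corpus layout below are assumptions, and command names vary between Annif versions (older releases use "loadvoc" rather than "load-vocab").
      # projects.cfg -- one project per backend under comparison (TF-IDF shown)
      [lcsh-tfidf-en]
      name=LCSH TF-IDF
      language=en
      backend=tfidf
      analyzer=snowball(english)
      vocab=lcsh
      limit=10

      # Load the Skosify-cleaned LCSH Turtle file, train on a TSV corpus of
      # "text<TAB>subject URI" lines (exported via MarcEdit and OpenRefine),
      # then request suggested headings for a new document.
      annif load-vocab lcsh lcsh-cleaned.ttl
      annif train lcsh-tfidf-en training-corpus.tsv
      echo "Deep learning for digital libraries" | annif suggest lcsh-tfidf-en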
  8. Dubin, D.: Dimensions and discriminability (1998) 0.04
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al.
  9. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.03
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
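    Code sketch
    A minimal sketch of the prototype's retrieval step as the abstract describes it: catalog records labelled with LCC categories serve as the training set, LSI (here TF-IDF followed by truncated SVD) embeds them, and a user query is matched to the nearest records, whose categories are then offered to the user. Data and dimensionality are toy values; the real prototype indexed over 400,000 terms.
      from sklearn.decomposition import TruncatedSVD
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity
      from sklearn.pipeline import make_pipeline

      records = ["prostate cancer treatment studies",
                 "remote sensing of coastal waters",
                 "investment banking and securities"]
      lcc = ["RC", "G", "HG"]   # the records' LCC categories

      lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
      doc_vecs = lsi.fit_transform(records)

      query_vec = lsi.transform(["prostate cancer"])
      best = cosine_similarity(query_vec, doc_vecs)[0].argmax()
      print(lcc[best])   # nearest record's category, here 'RC'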
  10. Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.03
    Abstract
    In recent years, we have witnessed the continual growth in the use of ontologies in order to provide a mechanism to enable machine reasoning. This paper describes an automatic classifier, which focuses on the use of ontologies for classifying Web pages with respect to the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) schemes. Firstly, we explain how these ontologies can be built in a modular fashion, and mapped into DDC and LCC. Secondly, we propose the formal definition of a DDC-LCC and an ontology-classification-scheme mapping. Thirdly, we explain the way the classifier uses these ontologies to assist classification. Finally, an experiment in which the accuracy of the classifier was evaluated is presented. The experiment shows that our approach results in an improved classification in terms of accuracy. This improvement, however, comes at the cost of a low coverage ratio due to the incompleteness of the ontologies used.
  11. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.03
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2269-2286
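    Code sketch
    The paper's central move is reshaping the deep, skewed DDC hierarchy into a balanced virtual tree. A toy sketch of that idea, under an assumed merge threshold: subtrees with too few training documents are collapsed into a single virtual class, which both flattens the hierarchy and evens out the category distribution.
      MIN_DOCS = 50   # assumed merge threshold

      def flatten(node):
          return [c for cls, kids in node.items() for c in [cls] + flatten(kids)]

      def balance(node, counts, path=""):
          # Yield (virtual class, merged member classes) pairs.
          for cls, children in node.items():
              total = counts.get(cls, 0) + sum(counts.get(c, 0) for c in flatten(children))
              if total < MIN_DOCS or not children:
                  yield path + cls, [cls] + flatten(children)   # merge whole subtree
              else:
                  yield from balance(children, counts, path + cls + "/")

      ddc = {"500": {"510": {}, "520": {"523": {}}}}   # tiny DDC fragment
      counts = {"510": 120, "520": 10, "523": 5}       # training documents per class
      print(dict(balance(ddc, counts)))   # {'500/510': ['510'], '500/520': ['520', '523']}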
  12. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.02
    Abstract
    This paper aims to provide an overview of automatic classification research, which focuses on issues related to the automatic classification of documents in a library environment. The review covers literature published in mainstream library and information science studies. The review was done on literature published in both academic and professional LIS journals and other documents. This review reveals that basically three types of research are being done on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers with or without using training documents, and 3) automatic bibliographic classification. Predominantly this research is directed towards solving problems of organization of digital documents in an online environment. However, very little research is devoted towards solving the problems of arrangement of physical documents.
  13. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02
    Date
    5. 5.2003 14:17:22
    Footnote
    Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II
    Source
    Journal of library administration. 34(2001) nos.3/4, S.221-228
  14. Ozmutlu, S.; Cosar, G.C.: Analyzing the results of automatic new topic identification (2008) 0.02
    Abstract
    Purpose - Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification. Design/methodology/approach - Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations. Findings - The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification. Originality/value - Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom-tailored graphical user interfaces for search engine users.
    Source
    Library hi tech. 26(2008) no.3, S.466-487
  15. Cosh, K.J.; Burns, R.; Daniel, T.: Content clouds : classifying content in Web 2.0 (2008) 0.02
    Abstract
    Purpose - With increasing amounts of user generated content being produced electronically in the form of wikis, blogs, forums, etc., the purpose of this paper is to investigate a new approach to classifying ad hoc content. Design/methodology/approach - The approach applies natural language processing (NLP) tools to automatically extract the content of some text, visualizing the results in a content cloud. Findings - Content clouds share the visual simplicity of a tag cloud, but display the details of an article at a different level of abstraction, providing a complementary classification. Research limitations/implications - Provides the general approach to creating a content cloud. In the future, the process can be refined and enhanced by further evaluation of results. Further work is also required to better identify closely related articles. Practical implications - Being able to automatically classify the content generated by web users will enable others to find more appropriate content. Originality/value - The approach is original. Other researchers have produced a cloud simply by using skiplists to filter unwanted words; this paper's approach improves on this by applying appropriate NLP techniques.
    Source
    Library review. 57(2008) no.9, S.722-729
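    Code sketch
    A minimal sketch of building a content cloud with basic NLP rather than a plain word filter, as the abstract describes: tokenize, keep nouns via part-of-speech tagging, and weight each by frequency for display. It assumes NLTK with its tokenizer and tagger data downloaded; the size formula is an arbitrary illustration.
      from collections import Counter
      import nltk   # first run: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

      def content_cloud(text, top=10):
          tokens = nltk.word_tokenize(text.lower())
          nouns = [w for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]
          # Map raw frequency to a display size, as a tag cloud renderer would.
          return {word: 10 + 2 * count for word, count in Counter(nouns).most_common(top)}

      print(content_cloud("Users generate wiki content; wiki pages link that content together."))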
  16. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
    Date
    22. 9.2008 18:31:54
  17. Golub, K.: Automated subject classification of textual web documents (2006) 0.02
    Abstract
    Purpose - To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application, and characteristics of web pages. Findings - Identifies the major similarities and differences between the three approaches: document pre-processing and the exploitation of web-specific document characteristics are common to all of them; the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed (a sketch of the vector space model follows this record). Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities. Originality/value - To the author's knowledge, no previous review of automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
    Source
    Journal of documentation. 62(2006) no.3, S.350-371
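    The vector space model named in the findings represents each document as a vector of term weights; classification then assigns a new document to the class whose training-set centroid is most similar, typically by cosine. A minimal sketch, assuming NumPy and scikit-learn for the vectorization, offered as an illustration rather than a reconstruction of any surveyed system:

      # Vector-space classification sketch: documents become TF-IDF
      # vectors, and a query document receives the class of the nearest
      # class centroid under cosine similarity.
      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      docs = ["web pages hyperlinks browsers",
              "library catalogue subject index",
              "web servers markup links",
              "classification schemes catalogues"]
      labels = np.array(["web", "lib", "web", "lib"])

      vec = TfidfVectorizer()
      X = vec.fit_transform(docs).toarray()
      classes = sorted(set(labels))
      centroids = np.vstack([X[labels == c].mean(axis=0) for c in classes])

      query = vec.transform(["automated classification of web documents"])
      sims = cosine_similarity(query.toarray(), centroids)[0]
      print(classes[int(sims.argmax())])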
  18. Automatic classification research at OCLC (2002) 0.02
    0.0165935 = product of:
      0.049780495 = sum of:
        0.01763606 = weight(_text_:library in 1563) [ClassicSimilarity], result of:
          0.01763606 = score(doc=1563,freq=2.0), product of:
            0.08672522 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.03298316 = queryNorm
            0.20335563 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
        0.016503768 = weight(_text_:of in 1563) [ClassicSimilarity], result of:
          0.016503768 = score(doc=1563,freq=14.0), product of:
            0.05157766 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03298316 = queryNorm
            0.31997898 = fieldWeight in 1563, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
        0.01564067 = product of:
          0.03128134 = sum of:
            0.03128134 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.03128134 = score(doc=1563,freq=2.0), product of:
                0.11550141 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03298316 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities whose outcomes have been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change.
    Date
    5. 5.2003 9:22:09
  19. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.02
    0.016268058 = product of:
      0.04880417 = sum of:
        0.021429438 = product of:
          0.042858876 = sum of:
            0.042858876 = weight(_text_:headings in 472) [ClassicSimilarity], result of:
              0.042858876 = score(doc=472,freq=2.0), product of:
                0.15996648 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.03298316 = queryNorm
                0.2679241 = fieldWeight in 472, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=472)
          0.5 = coord(1/2)
        0.0125971865 = weight(_text_:library in 472) [ClassicSimilarity], result of:
          0.0125971865 = score(doc=472,freq=2.0), product of:
            0.08672522 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.03298316 = queryNorm
            0.14525402 = fieldWeight in 472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
        0.014777548 = weight(_text_:of in 472) [ClassicSimilarity], result of:
          0.014777548 = score(doc=472,freq=22.0), product of:
            0.05157766 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03298316 = queryNorm
            0.28651062 = fieldWeight in 472, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
      0.33333334 = coord(3/9)
    
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text, as given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles. (A sketch of the lookup-and-vote idea follows this record.)
    Content
    This work is partially based on the Bachelor thesis of Maike Sommer. See also: http://sda2012.dke-research.de.
    Source
    Proceedings of the 2nd International Workshop on Semantic Digital Archives, held in conjunction with the 16th International Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al.
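    The training-free method described above amounts to a lookup-and-vote scheme: every word of the text that matches an SWD subject heading contributes the DDC notations linked to that heading, and the notations are ranked by accumulated votes. A minimal sketch with a toy heading-to-DDC table; the actual method works on the full SWD linked data set and weights terms more carefully.

      # Training-free classification sketch: words matching SWD subject
      # headings vote for the DDC notations linked to those headings.
      # The heading->DDC table is a toy stand-in for the real SWD data.
      from collections import Counter

      SWD_TO_DDC = {
          "informatik": ["004"],
          "bibliothek": ["020"],
          "klassifikation": ["020", "025"],
          "physik": ["530"],
      }

      def classify(text: str, top_n: int = 3) -> list[tuple[str, int]]:
          votes: Counter[str] = Counter()
          for word in text.lower().split():
              for notation in SWD_TO_DDC.get(word, []):
                  votes[notation] += 1
          return votes.most_common(top_n)

      print(classify("automatische klassifikation in der bibliothek"))
      # -> [('020', 2), ('025', 1)]

    Ranking the notations rather than committing to a single one is what makes mean reciprocal rank a natural evaluation measure for the method.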
  20. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02
    0.016185416 = product of:
      0.048556246 = sum of:
        0.01763606 = weight(_text_:library in 1673) [ClassicSimilarity], result of:
          0.01763606 = score(doc=1673,freq=2.0), product of:
            0.08672522 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.03298316 = queryNorm
            0.20335563 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.015279518 = weight(_text_:of in 1673) [ClassicSimilarity], result of:
          0.015279518 = score(doc=1673,freq=12.0), product of:
            0.05157766 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03298316 = queryNorm
            0.29624295 = fieldWeight in 1673, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.01564067 = product of:
          0.03128134 = sum of:
            0.03128134 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.03128134 = score(doc=1673,freq=2.0), product of:
                0.11550141 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03298316 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; see also: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.

Languages

  • e 163
  • d 8
  • a 1
  • chi 1

Types

  • a 147
  • el 25
  • m 3
  • x 3
  • s 2
  • r 1