Search (28 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    Content
    See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
  2. Dubin, D.: Dimensions and discriminability (1998) 0.05
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
  3. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.03
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
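    A minimal sketch of this kind of mapping, assuming scikit-learn and a bag-of-headings representation (the training pairs and class labels below are invented, not the authors' data):

      # Hedged sketch, not the authors' implementation: learn a mapping from
      # sets of LCSH headings to a coarse LCC class.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      lcsh_sets = [
          "Machine learning; Text processing (Computer science)",
          "World War, 1939-1945; Military history",
          "Algebra; Mathematics -- Study and teaching",
      ]
      lcc_classes = ["Q", "D", "Q"]  # toy labels: Q = Science, D = World History

      # Split on ';' so each whole heading, not each word, becomes one feature.
      model = make_pipeline(
          CountVectorizer(tokenizer=lambda s: [h.strip() for h in s.split(";")],
                          token_pattern=None),
          LogisticRegression(max_iter=1000),
      )
      model.fit(lcsh_sets, lcc_classes)
      print(model.predict(["Machine learning; Algebra"]))  # expected: ['Q']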
  4. Godby, C. J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.02
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
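    The log-likelihood statistic mentioned here is commonly computed as Dunning's G² over a 2x2 contingency table; a small sketch (the counts are invented):

      # G^2 = 2 * (sum k*ln k - sum(row sums)*ln(row sums)
      #            - sum(col sums)*ln(col sums) + N*ln N)
      import math

      def llr(k11, k12, k21, k22):
          """k11: heading occurs with the class; k12: heading elsewhere;
          k21: other headings with the class; k22: other headings elsewhere."""
          def s(*ks):
              return sum(k * math.log(k) for k in ks if k > 0)
          n = k11 + k12 + k21 + k22
          return 2 * (s(k11, k12, k21, k22)
                      - s(k11 + k12, k21 + k22)   # row sums
                      - s(k11 + k21, k12 + k22)   # column sums
                      + n * math.log(n))

      # A heading seen 30 times inside a class but only 5 times elsewhere
      # scores high, so the mapping would be kept by the filter.
      print(round(llr(30, 5, 970, 8995), 1))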
  5. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.02
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
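    A rough sketch of what the TF-IDF backend does, assuming scikit-learn rather than Annif's own code (the subject corpus is invented):

      # Each LCSH subject is represented by text from records indexed with it
      # (MARC 650$a); new documents get the most similar subjects suggested.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      subject_corpus = {
          "Machine learning": "neural networks supervised learning training data",
          "Library science": "cataloguing classification subject indexing catalogues",
          "Linked data": "rdf sparql ontologies semantic web vocabularies",
      }
      subjects = list(subject_corpus)

      vectorizer = TfidfVectorizer()
      subject_matrix = vectorizer.fit_transform(subject_corpus.values())

      def suggest(text, limit=2):
          sims = cosine_similarity(vectorizer.transform([text]), subject_matrix)[0]
          return sorted(zip(subjects, sims), key=lambda p: -p[1])[:limit]

      print(suggest("training a supervised model on indexing data"))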
  6. Godby, C.J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.02
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
  7. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.02
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
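    The core of the partial-match idea can be sketched in a few lines (the clusters and terms are invented; the study compared several richer match scores):

      # Terms from the new record act as a "query" against term clusters
      # built from previously classified MARC records; the best-matching
      # cluster supplies the classification number.
      clusters = {
          "QA76": {"computer", "programming", "software", "algorithms"},
          "Z699": {"information", "retrieval", "indexing", "automatic"},
          "ML410": {"music", "composers", "biography"},
      }

      def classify(record_terms):
          query = set(record_terms)
          scores = {lcc: len(query & terms) for lcc, terms in clusters.items()}
          return max(scores, key=scores.get)

      print(classify(["automatic", "indexing", "software"]))  # -> 'Z699'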
  8. Wu, M.; Liu, Y.-H.; Brownlee, R.; Zhang, X.: Evaluating utility and automatic classification of subject metadata from Research Data Australia (2021) 0.02
    Abstract
    In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise the subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches in order to enrich these records and to provide additional support for user search and data discovery. Our results show that automatic methods work well for well represented categories of subject metadata, and these categories tend to have features that can distinguish themselves from the other categories. Our findings raise implications for data catalogue providers; they should invest more effort to enhance the quality of data records by providing an adequate description of these records for under-represented subject categories.
  9. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to show whether they are complementary to one another.
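    A loose sketch of the JDI side, under the assumption that it reduces to weighted associations between textwords and journal descriptors (all names and numbers below are invented):

      from collections import defaultdict

      # textword -> {journal descriptor: association strength from the corpus}
      jd_weights = {
          "genome": {"Genetics": 0.9, "Biochemistry": 0.4},
          "enzyme": {"Biochemistry": 0.8, "Genetics": 0.3},
          "ribosome": {"Biochemistry": 0.6, "Cell Biology": 0.5},
      }

      def categorize(words):
          totals, seen = defaultdict(float), 0
          for w in words:
              if w in jd_weights:
                  seen += 1
                  for jd, wt in jd_weights[w].items():
                      totals[jd] += wt
          # average association per observed word, best first
          return sorted(((jd, t / seen) for jd, t in totals.items()),
                        key=lambda p: -p[1])

      print(categorize(["genome", "enzyme", "ribosome"]))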
  10. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.01
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.
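    Because the method needs no training documents, it can be sketched as a lookup-and-vote over an SWD-to-DDC table, with mean reciprocal rank as the measure (the tiny thesaurus here is invented):

      from collections import Counter

      swd_to_ddc = {"bibliothek": "020", "katalog": "020",
                    "algorithmus": "004", "software": "004", "physik": "530"}

      def classify(text, limit=2):
          votes = Counter(swd_to_ddc[w] for w in text.lower().split()
                          if w in swd_to_ddc)
          return [ddc for ddc, _ in votes.most_common(limit)]

      print(classify("Software Algorithmus Katalog"))  # -> ['004', '020']

      # Mean reciprocal rank: 1/rank of the first correct notation, averaged.
      def mrr(results):  # results: list of (ranked notations, gold notation)
          return sum(1 / (r.index(g) + 1) for r, g in results if g in r) / len(results)

      print(mrr([(["004", "020"], "020"), (["530"], "530")]))  # -> 0.75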
  11. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    Date
    5. 5.2003 14:17:22
  12. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    Date
    22. 8.2009 12:54:24
  13. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    Date
    1. 2.2016 18:25:22
  14. Koch, T.; Ardö, A.; Brümmer, A.: The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    Abstract
    After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval, and several important services on the net have taken this fact into consideration, the emphasis of this report lies on the general retrieval tools for the whole of the Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services, it is necessary to begin with organizing the existing varieties in a typological/taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services respectively. It will, however, for various reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common to all attempts at indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
  15. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    Pages
    1-22
  16. Automatic classification research at OCLC (2002) 0.01
    Date
    5. 5.2003 9:22:09
  17. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    Date
    1. 8.1996 22:08:06
  18. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    Date
    22. 7.2006 16:24:52
  19. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    Date
    22. 9.2008 18:31:54
  20. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    Abstract
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
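    A compact sketch of the LSI matching step, assuming scikit-learn's TruncatedSVD in place of the prototype's own LSI machinery (the records, labels, and query are invented):

      from sklearn.decomposition import TruncatedSVD
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      records = [
          "cancer treatment clinical oncology patients",
          "prostate cancer screening patients clinical",
          "satellite remote sensing imagery earth",
          "image processing satellite imagery",
          "investment banking markets securities",
          "stock markets banking finance",
      ]
      labels = ["RC", "RC", "G", "G", "HG", "HG"]  # toy LCC classes

      vec = TfidfVectorizer()
      tfidf = vec.fit_transform(records)
      lsi = TruncatedSVD(n_components=3).fit(tfidf)  # the latent space
      doc_vecs = lsi.transform(tfidf)

      query_vec = lsi.transform(vec.transform(["cancer therapy for patients"]))
      sims = cosine_similarity(query_vec, doc_vecs)[0]
      print(labels[sims.argmax()])  # expected: 'RC' (medicine)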