Search (38 results, page 1 of 2)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[1990 TO 2000}
  1. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.05
    0.0516568 = product of:
      0.11364496 = sum of:
        0.009064952 = product of:
          0.018129904 = sum of:
            0.018129904 = weight(_text_:h in 3065) [ClassicSimilarity], result of:
              0.018129904 = score(doc=3065,freq=2.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.27449545 = fieldWeight in 3065, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3065)
          0.5 = coord(1/2)
        0.03218541 = weight(_text_:r in 3065) [ClassicSimilarity], result of:
          0.03218541 = score(doc=3065,freq=2.0), product of:
            0.088001914 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.026584605 = queryNorm
            0.36573532 = fieldWeight in 3065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.078125 = fieldNorm(doc=3065)
        0.0034720355 = weight(_text_:s in 3065) [ClassicSimilarity], result of:
          0.0034720355 = score(doc=3065,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.120123915 = fieldWeight in 3065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.078125 = fieldNorm(doc=3065)
        0.031492744 = weight(_text_:u in 3065) [ClassicSimilarity], result of:
          0.031492744 = score(doc=3065,freq=2.0), product of:
            0.08704981 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.026584605 = queryNorm
            0.3617784 = fieldWeight in 3065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.078125 = fieldNorm(doc=3065)
        0.037429813 = weight(_text_:k in 3065) [ClassicSimilarity], result of:
          0.037429813 = score(doc=3065,freq=2.0), product of:
            0.09490114 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.026584605 = queryNorm
            0.39440846 = fieldWeight in 3065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.078125 = fieldNorm(doc=3065)
      0.45454547 = coord(5/11)
    
    Pages
    34 S
    Type
    r
  2. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    0.007751891 = product of:
      0.0284236 = sum of:
        0.0070291325 = weight(_text_:a in 316) [ClassicSimilarity], result of:
          0.0070291325 = score(doc=316,freq=18.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.22931081 = fieldWeight in 316, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
        0.019311246 = weight(_text_:r in 316) [ClassicSimilarity], result of:
          0.019311246 = score(doc=316,freq=2.0), product of:
            0.088001914 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.026584605 = queryNorm
            0.2194412 = fieldWeight in 316, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
        0.0020832212 = weight(_text_:s in 316) [ClassicSimilarity], result of:
          0.0020832212 = score(doc=316,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.072074346 = fieldWeight in 316, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.27272728 = coord(3/11)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
    Pages
    11 S.
    Type
    a
  3. Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.01
    0.005981214 = product of:
      0.032896675 = sum of:
        0.006695806 = weight(_text_:a in 3386) [ClassicSimilarity], result of:
          0.006695806 = score(doc=3386,freq=12.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.21843673 = fieldWeight in 3386, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3386)
        0.02620087 = weight(_text_:k in 3386) [ClassicSimilarity], result of:
          0.02620087 = score(doc=3386,freq=2.0), product of:
            0.09490114 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.026584605 = queryNorm
            0.27608594 = fieldWeight in 3386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3386)
      0.18181819 = coord(2/11)
    
    Abstract
    This paper reports a controlled study with statistical significance tests an five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Leastsquares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus an the robustness of these methods in dealing with a skewed category distribution, and their performance as function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category are small (less than ten, and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
  4. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.00576799 = product of:
      0.021149296 = sum of:
        0.006112407 = weight(_text_:a in 1673) [ClassicSimilarity], result of:
          0.006112407 = score(doc=1673,freq=10.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.19940455 = fieldWeight in 1673, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.0024304248 = weight(_text_:s in 1673) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=1673,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.012606464 = product of:
          0.025212929 = sum of:
            0.025212929 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.025212929 = score(doc=1673,freq=2.0), product of:
                0.09309476 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.026584605 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.27272728 = coord(3/11)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK based information. The experimental version developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; vgl. auch: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
    Source
    Computer networks and ISDN systems. 30(1998) nos.1/7, S.646-648
    Type
    a
  5. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.01
    0.005682776 = product of:
      0.020836845 = sum of:
        0.005467103 = weight(_text_:a in 1661) [ClassicSimilarity], result of:
          0.005467103 = score(doc=1661,freq=8.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.17835285 = fieldWeight in 1661, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
        0.012939317 = product of:
          0.05175727 = sum of:
            0.05175727 = weight(_text_:o in 1661) [ClassicSimilarity], result of:
              0.05175727 = score(doc=1661,freq=2.0), product of:
                0.13338262 = queryWeight, product of:
                  5.017288 = idf(docFreq=795, maxDocs=44218)
                  0.026584605 = queryNorm
                0.38803607 = fieldWeight in 1661, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.017288 = idf(docFreq=795, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1661)
          0.25 = coord(1/4)
        0.0024304248 = weight(_text_:s in 1661) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=1661,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 1661, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
      0.27272728 = coord(3/11)
    
    Abstract
    Proposes the use of parallel computing systems to overcome the computationally intense clustering process. Examines 2 operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The time performance of the parallel systems was an improvement over sequential systems times, and produced the same clustering and classification scheme. Results show near linear speed up in higher threshold clustering applications
    Source
    Journal of the American Society for Information Science. 48(1997) no.10, S.932-943
    Type
    a
  6. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.0051552863 = product of:
      0.018902715 = sum of:
        0.003865826 = weight(_text_:a in 2338) [ClassicSimilarity], result of:
          0.003865826 = score(doc=2338,freq=4.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.12611452 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.0024304248 = weight(_text_:s in 2338) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=2338,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.012606464 = product of:
          0.025212929 = sum of:
            0.025212929 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.025212929 = score(doc=2338,freq=2.0), product of:
                0.09309476 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.026584605 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.27272728 = coord(3/11)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
    Date
    22. 9.1997 19:16:05
    Pages
    S.39-44
    Type
    a
  7. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    0.0050170147 = product of:
      0.01839572 = sum of:
        0.0041327416 = weight(_text_:a in 1669) [ClassicSimilarity], result of:
          0.0041327416 = score(doc=1669,freq=14.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.13482209 = fieldWeight in 1669, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
        0.012874164 = weight(_text_:r in 1669) [ClassicSimilarity], result of:
          0.012874164 = score(doc=1669,freq=2.0), product of:
            0.088001914 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.026584605 = queryNorm
            0.14629413 = fieldWeight in 1669, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
        0.0013888142 = weight(_text_:s in 1669) [ClassicSimilarity], result of:
          0.0013888142 = score(doc=1669,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.048049565 = fieldWeight in 1669, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
      0.27272728 = coord(3/11)
    
    Abstract
    After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval and several important services on the net have taken this fact into consideration, the emphasis of this report lays on the general retrieval tools for the whole of Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services it is necessary to begin with organizing the existing varieties in a typological/ taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues and simultaneous or collected search services respectively. It will however for different reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common for all attempts of indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
    Pages
    116 S
    Type
    r
  8. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0044160434 = product of:
      0.016192159 = sum of:
        0.0054949257 = weight(_text_:a in 1253) [ClassicSimilarity], result of:
          0.0054949257 = score(doc=1253,freq=44.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.1792605 = fieldWeight in 1253, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.009655623 = weight(_text_:r in 1253) [ClassicSimilarity], result of:
          0.009655623 = score(doc=1253,freq=2.0), product of:
            0.088001914 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.026584605 = queryNorm
            0.1097206 = fieldWeight in 1253, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.0010416106 = weight(_text_:s in 1253) [ClassicSimilarity], result of:
          0.0010416106 = score(doc=1253,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.036037173 = fieldWeight in 1253, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.27272728 = coord(3/11)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
    Source
    D-Lib magazine. 4(1998) no.1, xx S
    Type
    a
  9. Orwig, R.E.; Chen, H.; Nunamaker, J.F.: ¬A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.00
    0.004060445 = product of:
      0.014888298 = sum of:
        0.006345466 = product of:
          0.012690932 = sum of:
            0.012690932 = weight(_text_:h in 6928) [ClassicSimilarity], result of:
              0.012690932 = score(doc=6928,freq=2.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.19214681 = fieldWeight in 6928, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6928)
          0.5 = coord(1/2)
        0.006112407 = weight(_text_:a in 6928) [ClassicSimilarity], result of:
          0.006112407 = score(doc=6928,freq=10.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.19940455 = fieldWeight in 6928, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6928)
        0.0024304248 = weight(_text_:s in 6928) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=6928,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 6928, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6928)
      0.27272728 = coord(3/11)
    
    Abstract
    Describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classification of electronic brainstorming output and an evaluation of the results. Describes an electronic meeting system and describes the classification problem that exists in the group problem solving process. Surveys the literature concerning classification. Describes the application of the Kohonen SOM to the meeting output classification problem. Describes an experiment that evaluated the classification performed by the Kohonen SOM by comparing it with those of a human expert and a Hopfield neural network. Discusses conclusions and directions for future research
    Source
    Journal of the American Society for Information Science. 48(1997) no.2, S.157-170
    Type
    a
  10. Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.00
    0.0038557693 = product of:
      0.01413782 = sum of:
        0.008973843 = product of:
          0.017947687 = sum of:
            0.017947687 = weight(_text_:h in 3064) [ClassicSimilarity], result of:
              0.017947687 = score(doc=3064,freq=4.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.27173662 = fieldWeight in 3064, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3064)
          0.5 = coord(1/2)
        0.0027335514 = weight(_text_:a in 3064) [ClassicSimilarity], result of:
          0.0027335514 = score(doc=3064,freq=2.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.089176424 = fieldWeight in 3064, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3064)
        0.0024304248 = weight(_text_:s in 3064) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=3064,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 3064, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3064)
      0.27272728 = coord(3/11)
    
    Source
    B.I.T.online. 1(1998) H.4, S.279-290
    Type
    a
  11. Koch, T.: Nutzung von Klassifikationssystemen zur verbesserten Beschreibung, Organisation und Suche von Internetressourcen (1998) 0.00
    0.0035873586 = product of:
      0.013153648 = sum of:
        0.007251961 = product of:
          0.014503922 = sum of:
            0.014503922 = weight(_text_:h in 1030) [ClassicSimilarity], result of:
              0.014503922 = score(doc=1030,freq=2.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.21959636 = fieldWeight in 1030, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1030)
          0.5 = coord(1/2)
        0.0031240587 = weight(_text_:a in 1030) [ClassicSimilarity], result of:
          0.0031240587 = score(doc=1030,freq=2.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.10191591 = fieldWeight in 1030, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1030)
        0.0027776284 = weight(_text_:s in 1030) [ClassicSimilarity], result of:
          0.0027776284 = score(doc=1030,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.09609913 = fieldWeight in 1030, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=1030)
      0.27272728 = coord(3/11)
    
    Source
    BuB. 50(1998) H.5, S.326-335
    Type
    a
  12. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.00
    0.0022332703 = product of:
      0.012282986 = sum of:
        0.008116543 = weight(_text_:a in 382) [ClassicSimilarity], result of:
          0.008116543 = score(doc=382,freq=6.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.26478532 = fieldWeight in 382, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
        0.0041664424 = weight(_text_:s in 382) [ClassicSimilarity], result of:
          0.0041664424 = score(doc=382,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.14414869 = fieldWeight in 382, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
      0.18181819 = coord(2/11)
    
    Pages
    S.239-246
    Type
    a
  13. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.00
    0.0017568587 = product of:
      0.009662722 = sum of:
        0.0072322977 = weight(_text_:a in 6493) [ClassicSimilarity], result of:
          0.0072322977 = score(doc=6493,freq=14.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.23593865 = fieldWeight in 6493, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6493)
        0.0024304248 = weight(_text_:s in 6493) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=6493,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 6493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6493)
      0.18181819 = coord(2/11)
    
    Abstract
    This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of 4 classes: questions, responses, announcement or administartive. A total of 1.372 messages were analyzed. The automatic classification of a message was based on string matching between a message text and predefined string sets for each of the massage types. The system's automated ability to accurately classify a message was compared against manually assigned codes. The Cohen's Kappa of .55 suggested that there was a statistical agreement between the automatic and manually assigned codes
    Source
    Journal of the American Society for Information Science. 48(1997) no.1, S.32-39
    Type
    a
  14. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.00
    0.0016593148 = product of:
      0.009126231 = sum of:
        0.006695806 = weight(_text_:a in 7696) [ClassicSimilarity], result of:
          0.006695806 = score(doc=7696,freq=12.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.21843673 = fieldWeight in 7696, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
        0.0024304248 = weight(_text_:s in 7696) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=7696,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.18181819 = coord(2/11)
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide an efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem
    Source
    Journal of chemical information and computer sciences. 34(1994) no.1, S.74-90
    Type
    a
  15. McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.00
    0.0016353898 = product of:
      0.008994644 = sum of:
        0.0055226083 = weight(_text_:a in 2533) [ClassicSimilarity], result of:
          0.0055226083 = score(doc=2533,freq=4.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.18016359 = fieldWeight in 2533, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
        0.0034720355 = weight(_text_:s in 2533) [ClassicSimilarity], result of:
          0.0034720355 = score(doc=2533,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.120123915 = fieldWeight in 2533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
      0.18181819 = coord(2/11)
    
    Source
    New review of information networking. 1996, no.2, S.15-40
    Type
    a
  16. Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.00
    0.0016353898 = product of:
      0.008994644 = sum of:
        0.0055226083 = weight(_text_:a in 1103) [ClassicSimilarity], result of:
          0.0055226083 = score(doc=1103,freq=4.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.18016359 = fieldWeight in 1103, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
        0.0034720355 = weight(_text_:s in 1103) [ClassicSimilarity], result of:
          0.0034720355 = score(doc=1103,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.120123915 = fieldWeight in 1103, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
      0.18181819 = coord(2/11)
    
    Abstract
    This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of the Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>
  17. GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.00
    0.001609551 = product of:
      0.00885253 = sum of:
        0.0046860883 = weight(_text_:a in 381) [ClassicSimilarity], result of:
          0.0046860883 = score(doc=381,freq=2.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.15287387 = fieldWeight in 381, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=381)
        0.0041664424 = weight(_text_:s in 381) [ClassicSimilarity], result of:
          0.0041664424 = score(doc=381,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.14414869 = fieldWeight in 381, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.09375 = fieldNorm(doc=381)
      0.18181819 = coord(2/11)
    
    Source
    RRZN-Info. 1998, BI315, S.9
    Type
    a
  18. Krellenstein, M.: Document classification at Northern Light (1999) 0.00
    0.001609551 = product of:
      0.00885253 = sum of:
        0.0046860883 = weight(_text_:a in 4435) [ClassicSimilarity], result of:
          0.0046860883 = score(doc=4435,freq=2.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.15287387 = fieldWeight in 4435, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=4435)
        0.0041664424 = weight(_text_:s in 4435) [ClassicSimilarity], result of:
          0.0041664424 = score(doc=4435,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.14414869 = fieldWeight in 4435, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.09375 = fieldNorm(doc=4435)
      0.18181819 = coord(2/11)
    
    Pages
    11 S
    Type
    a
  19. Huang, Y.-L.: ¬A theoretic and empirical research of cluster indexing for Mandarine Chinese full text document (1998) 0.00
    0.0015532422 = product of:
      0.008542832 = sum of:
        0.006112407 = weight(_text_:a in 513) [ClassicSimilarity], result of:
          0.006112407 = score(doc=513,freq=10.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.19940455 = fieldWeight in 513, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=513)
        0.0024304248 = weight(_text_:s in 513) [ClassicSimilarity], result of:
          0.0024304248 = score(doc=513,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.08408674 = fieldWeight in 513, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=513)
      0.18181819 = coord(2/11)
    
    Abstract
    Since most popular commercialized systems for full text retrieval are designed with full text scaning and Boolean logic query mode, these systems use an oversimplified relationship between the indexing form and the content of document. Reports the use of Singular Value Decomposition (SVD) to develop a Cluster Indexing Model (CIM) based on a Vector Space Model (VSM) in orer to explore the index theory of cluster indexing for chinese full text documents. From a series of experiments, it was found that the indexing performance of CIM is better than traditional VSM, and has almost equivalent effectiveness of the authority control of index terms
    Source
    Bulletin of library and information science. 1998, no.24, S.44-68
    Type
    a
  20. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.00
    0.0014888468 = product of:
      0.008188657 = sum of:
        0.005411029 = weight(_text_:a in 2188) [ClassicSimilarity], result of:
          0.005411029 = score(doc=2188,freq=6.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.17652355 = fieldWeight in 2188, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2188)
        0.0027776284 = weight(_text_:s in 2188) [ClassicSimilarity], result of:
          0.0027776284 = score(doc=2188,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.09609913 = fieldWeight in 2188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=2188)
      0.18181819 = coord(2/11)
    
    Abstract
    In this paper, we introduce ACS, an automatic classification system for school libraries. First, various approaches towards automatic classification, namely (i) rule-based, (ii) browse and search, and (iii) partial match, are critically reviewed. The central issues of scheme selection, text analysis and similarity measures are discussed. A novel approach towards detecting book-class similarity with Modified Overlap Coefficient (MOC) is also proposed. Finally, the design and implementation of ACS is presented. The test result of over 80% correctness in automatic classification and a cost reduction of 75% compared to manual classification suggest that ACS is highly adoptable
    Source
    Journal of information science. 21(1995) no.4, S.289-299
    Type
    a