Search (35 results, page 2 of 2)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[1990 TO 2000}
  1. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.01
    0.007041318 = product of:
      0.014082636 = sum of:
        0.008916007 = weight(_text_:a in 1054) [ClassicSimilarity], result of:
          0.008916007 = score(doc=1054,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 1054, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
        0.0051666284 = product of:
          0.010333257 = sum of:
            0.010333257 = weight(_text_:information in 1054) [ClassicSimilarity], result of:
              0.010333257 = score(doc=1054,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.116372846 = fieldWeight in 1054, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1054)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
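    The [ClassicSimilarity] tree above is Lucene's explain() output and can be checked by hand: each leaf weight is a query-side factor (idf x queryNorm) times a document-side factor (sqrt(tf) x idf x fieldNorm), and coord() scales the sum by the fraction of query clauses that matched. Recombining the numbers for this first result:

    ```latex
    \begin{aligned}
    \text{queryWeight} &= \text{idf} \cdot \text{queryNorm} = 1.153047 \times 0.05058132 \approx 0.05832264 \\
    \text{fieldWeight} &= \sqrt{\text{freq}} \cdot \text{idf} \cdot \text{fieldNorm} = \sqrt{8} \times 1.153047 \times 0.046875 \approx 0.15287387 \\
    \text{weight}(\texttt{a}) &= \text{queryWeight} \cdot \text{fieldWeight} \approx 0.008916007 \\
    \text{score} &= \tfrac{2}{4} \cdot \bigl( 0.008916007 + \tfrac{1}{2} \times 0.010333257 \bigr) \approx 0.007041318
    \end{aligned}
    ```

    The same bottom-up reading applies to every score tree in this result list.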
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
    Type
    a
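    Larson's method treats classification as retrieval: elements of the new record act as a query against clusters built from previously classified MARC records, and the top-ranked cluster supplies the classification number. A minimal sketch under that reading; the term-overlap scorer, the cluster contents and the class numbers below are hypothetical stand-ins for the paper's four partial-match methods and real LCC data:

    ```python
    from collections import Counter

    def tokenize(text):
        """Lower-case word tokens; a stand-in for the paper's three term representations."""
        return [t for t in text.lower().split() if t.isalpha()]

    def match_score(query_terms, cluster_terms):
        """Term overlap between query and cluster; one of many possible partial-match measures."""
        q, c = Counter(query_terms), Counter(cluster_terms)
        return sum(min(q[t], c[t]) for t in q)

    def classify(record_text, clusters):
        """Return the classification number of the best-matching cluster."""
        query = tokenize(record_text)
        return max(clusters, key=lambda number: match_score(query, clusters[number]))

    # Hypothetical clusters built from previously classified MARC records:
    clusters = {
        "Z699": tokenize("information storage retrieval systems automatic indexing"),
        "QA76": tokenize("computer science programming electronic computers"),
    }
    print(classify("automatic retrieval of information from MARC records", clusters))  # Z699
    ```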
  2. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    0.0069145444 = product of:
      0.013829089 = sum of:
        0.00786318 = weight(_text_:a in 1669) [ClassicSimilarity], result of:
          0.00786318 = score(doc=1669,freq=14.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.13482209 = fieldWeight in 1669, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
        0.005965909 = product of:
          0.011931818 = sum of:
            0.011931818 = weight(_text_:information in 1669) [ClassicSimilarity], result of:
              0.011931818 = score(doc=1669,freq=6.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.1343758 = fieldWeight in 1669, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1669)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    After a short outline of the problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of development efforts in this area, the terminology used in this report is specified. Although retrieval is generally an iterative process of browsing and information retrieval, and several important services on the net take this into consideration, the emphasis of this report lies on the general retrieval tools for the Internet as a whole. In order to evaluate the differences, possibilities and restrictions of the different services, it is necessary to begin by organizing the existing variety in a typological/taxonomical survey. The possibilities and weaknesses of the most important services are briefly compared and described in the categories robot-based WWW catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services. For various reasons, however, it is not possible to rank them in order of "best" services. More important still are the weaknesses and problems common to all attempts at indexing the Internet. The quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made to develop retrieval services further are mentioned in relation to the description of document contents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details of all services and software are listed in analytical forms in Annex 1-3.
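    The pipeline the report reviews (robots that harvest pages, an indexing step, a retrieval interface) can be reduced to a toy version. A sketch assuming an in-memory index and a caller-supplied fetch function; a real service adds politeness rules, duplicate detection and HTML parsing:

    ```python
    from collections import defaultdict

    def harvest(seed_urls, fetch, max_pages=100):
        """Breadth-first robot; fetch(url) returns (text, outgoing_links)."""
        seen, queue, pages = set(), list(seed_urls), {}
        while queue and len(pages) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            text, links = fetch(url)
            pages[url] = text
            queue.extend(links)
        return pages

    def build_index(pages):
        """Inverted index: term -> set of URLs containing it."""
        index = defaultdict(set)
        for url, text in pages.items():
            for term in text.lower().split():
                index[term].add(url)
        return index

    def search(index, query):
        """Conjunctive (AND) query over the inverted index."""
        hits = [index.get(t.lower(), set()) for t in query.split()]
        return set.intersection(*hits) if hits else set()

    # Tiny fake web standing in for fetched pages:
    web = {"u1": ("internet indexing services", ["u2"]), "u2": ("indexing robots", [])}
    index = build_index(harvest(["u1"], lambda url: web[url]))
    print(search(index, "indexing"))  # {'u1', 'u2'}
    ```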
  3. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.01
    0.005407574 = product of:
      0.010815148 = sum of:
        0.0059440047 = weight(_text_:a in 2596) [ClassicSimilarity], result of:
          0.0059440047 = score(doc=2596,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.10191591 = fieldWeight in 2596, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
        0.0048711435 = product of:
          0.009742287 = sum of:
            0.009742287 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
              0.009742287 = score(doc=2596,freq=4.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.10971737 = fieldWeight in 2596, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2596)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
    Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
    Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
    Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
    Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
    Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
    Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
    Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
    Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
    Ev Brenner (New York, NY), Chairman; James Callan (University of Massachusetts, MA); Marc Krellenstein (Northern Light Technology, Cambridge, MA); Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
    Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
    Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
    Intelligent agents - John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
    Text summarization - Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
    Cross language searching - Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
    Video search and retrieval - Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
    Speech recognition - Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
    Visualization - James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
    Text mining - David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  4. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.00
    0.0033435028 = product of:
      0.013374011 = sum of:
        0.013374011 = weight(_text_:a in 3390) [ClassicSimilarity], result of:
          0.013374011 = score(doc=3390,freq=18.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.22931081 = fieldWeight in 3390, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
      0.25 = coord(1/4)
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
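    The inductive process described here, building a classifier by learning category characteristics from previously classified documents, is easy to make concrete with a tiny Naive Bayes learner. This is one representative of the machine-learning paradigm, not the tutorial's specific method, and the training data is invented:

    ```python
    import math
    from collections import Counter, defaultdict

    def train(labeled_docs):
        """labeled_docs: list of (text, category). Learns word counts per category."""
        word_counts = defaultdict(Counter)
        cat_counts = Counter()
        for text, cat in labeled_docs:
            cat_counts[cat] += 1
            word_counts[cat].update(text.lower().split())
        return word_counts, cat_counts

    def classify(text, word_counts, cat_counts):
        """Maximize log P(cat) + sum log P(word|cat), with add-one smoothing."""
        vocab = {w for c in word_counts.values() for w in c}
        total = sum(cat_counts.values())
        def logp(cat):
            n = sum(word_counts[cat].values())
            lp = math.log(cat_counts[cat] / total)
            for w in text.lower().split():
                lp += math.log((word_counts[cat][w] + 1) / (n + len(vocab)))
            return lp
        return max(cat_counts, key=logp)

    docs = [("stock market shares", "economy"), ("football match goal", "sport")]
    wc, cc = train(docs)
    print(classify("market shares rise", wc, cc))  # economy
    ```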
  5. Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.00
    0.0031849516 = product of:
      0.012739806 = sum of:
        0.012739806 = weight(_text_:a in 3386) [ClassicSimilarity], result of:
          0.012739806 = score(doc=3386,freq=12.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.21843673 = fieldWeight in 3386, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3386)
      0.25 = coord(1/4)
    
    Abstract
    This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and on their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
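    Of the five methods compared, kNN is the most compact to sketch: rank the training documents by cosine similarity to the test document and let the k nearest vote. A toy version with raw term-count vectors; the paper's experiments use proper term weighting and benchmark corpora:

    ```python
    import math
    from collections import Counter

    def cosine(a, b):
        """Cosine similarity between two term-count dictionaries."""
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def knn_classify(text, training, k=3):
        """training: list of (term-count vector, category). Majority vote of k nearest."""
        q = Counter(text.lower().split())
        neighbours = sorted(training, key=lambda dc: cosine(q, dc[0]), reverse=True)[:k]
        votes = Counter(cat for _, cat in neighbours)
        return votes.most_common(1)[0][0]

    training = [
        (Counter("interest rates rise".split()), "economy"),
        (Counter("rates fall again".split()), "economy"),
        (Counter("cup final goal".split()), "sport"),
    ]
    print(knn_classify("interest rates fall", training, k=2))  # economy
    ```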
  6. Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.00
    0.0027299584 = product of:
      0.010919834 = sum of:
        0.010919834 = weight(_text_:a in 1668) [ClassicSimilarity], result of:
          0.010919834 = score(doc=1668,freq=12.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.18723148 = fieldWeight in 1668, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1668)
      0.25 = coord(1/4)
    
    Abstract
    This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven general methodologies for building robot-generated subject indices, which are presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two approaches, in spite of starting the harvesting process from basically the same services. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the approaches was the coverage of the resulting database.
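    The overlap finding reduces to a set comparison between the two harvested collections. A sketch with hypothetical URLs, using the Jaccard index as one plausible overlap measure (the paper does not specify which measure was used):

    ```python
    def overlap(a, b):
        """Share of resources common to two harvested collections (Jaccard index)."""
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    # Hypothetical harvests from two collection approaches:
    harvest_by_links = {"http://example.org/ee1", "http://example.org/ee2"}
    harvest_by_search = {"http://example.org/ee2", "http://example.org/ee3"}
    print(f"{overlap(harvest_by_links, harvest_by_search):.0%}")  # 33%
    ```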
  7. Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.00
    0.002626904 = product of:
      0.010507616 = sum of:
        0.010507616 = weight(_text_:a in 1103) [ClassicSimilarity], result of:
          0.010507616 = score(doc=1103,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.18016359 = fieldWeight in 1103, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
      0.25 = coord(1/4)
    
    Abstract
    This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>
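    As a flavour of the clustering algorithms explored, here is a toy k-means over 2-D points. This is only the algorithmic skeleton: Scorpion clusters document vectors, and the article does not commit to this particular algorithm:

    ```python
    import random

    def kmeans(points, k, iterations=20):
        """Toy k-means: returns k centroids for a list of (x, y) points."""
        centroids = random.sample(points, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k),
                              key=lambda i: (p[0] - centroids[i][0]) ** 2
                                          + (p[1] - centroids[i][1]) ** 2)
                clusters[nearest].append(p)
            centroids = [
                (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
        return centroids

    points = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(sorted(kmeans(points, 2)))  # typically [(0, 0.5), (10, 10.5)]
    ```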
  8. GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.00
    0.0022290018 = product of:
      0.008916007 = sum of:
        0.008916007 = weight(_text_:a in 381) [ClassicSimilarity], result of:
          0.008916007 = score(doc=381,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 381, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=381)
      0.25 = coord(1/4)
    
    Type
    a
  9. Krellenstein, M.: Document classification at Northern Light (1999) 0.00
    0.0022290018 = product of:
      0.008916007 = sum of:
        0.008916007 = weight(_text_:a in 4435) [ClassicSimilarity], result of:
          0.008916007 = score(doc=4435,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 4435, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=4435)
      0.25 = coord(1/4)
    
    Type
    a
  10. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    0.002101523 = product of:
      0.008406092 = sum of:
        0.008406092 = weight(_text_:a in 2655) [ClassicSimilarity], result of:
          0.008406092 = score(doc=2655,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.14413087 = fieldWeight in 2655, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
      0.25 = coord(1/4)
    
    Abstract
    Can records management benefit from artificial intelligence technology, in particular from expert systems? Answers this question by presenting an example of a small-scale prototype project in the automatic classification of office documents. The project methodology and the basic elements of an expert-system approach are elaborated to give guidelines to potential users of this promising technology
    Type
    a
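    The expert-system approach amounts to an explicit, hand-maintained rule base rather than a learned model. A minimal sketch with invented rules and categories, standing in for the prototype's actual knowledge base:

    ```python
    # Hypothetical rule base: (keywords that must all appear, assigned category)
    RULES = [
        ({"invoice", "amount"}, "Accounting"),
        ({"agenda", "meeting"}, "Minutes"),
        ({"vacancy", "applicant"}, "Personnel"),
    ]

    def classify_document(text, rules=RULES, default="Unclassified"):
        """Fire the first rule whose keywords all occur in the document."""
        words = set(text.lower().split())
        for keywords, category in rules:
            if keywords <= words:
                return category
        return default

    print(classify_document("Invoice no. 17: amount due 300 USD"))  # Accounting
    ```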
  11. Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.00
    0.0018575015 = product of:
      0.007430006 = sum of:
        0.007430006 = weight(_text_:a in 4180) [ClassicSimilarity], result of:
          0.007430006 = score(doc=4180,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.12739488 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
      0.25 = coord(1/4)
    
    Type
    a
  12. Koch, T.: Nutzung von Klassifikationssystemen zur verbesserten Beschreibung, Organisation und Suche von Internetressourcen (1998) 0.00
    0.0014860012 = product of:
      0.0059440047 = sum of:
        0.0059440047 = weight(_text_:a in 1030) [ClassicSimilarity], result of:
          0.0059440047 = score(doc=1030,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.10191591 = fieldWeight in 1030, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1030)
      0.25 = coord(1/4)
    
    Type
    a
  13. Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.00
    0.0013002511 = product of:
      0.0052010044 = sum of:
        0.0052010044 = weight(_text_:a in 3064) [ClassicSimilarity], result of:
          0.0052010044 = score(doc=3064,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.089176424 = fieldWeight in 3064, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3064)
      0.25 = coord(1/4)
    
    Type
    a
  14. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.00
    0.0013002511 = product of:
      0.0052010044 = sum of:
        0.0052010044 = weight(_text_:a in 1568) [ClassicSimilarity], result of:
          0.0052010044 = score(doc=1568,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.089176424 = fieldWeight in 1568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1568)
      0.25 = coord(1/4)
    
    Abstract
    Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.
  15. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.00
    0.0011145009 = product of:
      0.0044580037 = sum of:
        0.0044580037 = weight(_text_:a in 942) [ClassicSimilarity], result of:
          0.0044580037 = score(doc=942,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.07643694 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.25 = coord(1/4)
    
    Content
    1. Increased Importance of Knowledge Organization in Internet Services
    2. Quality Subject Service and the role of classification
    3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site
    4. DESIRE's Barefoot Solutions of Automatic Classification
    5. Advanced Classification Solutions in DESIRE and CORC
    6. Future directions of research and development
    7. General references