Search (37 results, page 2 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  • year_i:[1990 TO 2000}
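
  The two facets use Solr/Lucene filter syntax: in year_i:[1990 TO 2000}, the square bracket marks an inclusive bound and the curly brace an exclusive one, so the range covers publication years 1990 through 1999. A minimal sketch of an equivalent request, assuming a standard Solr select handler (the endpoint URL and core name here are hypothetical):

    import requests

    params = {
        "q": "*:*",
        # the two facet filters above become repeated fq parameters;
        # [1990 TO 2000} = inclusive lower bound, exclusive upper bound
        "fq": ['theme_ss:"Automatisches Klassifizieren"',
               "year_i:[1990 TO 2000}"],
        "rows": 20,
        "start": 20,  # page 2 of 2: results 21-37 of 37
    }
    resp = requests.get("http://localhost:8983/solr/docs/select", params=params)
    docs = resp.json()["response"]["docs"]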
  1. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.00
    Score: 8.2263234E-4 (weight of _text_:a in doc 6962: freq=8.0, idf=1.153047, queryNorm=0.04800207, fieldNorm=0.0546875, coord 1/3 × 1/4)
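    All scores in this list follow Lucene's ClassicSimilarity (TF-IDF); as a sketch, the factors above combine like this (every number is taken from entry 1's score detail; the idf and queryNorm values are shared by all entries):

      from math import sqrt

      tf         = sqrt(8.0)     # 2.828427, from term frequency 8.0
      idf        = 1.153047      # idf(docFreq=37942, maxDocs=44218)
      query_norm = 0.04800207
      field_norm = 0.0546875     # length normalization for doc 6962

      field_weight = tf * idf * field_norm   # 0.17835285
      query_weight = idf * query_norm        # 0.055348642
      score = 0.25 * (1 / 3) * field_weight * query_weight  # coord factors
      print(score)               # ~8.2263234e-04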
    
    Abstract
    Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may prove helpful for the automatic classification of documents, as well as for information retrieval and filtering applications
    Type
    a
  2. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00
    Score: 8.2263234E-4 (weight of _text_:a in doc 1661: freq=8.0, fieldNorm=0.0546875)
    
    Abstract
    Proposes the use of parallel computing systems to overcome the computational intensity of the clustering process. Examines 2 operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The time performance of the parallel systems was an improvement over sequential system times, and produced the same clustering and classification scheme. Results show near-linear speed-up in higher threshold clustering applications
    Type
    a
  3. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.00
    Score: 8.141949E-4 (weight of _text_:a in doc 2188: freq=6.0, fieldNorm=0.0625)
    
    Abstract
    In this paper, we introduce ACS, an automatic classification system for school libraries. First, various approaches towards automatic classification, namely (i) rule-based, (ii) browse and search, and (iii) partial match, are critically reviewed. The central issues of scheme selection, text analysis and similarity measures are discussed. A novel approach towards detecting book-class similarity with a Modified Overlap Coefficient (MOC) is also proposed. Finally, the design and implementation of ACS is presented. Test results of over 80% correctness in automatic classification and a cost reduction of 75% compared to manual classification suggest that ACS is highly adoptable
    Type
    a
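
    The abstract of entry 3 names a Modified Overlap Coefficient without defining it; as orientation only, a minimal sketch of the standard overlap coefficient that MOC presumably modifies (the term sets below are illustrative, not the paper's data):

      def overlap_coefficient(a: set, b: set) -> float:
          # |A & B| / min(|A|, |B|): 1.0 when one term set contains the other
          return len(a & b) / min(len(a), len(b))

      book_terms  = {"classification", "library", "automatic"}
      class_terms = {"classification", "library", "cataloguing"}
      print(overlap_coefficient(book_terms, class_terms))  # 0.666...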
  4. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.00
    Score: 8.141949E-4 (weight of _text_:a in doc 2650: freq=6.0, fieldNorm=0.0625)
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering in information retrieval systems
    Type
    a
  5. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.00
    Score: 7.1242056E-4 (weight of _text_:a in doc 7209: freq=6.0, fieldNorm=0.0546875)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
    Type
    a
  6. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00
    Score: 7.1242056E-4 (weight of _text_:a in doc 2219: freq=6.0, fieldNorm=0.0546875)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing this increasing amount of incoming and outgoing mail, in particular its classification, is time-consuming and expensive. More and more organizations are seeking a solution to this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence
    Type
    a
  7. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.00
    Score: 7.051135E-4 (weight of _text_:a in doc 1054: freq=8.0, fieldNorm=0.046875)
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Type
    a
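
    A generic sketch of the partial-match idea in entry 7: treat terms from a new record as a query and rank classification clusters by shared terms (the clusters and record below are illustrative, not the study's data):

      def rank_clusters(record_terms: set, clusters: dict) -> list:
          # score each classification cluster by term overlap with the record
          scored = sorted(((len(record_terms & terms), lcc)
                           for lcc, terms in clusters.items()), reverse=True)
          return [lcc for score, lcc in scored if score > 0]

      clusters = {"Z699": {"information", "retrieval", "automatic"},
                  "QA76": {"computer", "programming"}}
      print(rank_clusters({"automatic", "classification", "retrieval"}, clusters))
      # ['Z699']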
  8. GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.00
    Score: 7.051135E-4 (weight of _text_:a in doc 381: freq=2.0, fieldNorm=0.09375)
    
    Type
    a
  9. Krellenstein, M.: Document classification at Northern Light (1999) 0.00
    Score: 7.051135E-4 (weight of _text_:a in doc 4435: freq=2.0, fieldNorm=0.09375)
    
    Type
    a
  10. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    Score: 6.6478737E-4 (weight of _text_:a in doc 2655: freq=4.0, fieldNorm=0.0625)
    
    Abstract
    Can records management benefit from artificial intelligence technology, in particular from expert systems? Gives an answer to this question by presenting a small-scale prototype project in automatic classification of office documents. Project methodology and the basic elements of an expert systems approach are elaborated to give guidelines to potential users of this promising technology
    Type
    a
  11. Koch, T.; Ardö, A.; Brümmer, A.: The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.00
    Score: 6.2185165E-4 (weight of _text_:a in doc 1669: freq=14.0, fieldNorm=0.03125)
    
    Abstract
    After a short outline of the problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of development efforts in this area, the terminology used in this report is specified. Although retrieval is generally an iterative process of browsing and information retrieval, and several important services on the net have taken this fact into consideration, the emphasis of this report lies on the general retrieval tools for the whole of the Internet. In order to evaluate the differences, possibilities and restrictions of the different services, it is necessary to begin by organizing the existing varieties in a typological/taxonomical survey. The possibilities and weaknesses of the most important services are briefly compared and described for the categories robot-based WWW catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services. For various reasons, however, it will not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common to all attempts at indexing the Internet. The problems of input quality, technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services are mentioned in relation to descriptions of document contents and standardization efforts. Internet harvesting and indexing technology and retrieval software are thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
  12. Mostafa, J.; Quiroga, L.M.; Palakal, M.: Filtering medical documents using automated and human classification methods (1998) 0.00
    Score: 6.106462E-4 (weight of _text_:a in doc 2326: freq=6.0, fieldNorm=0.046875)
    
    Abstract
    The goal of this research is to clarify the role of document classification in information filtering. An important function of classification, in managing computational complexity, is described and illustrated in the context of an existing filtering system. A parameter called classification homogeneity is presented for analyzing unsupervised automated classification by employing human classification as a control. 2 significant components of the automated classification approach, vocabulary discovery and classification scheme generation, are described in detail. Results of classification performance revealed considerable variability in the homogeneity of automatically produced classes. Based on the classification performance, different types of interest profiles were created. Subsequently, these profiles were used to perform filtering sessions. The filtering results showed that with increasing homogeneity, filtering performance improves, and, conversely, with decreasing homogeneity, filtering performance degrades
    Type
    a
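
    Entry 12 evaluates automated classes against human classification, but its homogeneity parameter is not defined in the abstract; as a generic stand-in only, cluster purity scores automated classes against human labels (the labels below are illustrative):

      from collections import Counter

      def purity(clusters):
          # fraction of documents whose automated class agrees with the
          # majority human label of that class; 1.0 = fully homogeneous
          total = sum(len(c) for c in clusters)
          majority = sum(Counter(c).most_common(1)[0][1] for c in clusters)
          return majority / total

      auto_classes = [["cardiology", "cardiology", "oncology"],
                      ["oncology", "oncology"]]
      print(purity(auto_classes))  # 0.8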
  13. Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.00
    Score: 5.875945E-4 (weight of _text_:a in doc 4180: freq=2.0, fieldNorm=0.078125)
    
    Type
    a
  14. Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.00
    Score: 5.875945E-4 (weight of _text_:a in doc 494: freq=2.0, fieldNorm=0.078125)
    
    Type
    a
  15. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.00
    Score: 4.700756E-4 (weight of _text_:a in doc 7695: freq=2.0, fieldNorm=0.0625)
    
    Type
    a
  16. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    Score: 4.700756E-4 (weight of _text_:a in doc 2596: freq=8.0, fieldNorm=0.03125)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 insights on achieving effective information access
    Session One: Updates and a twelve month perspective
      Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: the knowledge impact statement
      Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      Ev Brenner (New York, NY), Chairman
      James Callan (University of Massachusetts, MA)
      Marc Krellenstein (Northern Light Technology, Cambridge, MA)
      Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      Steve Arnold (AIT, Harrods Creek, KY): Review: the leading edge in search and retrieval software
      Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      Intelligent agents: John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      Cross-language searching: Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      Video search and retrieval: Armon Amir (IBM, Almaden, CA): CueVideo: modular system for automatic indexing and browsing of video/audio
      Speech recognition: Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      Visualization: James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: emerging science or passing fashion?
      Text mining: David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  17. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.00
    Score: 4.1131617E-4 (weight of _text_:a in doc 1568: freq=2.0, fieldNorm=0.0546875)
    
    Abstract
    Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically the generation of the robot-gathered database 'All Engineering' and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.