Search (18 results, page 1 of 1)

  • × year_i:[1990 TO 2000}
  • × theme_ss:"Automatisches Klassifizieren"
  1. Dubin, D.: Dimensions and discriminability (1998) 0.03
    0.02822417 = product of:
      0.042336255 = sum of:
        0.01867095 = weight(_text_:on in 2338) [ClassicSimilarity], result of:
          0.01867095 = score(doc=2338,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.023665305 = product of:
          0.04733061 = sum of:
            0.04733061 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.04733061 = score(doc=2338,freq=2.0), product of:
                0.1747608 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04990557 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  2. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.01
    0.009239726 = product of:
      0.027719175 = sum of:
        0.027719175 = weight(_text_:on in 1054) [ClassicSimilarity], result of:
          0.027719175 = score(doc=1054,freq=6.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.25253648 = fieldWeight in 1054, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
      0.33333334 = coord(1/3)
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new recors (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
  3. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.007888435 = product of:
      0.023665305 = sum of:
        0.023665305 = product of:
          0.04733061 = sum of:
            0.04733061 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.04733061 = score(doc=1673,freq=2.0), product of:
                0.1747608 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04990557 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    1. 8.1996 22:08:06
  4. Sebastiani, F.: ¬A tutorial an automated text categorisation (1999) 0.01
    0.0075442037 = product of:
      0.02263261 = sum of:
        0.02263261 = weight(_text_:on in 3390) [ClassicSimilarity], result of:
          0.02263261 = score(doc=3390,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 3390, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
      0.33333334 = coord(1/3)
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge an how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based an machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation, will be touched upon.
    Content
    Aus: Proceedings of THAI-99, European Symposium on Telematics, Hypermedia and Artificial Intelligence
  5. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    0.0075442037 = product of:
      0.02263261 = sum of:
        0.02263261 = weight(_text_:on in 1253) [ClassicSimilarity], result of:
          0.02263261 = score(doc=1253,freq=16.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 1253, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.33333334 = coord(1/3)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
  6. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
    0.007112743 = product of:
      0.021338228 = sum of:
        0.021338228 = weight(_text_:on in 2650) [ClassicSimilarity], result of:
          0.021338228 = score(doc=2650,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.19440265 = fieldWeight in 2650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.33333334 = coord(1/3)
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Object methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that condsiders the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems
  7. Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.01
    0.007112743 = product of:
      0.021338228 = sum of:
        0.021338228 = weight(_text_:on in 162) [ClassicSimilarity], result of:
          0.021338228 = score(doc=162,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.19440265 = fieldWeight in 162, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=162)
      0.33333334 = coord(1/3)
    
  8. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 7209) [ClassicSimilarity], result of:
          0.01867095 = score(doc=7209,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 7209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
      0.33333334 = coord(1/3)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
  9. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 2219) [ClassicSimilarity], result of:
          0.01867095 = score(doc=2219,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.33333334 = coord(1/3)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing of this increasing amount of information coming and out going mail, in particular its classification, is time consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus id also placed on the application of artificial intelligence
  10. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 6493) [ClassicSimilarity], result of:
          0.01867095 = score(doc=6493,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 6493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6493)
      0.33333334 = coord(1/3)
    
    Abstract
    This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of 4 classes: questions, responses, announcement or administartive. A total of 1.372 messages were analyzed. The automatic classification of a message was based on string matching between a message text and predefined string sets for each of the massage types. The system's automated ability to accurately classify a message was compared against manually assigned codes. The Cohen's Kappa of .55 suggested that there was a statistical agreement between the automatic and manually assigned codes
  11. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 6962) [ClassicSimilarity], result of:
          0.01867095 = score(doc=6962,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 6962, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6962)
      0.33333334 = coord(1/3)
    
    Abstract
    Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may provide helpful for the automatic classification of documents, as well as for information retrieval and filtering applications
  12. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 7696) [ClassicSimilarity], result of:
          0.01867095 = score(doc=7696,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.33333334 = coord(1/3)
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide an efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem
  13. Huang, Y.-L.: ¬A theoretic and empirical research of cluster indexing for Mandarine Chinese full text document (1998) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 513) [ClassicSimilarity], result of:
          0.01867095 = score(doc=513,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 513, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=513)
      0.33333334 = coord(1/3)
    
    Abstract
    Since most popular commercialized systems for full text retrieval are designed with full text scaning and Boolean logic query mode, these systems use an oversimplified relationship between the indexing form and the content of document. Reports the use of Singular Value Decomposition (SVD) to develop a Cluster Indexing Model (CIM) based on a Vector Space Model (VSM) in orer to explore the index theory of cluster indexing for chinese full text documents. From a series of experiments, it was found that the indexing performance of CIM is better than traditional VSM, and has almost equivalent effectiveness of the authority control of index terms
  14. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    0.006159817 = product of:
      0.01847945 = sum of:
        0.01847945 = weight(_text_:on in 1669) [ClassicSimilarity], result of:
          0.01847945 = score(doc=1669,freq=6.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.16835764 = fieldWeight in 1669, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
      0.33333334 = coord(1/3)
    
    Abstract
    After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval and several important services on the net have taken this fact into consideration, the emphasis of this report lays on the general retrieval tools for the whole of Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services it is necessary to begin with organizing the existing varieties in a typological/ taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues and simultaneous or collected search services respectively. It will however for different reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common for all attempts of indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
  15. Mostafa, J.; Quiroga, L.M.; Palakal, M.: Filtering medical documents using automated and human classification methods (1998) 0.01
    0.0053345575 = product of:
      0.016003672 = sum of:
        0.016003672 = weight(_text_:on in 2326) [ClassicSimilarity], result of:
          0.016003672 = score(doc=2326,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.14580199 = fieldWeight in 2326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2326)
      0.33333334 = coord(1/3)
    
    Abstract
    The goal of this research is to clarify the role of document classification in information filtering. An important function of classification, in managing computational complexity, is described and illustrated in the context of an existing filtering system. A parameter called classification homogeneity is presented for analyzing unsupervised automated classification by employing human classification as a control. 2 significant components of the automated classification approach, vocabulary discovery and classification scheme generation, are described in detail. Results of classification performance revealed considerable variability in the homogeneity of automatically produced classes. Based on the classification performance, different types of interest profiles were created. Subsequently, these profiles were used to perform filtering sessions. The filtering results showed that with increasing homogeneity, filtering performance improves, and, conversely, with decreasing homogeneity, filtering performance degrades
  16. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    0.0053345575 = product of:
      0.016003672 = sum of:
        0.016003672 = weight(_text_:on in 316) [ClassicSimilarity], result of:
          0.016003672 = score(doc=316,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.14580199 = fieldWeight in 316, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.33333334 = coord(1/3)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
  17. Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.01
    0.0053345575 = product of:
      0.016003672 = sum of:
        0.016003672 = weight(_text_:on in 1668) [ClassicSimilarity], result of:
          0.016003672 = score(doc=1668,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.14580199 = fieldWeight in 1668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1668)
      0.33333334 = coord(1/3)
    
    Abstract
    This working paper describes the creation of a test database to carry out the automatic classification tasks of the DESIRE II work package D3.6a on. It is an improved version of NetLab's existing "All" Engineering database created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies to build robot-generated subject indices, presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot and subsequently an even more surprisingly low overlap between the resources collected by the two different approaches. That inspite of using basically the same services to start the harvesting process from. A intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between those aproaches was the coverage of the resulting database.
  18. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.01
    0.005029469 = product of:
      0.015088406 = sum of:
        0.015088406 = weight(_text_:on in 2596) [ClassicSimilarity], result of:
          0.015088406 = score(doc=2596,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.13746344 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.33333334 = coord(1/3)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access Session One: Updates and a twelve month perspective Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers Session Two: Today's search engines and beyond Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER) Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology Session Three: Panel discussion: Human v automated categorization and editing Ev Brenner (New York, NY)- Chairman James Callan (University of Massachusetts, MA) Marc Krellenstein (Northern Light Technology, Cambridge, MA) Dan Miller (Ask Jeeves, Berkeley, CA) Session Four: Updates and a twelve month perspective Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software Ellen Voorhees (NIST, Gaithersburg, MD) TREC update Session Five: Search engines now and beyond Intelligent Agents John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents Text summarization Therese Firmin, (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems Cross language searching Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval. Video search and retrieval Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio Speech recognition Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents Visualization James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion? Text mining David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support