Search (9 results, page 1 of 1)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[1990 TO 2000}
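    In the year facet, "[" is an inclusive and "}" an exclusive bound (Lucene/Solr range syntax), so the filter covers 1990 through 1999. A minimal sketch of the equivalent request against a Solr-style index; the endpoint URL and core name are assumptions, and only the field names and values are taken from the facets above:

      import requests

      params = {
          "q": "*:*",                                        # match all, then filter
          "fq": ['theme_ss:"Automatisches Klassifizieren"',  # subject facet
                 "year_i:[1990 TO 2000}"],                   # 1990 inclusive .. 2000 exclusive
          "rows": 10,
      }
      # Hypothetical endpoint; the real service behind this page is not shown.
      resp = requests.get("http://localhost:8983/solr/docs/select", params=params)
      print(resp.json()["response"]["numFound"])             # 9 hits for this page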
  1. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.04
    0.039781027 = product of:
      0.13923359 = sum of:
        0.05205557 = weight(_text_:processing in 2219) [ClassicSimilarity], result of:
          0.05205557 = score(doc=2219,freq=2.0), product of:
            0.1662677 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.04107254 = queryNorm
            0.3130829 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
        0.08717802 = weight(_text_:techniques in 2219) [ClassicSimilarity], result of:
          0.08717802 = score(doc=2219,freq=4.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.48182213 = fieldWeight in 2219, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.2857143 = coord(2/7)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution that sends and receives correspondence. Processing this increasing amount of information, incoming and outgoing mail, and in particular its classification, is time-consuming and expensive. More and more organizations are seeking a solution to this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology that can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence.
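    The relevance figures shown with each hit are Lucene ClassicSimilarity "explain" traces: every matching query term contributes queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm (with tf = sqrt of the term frequency), the per-term contributions are summed, and the sum is scaled by the coordination factor coord(m/n) for m of n query terms matching. A minimal Python sketch that reproduces the 0.0398 score of hit 1 from the constants in its trace (the full seven-term query is not part of this export, so only the two matching terms are listed):

      from math import sqrt

      def term_score(freq, idf, query_norm, field_norm):
          # One term's contribution: queryWeight * fieldWeight
          query_weight = idf * query_norm               # idf(t) * queryNorm
          field_weight = sqrt(freq) * idf * field_norm  # tf = sqrt(termFreq)
          return query_weight * field_weight

      QUERY_NORM = 0.04107254   # queryNorm from the trace
      FIELD_NORM = 0.0546875    # fieldNorm(doc=2219)

      matches = [               # (termFreq in doc, idf) per matching term
          (2.0, 4.048147),      # "processing"
          (4.0, 4.405231),      # "techniques"
      ]

      raw = sum(term_score(f, i, QUERY_NORM, FIELD_NORM) for f, i in matches)
      print(raw * 2 / 7)        # coord(2/7); prints 0.039781..., the displayed 0.04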
  2. Dubin, D.: Dimensions and discriminability (1998) 0.02
    0.020437783 = product of:
      0.071532235 = sum of:
        0.05205557 = weight(_text_:processing in 2338) [ClassicSimilarity], result of:
          0.05205557 = score(doc=2338,freq=2.0), product of:
            0.1662677 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.04107254 = queryNorm
            0.3130829 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.019476667 = product of:
          0.038953334 = sum of:
            0.038953334 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.038953334 = score(doc=2338,freq=2.0), product of:
                0.14382903 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04107254 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al.
  3. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.02
    0.016107209 = product of:
      0.05637523 = sum of:
        0.029956304 = weight(_text_:digital in 1253) [ClassicSimilarity], result of:
          0.029956304 = score(doc=1253,freq=4.0), product of:
            0.16201277 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.04107254 = queryNorm
            0.18490088 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.026418928 = weight(_text_:techniques in 1253) [ClassicSimilarity], result of:
          0.026418928 = score(doc=1253,freq=2.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.14601415 = fieldWeight in 1253, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.2857143 = coord(2/7)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract automatically the requisite collection metadata that must be distributed.
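    As a sketch of the collection-level metadata idea described above, a source can be summarized as document counts over the nodes of a hierarchical scheme and pruned so that only well-populated classes are distributed to brokers. The data structure is hypothetical; class codes, source names, and counts are illustrative and not taken from Pharos itself:

      from collections import defaultdict

      class SourceSummary:
          def __init__(self, name):
              self.name = name
              self.counts = defaultdict(int)  # node path, e.g. ("Q", "QA") -> doc count

          def add(self, path, n=1):
              # Increment every prefix so parent classes aggregate their children.
              for i in range(1, len(path) + 1):
                  self.counts[path[:i]] += n

          def prune(self, min_docs):
              # Keep only well-populated nodes: the partitioning step that keeps
              # per-source metadata small enough to scale to many sources.
              return {p: c for p, c in self.counts.items() if c >= min_docs}

      src = SourceSummary("ftp-archive-17")      # hypothetical source
      src.add(("Q", "QA", "QA76"), 420)          # computer science documents
      src.add(("G", "GA"), 35)                   # cartography documents
      print(src.prune(min_docs=50))              # broker-side view of the source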
  4. Shafer, K.E.: Evaluating Scorpion results (1998) 0.01
    0.012580444 = product of:
      0.088063106 = sum of:
        0.088063106 = weight(_text_:techniques in 1569) [ClassicSimilarity], result of:
          0.088063106 = score(doc=1569,freq=2.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.4867139 = fieldWeight in 1569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
      0.14285715 = coord(1/7)
    
    Abstract
    Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.
  5. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.01
    0.008558944 = product of:
      0.059912607 = sum of:
        0.059912607 = weight(_text_:digital in 942) [ClassicSimilarity], result of:
          0.059912607 = score(doc=942,freq=4.0), product of:
            0.16201277 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.04107254 = queryNorm
            0.36980176 = fieldWeight in 942, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.14285715 = coord(1/7)
    
    Content
    1. Increased Importance of Knowledge Organization in Internet Services
    2. Quality Subject Service and the role of classification
    3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site
    4. DESIRE's Barefoot Solutions of Automatic Classification
    5. Advanced Classification Solutions in DESIRE and CORC
    6. Future directions of research and development
    7. General references
  6. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.01
    0.0075482656 = product of:
      0.052837856 = sum of:
        0.052837856 = weight(_text_:techniques in 1054) [ClassicSimilarity], result of:
          0.052837856 = score(doc=1054,freq=2.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.2920283 = fieldWeight in 1054, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
      0.14285715 = coord(1/7)
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
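    A toy illustration of the partial-match approach described above: classification clusters built from previously classified MARC records are stored as term vectors, and the new record's title and subject terms are run against them as a query. The term vectors and class codes are made up, and cosine scoring stands in for the study's four partial-match methods:

      from collections import Counter
      from math import sqrt

      def cosine(query, cluster):
          # Cosine similarity between two term-frequency vectors.
          dot = sum(w * cluster[t] for t, w in query.items())
          norm = (sqrt(sum(w * w for w in query.values()))
                  * sqrt(sum(w * w for w in cluster.values())))
          return dot / norm if norm else 0.0

      clusters = {  # LCC class -> aggregated terms of its training records (illustrative)
          "QA76": Counter({"computer": 12, "programming": 8, "software": 5}),
          "Z699": Counter({"retrieval": 9, "indexing": 7, "classification": 6}),
      }
      new_record = Counter("automatic classification retrieval".split())

      best = max(clusters, key=lambda c: cosine(new_record, clusters[c]))
      print(best)  # -> "Z699" for this toy example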
  7. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.01
    0.0074365106 = product of:
      0.05205557 = sum of:
        0.05205557 = weight(_text_:processing in 6962) [ClassicSimilarity], result of:
          0.05205557 = score(doc=6962,freq=2.0), product of:
            0.1662677 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.04107254 = queryNorm
            0.3130829 = fieldWeight in 6962, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6962)
      0.14285715 = coord(1/7)
    
    Source
    Information processing and management. 32(1996) no.6, S.747-767
  8. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.01
    0.005032177 = product of:
      0.03522524 = sum of:
        0.03522524 = weight(_text_:techniques in 2596) [ClassicSimilarity], result of:
          0.03522524 = score(doc=2596,freq=2.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.19468555 = fieldWeight in 2596, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.14285715 = coord(1/7)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      • Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      • Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      • Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      • Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: the knowledge impact statement
      • Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      • Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      • Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      • Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      • Ev Brenner (New York, NY), Chairman
      • James Callan (University of Massachusetts, MA)
      • Marc Krellenstein (Northern Light Technology, Cambridge, MA)
      • Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      • Steve Arnold (AIT, Harrods Creek, KY): Review: the leading edge in search and retrieval software
      • Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      • Intelligent agents - John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      • Text summarization - Therese Firmin (Dept of Defense, Ft. George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      • Cross-language searching - Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      • Video search and retrieval - Armon Amir (IBM, Almaden, CA): CueVideo: modular system for automatic indexing and browsing of video/audio
      • Speech recognition - Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      • Visualization - James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: emerging science or passing fashion?
      • Text mining - David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  9. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00
    0.0027823811 = product of:
      0.019476667 = sum of:
        0.019476667 = product of:
          0.038953334 = sum of:
            0.038953334 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.038953334 = score(doc=1673,freq=2.0), product of:
                0.14382903 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04107254 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Date
    1. 8.1996 22:08:06