Search (190 results, page 10 of 10)

  • × type_ss:"a"
  • × year_i:[1980 TO 1990}
  1. Striedieck, S.: Online catalog maintenance : the OOPS command in LIAS (1985) 0.01
    0.009770754 = product of:
      0.04885377 = sum of:
        0.04885377 = weight(_text_:22 in 366) [ClassicSimilarity], result of:
          0.04885377 = score(doc=366,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.2708308 = fieldWeight in 366, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=366)
      0.2 = coord(1/5)
    
    Date
    7. 1.2007 13:22:30
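The breakdown above is Lucene's ClassicSimilarity "explain" output, and its arithmetic can be checked by hand: queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm (with tf = √freq), and the clause score is their product, scaled by the coordination factor. A minimal sketch using the values from this entry (the function name is ours, not Lucene's):

```python
import math

def classic_similarity_score(freq, idf, query_norm, field_norm, coord):
    """Reproduce the arithmetic of a Lucene ClassicSimilarity 'explain' tree."""
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return coord * query_weight * field_weight

# Values from the breakdown of entry 1 (term "22" in doc 366):
score = classic_similarity_score(
    freq=2.0,
    idf=3.5018296,           # idf(docFreq=3622, maxDocs=44218)
    query_norm=0.051511593,
    field_norm=0.0546875,
    coord=0.2,               # coord(1/5): one of five query clauses matched
)
print(f"{score:.9f}")  # agrees with the 0.009770754 above to float precision
```

The last digits differ slightly from the displayed value because Lucene computes in 32-bit floats.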
  2. Coates, E.J.: Significance and term relationship in compound headings (1985) 0.01
    0.008693925 = product of:
      0.043469626 = sum of:
        0.043469626 = weight(_text_:index in 3634) [ClassicSimilarity], result of:
          0.043469626 = score(doc=3634,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.1931181 = fieldWeight in 3634, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=3634)
      0.2 = coord(1/5)
    
    Abstract
In the continuing search for criteria for determining the form of compound headings (i.e., headings containing more than one word), many authors have attempted to deal with the problem of entry element and citation order. Among the proposed criteria are Cutter's concept of "significance," Kaiser's formula of "concrete/process," Prevost's "noun rule," and Farradane's categories of relationships (q.v.). One of the problems in applying the criteria has been the difficulty in determining what is "significant," particularly when two or more words in the heading all refer to concrete objects. In the following excerpt from Subject Catalogues: Headings and Structure, a widely cited book on the alphabetical subject catalog, E. J. Coates proposes the concept of "term significance," that is, "the word which evokes the clearest mental image," as the criterion for determining the entry element in a compound heading. Since a concrete object generally evokes a clearer mental image than an action or process, Coates' theory is in line with Kaiser's theory of "concrete/process" (q.v.), which Coates renamed "thing/action." For determining the citation order of component elements in a compound heading where the elements are equally "significant" (i.e., both or all evoking clear mental images), Coates proposes the use of "term relationship" as the determining factor. He has identified twenty different kinds of relationships among terms and set down the citation order for each. Another frequently encountered problem related to citation order is the determination of the entry element for a compound heading which contains a topic and a locality. Entering such headings uniformly under either the topic or the locality has proven to be infeasible in practice. Many headings of this type have the topic as the main heading, subdivided by the locality; others are entered under the locality as the main heading with the topic as the subdivision.
No criteria or rules have been proposed that ensure consistency or predictability. In the following selection, Coates attempts to deal with this problem by ranking the "main areas of knowledge according to the extent to which they appear to be significantly conditioned by locality." The theory Coates expounded in his book was put into practice in compiling the British Technology Index for which Coates served as the editor from 1961 to 1977.
  3. Salton, G.: Automatic processing of foreign language documents (1985) 0.01
    0.008693925 = product of:
      0.043469626 = sum of:
        0.043469626 = weight(_text_:index in 3650) [ClassicSimilarity], result of:
          0.043469626 = score(doc=3650,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.1931181 = fieldWeight in 3650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.2 = coord(1/5)
    
    Abstract
The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system, he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order.
During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how is it to be evaluated?
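The thesaurus mechanism described above is simple to sketch: if English and German surface terms map to shared concept numbers, a German document and an English query become comparable vectors in concept space. A minimal illustration (the term-to-concept table and the texts are invented for this example; SMART's actual thesaurus and weighting schemes were far richer):

```python
import math
from collections import Counter

# Hypothetical bilingual thesaurus: surface term -> shared concept number.
CONCEPTS = {
    "retrieval": 101, "wiederauffindung": 101,
    "document": 102, "dokument": 102,
    "language": 103, "sprache": 103,
    "automatic": 104, "automatisch": 104,
}

def concept_vector(text):
    """Map tokens to concept numbers and count occurrences; unknown tokens are dropped."""
    tokens = text.lower().split()
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[c] * b[c] for c in a if c in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = concept_vector("automatic document retrieval")
german_doc = concept_vector("automatisch wiederauffindung von dokument")
print(cosine(query, german_doc))  # high similarity despite sharing no surface words
```

Here every query concept is matched, so the similarity is 1.0; word-level matching would have scored zero.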
  4. Danky, J.P.: Newspapers and their readers : the United States newspaper program's list of intended audience terms (1986) 0.01
    0.008693925 = product of:
      0.043469626 = sum of:
        0.043469626 = weight(_text_:index in 372) [ClassicSimilarity], result of:
          0.043469626 = score(doc=372,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.1931181 = fieldWeight in 372, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=372)
      0.2 = coord(1/5)
    
    Abstract
The publication by OCLC of the United States Newspaper Program National Union List in June, 1985 is an important milestone for librarians in general as well as for participants in the Program and OCLC. The United States Newspaper Program (USNP) is a cooperative venture of the National Endowment for the Humanities and the Library of Congress and will eventually involve libraries in all 50 states and territories. The Program seeks to create an online data base with bibliographic records and holdings statements for all newspapers held in U.S. libraries regardless of their place of publication. To begin with, U.S. newspapers are the focus. As the largest union list product produced by OCLC, this nearly 6,000-page set is impressive. However, bulk is not its most important characteristic. By providing access in new ways to bibliographic records contributed by many libraries around the nation, OCLC has responded to patron and librarian demands. The chronological, intended audience (subject), language, and place of publication (geographical) indexes represent the most important advances in access to newspapers in decades. As a prototype, this product holds much promise for the profession, especially in terms of subject access, here called intended audience. This article analyzes the Intended-Audience Index in the first edition, looking at the use of approved and improper terms, describing the origins of the list of terms, and projecting the shape of the data base over the life of the United States Newspaper Program. Like CONSER, of which the USNP is a part, this project is an example of cooperation between many institutions, including the Library of Congress, OCLC, and libraries in every state and territory. The article describes one instance of this cooperation in practice.
  5. Devadason, F.J.: Postulate-Based Permuted Subject Indexing Language as a metalanguage for computer-aided generation of information retrieval thesaurus (1983) 0.01
    0.008374932 = product of:
      0.04187466 = sum of:
        0.04187466 = weight(_text_:22 in 1637) [ClassicSimilarity], result of:
          0.04187466 = score(doc=1637,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.23214069 = fieldWeight in 1637, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1637)
      0.2 = coord(1/5)
    
    Source
    International forum on information and documentation. 8(1983), S.22-29
  6. Miller, J.: From subject headings for audiovisual media (1988) 0.01
    0.008374932 = product of:
      0.04187466 = sum of:
        0.04187466 = weight(_text_:22 in 324) [ClassicSimilarity], result of:
          0.04187466 = score(doc=324,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.23214069 = fieldWeight in 324, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=324)
      0.2 = coord(1/5)
    
    Source
    Inspel. 22(1988), S.121-145
  7. Woodhead, P.A.; Martin, J.V.: Subject specialization in British university libraries : a survey (1982) 0.01
    0.008374932 = product of:
      0.04187466 = sum of:
        0.04187466 = weight(_text_:22 in 468) [ClassicSimilarity], result of:
          0.04187466 = score(doc=468,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.23214069 = fieldWeight in 468, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=468)
      0.2 = coord(1/5)
    
    Date
    9. 2.1997 18:44:22
  8. Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.01
    0.0076071853 = product of:
      0.038035925 = sum of:
        0.038035925 = weight(_text_:index in 3645) [ClassicSimilarity], result of:
          0.038035925 = score(doc=3645,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.16897833 = fieldWeight in 3645, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.2 = coord(1/5)
    
    Abstract
The selection that follows was chosen as it represents "a very early paper on the possibilities allowed by computers in documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, and thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible, by automatic means, to construct classes of terms which, when substituted one for another, could be used to improve retrieval performance. One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England. The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic natural language processing. Based on the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification.
The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true economic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to library/information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.
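The two steps described above, scoring term pairs by the documents they index in common and then grouping the most similar terms into classes, can be sketched as follows. The toy posting lists, the Jaccard measure, and the threshold are invented for illustration; Needham and Sparck Jones's actual clump-finding procedure was considerably more elaborate:

```python
from itertools import combinations

# Toy inverted index: keyword -> set of document ids it indexes.
postings = {
    "classification": {1, 2, 3},
    "taxonomy":       {1, 2, 4},
    "retrieval":      {5, 6},
    "search":         {5, 6, 7},
}

def similarity(t1, t2):
    """Step 1: Jaccard similarity over the documents two terms index in common."""
    a, b = postings[t1], postings[t2]
    return len(a & b) / len(a | b)

def clump(threshold=0.4):
    """Step 2: naive single-link grouping of term pairs above the threshold."""
    classes = {t: {t} for t in postings}
    for t1, t2 in combinations(postings, 2):
        if similarity(t1, t2) >= threshold:
            merged = classes[t1] | classes[t2]
            for t in merged:
                classes[t] = merged
    return {frozenset(c) for c in classes.values()}

print(clump())  # two classes of intersubstitutable terms
```

With these data the procedure yields {classification, taxonomy} and {retrieval, search}: terms collocated because of overlapping posting lists, not because of any shared defining characteristic.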
  9. Borko, H.: Research in computer based classification systems (1985) 0.01
    0.0076071853 = product of:
      0.038035925 = sum of:
        0.038035925 = weight(_text_:index in 3647) [ClassicSimilarity], result of:
          0.038035925 = score(doc=3647,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.16897833 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
      0.2 = coord(1/5)
    
    Abstract
The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California, and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts.
The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
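Borko's first step, computing a standard correlation between index terms from their pattern of co-occurrence across documents, can be sketched directly. The document-term incidence matrix below is invented, and the factor-analysis step that turned the correlation matrix into classification categories is omitted:

```python
import math

# Toy incidence matrix: rows = documents, columns = index terms.
terms = ["computer", "program", "library", "catalog"]
docs = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
]

def pearson(x, y):
    """Standard product-moment correlation between two term columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

cols = list(zip(*docs))  # transpose: one tuple per term
corr = {(terms[i], terms[j]): round(pearson(cols[i], cols[j]), 3)
        for i in range(len(terms)) for j in range(i + 1, len(terms))}
print(corr)
```

Terms that always co-occur ("library", "catalog") correlate at 1.0, partially co-occurring terms fall in between, and disjoint terms correlate negatively; factor analysis would then extract category-defining clusters from this matrix.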
  10. Kuhlen, R.; Hammwöhner, R.; Sonnenberger, G.; Thiel, U.: TWRM-TOPOGRAPHIC : ein wissensbasiertes System zur situationsgerechten Aufbereitung und Präsentation von Textinformation in graphischen Retrievaldialogen (1988) 0.01
    0.00697911 = product of:
      0.03489555 = sum of:
        0.03489555 = weight(_text_:22 in 3113) [ClassicSimilarity], result of:
          0.03489555 = score(doc=3113,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.19345059 = fieldWeight in 3113, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3113)
      0.2 = coord(1/5)
    
    Date
    15. 1.2005 14:10:22

Languages

  • e 154
  • d 32
  • f 1
  • nl 1
  • p 1