Search (114 results, page 1 of 6)

  • Active filter: theme_ss:"Konzeption und Anwendung des Prinzips Thesaurus"
  1. Park, Y.C.; Choi, K.-S.: Automatic thesaurus construction using Bayesian networks (1996) 0.17
    0.16562025 = product of:
      0.3312405 = sum of:
        0.12775593 = weight(_text_:term in 6581) [ClassicSimilarity], result of:
          0.12775593 = score(doc=6581,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.58325374 = fieldWeight in 6581, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=6581)
        0.20348458 = weight(_text_:frequency in 6581) [ClassicSimilarity], result of:
          0.20348458 = score(doc=6581,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.7360931 = fieldWeight in 6581, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0625 = fieldNorm(doc=6581)
      0.5 = coord(2/4)
    
    Abstract
    Automatic thesaurus construction is accomplished by extracting term relations mechanically. A popular method uses statistical analysis to discover the term relations. For low-frequency terms, the statistical information cannot be reliably used for deciding the relationship of terms; this is referred to as the data sparseness problem. Many studies have shown that low-frequency terms are of most use in thesaurus construction. Characterizes the statistical behaviour of terms by using an inference network. Develops a formal approach using a Bayesian network for the data sparseness problem
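    The indented breakdown attached to each entry is Lucene "explain" output for ClassicSimilarity scoring. As a minimal sketch, assuming the standard TF-IDF formula (square-root-dampened term frequency times idf times field-length norm), the fieldWeight values above can be reproduced as follows:

```python
import math

def field_weight(freq: float, idf: float, field_norm: float) -> float:
    """ClassicSimilarity fieldWeight: sqrt(freq) * idf * fieldNorm."""
    tf = math.sqrt(freq)  # raw term frequency, dampened by square root
    return tf * idf * field_norm

# Values from the first breakdown above (field "_text_:term", doc 6581):
# freq=4.0 -> tf=2.0; idf=4.66603; fieldNorm=0.0625
print(field_weight(4.0, 4.66603, 0.0625))  # ~0.5832537, matching 0.58325374

# The line's score multiplies fieldWeight by queryWeight
# (idf * queryNorm = 4.66603 * 0.04694356 = 0.21904005), and coord(n/m)
# scales the summed per-term scores by the fraction of query terms matched.
```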
  2. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.11
    0.11420593 = product of:
      0.15227456 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 5226) [ClassicSimilarity], result of:
              0.023542227 = score(doc=5226,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 5226, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5226)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 5226) [ClassicSimilarity], result of:
          0.056460675 = score(doc=5226,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
        0.08992833 = weight(_text_:frequency in 5226) [ClassicSimilarity], result of:
          0.08992833 = score(doc=5226,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.32531026 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
      0.75 = coord(3/4)
    
    Abstract
    Tseng constructs a word co-occurrence-based thesaurus by means of the automatic analysis of Chinese text. Words are identified by a longest dictionary match, supplemented by a keyword extraction algorithm that merges back nearby tokens and accepts shorter strings of characters if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this can be greatly reduced with the use of a 70-character, 2680-word stop list. Extracted terms with their associated document weights are sorted by decreasing frequency, and the top of this list is associated using a Dice coefficient modified to account for the effect of longer documents on the weights of term pairs. Co-occurrence is measured not in the document as a whole but in paragraph- or sentence-sized sections in order to reduce computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They determined 69% to be relevant.
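    A hypothetical sketch of the association step described above: count how often two terms share a paragraph- or sentence-sized window and score each pair with a plain Dice coefficient (Tseng's document-length modification is not reproduced here):

```python
from collections import defaultdict
from itertools import combinations

def dice_associations(windows):
    """Score term pairs by Dice coefficient over co-occurrence windows.

    windows: list of term lists, one per paragraph/sentence-sized section.
    Returns {(term_a, term_b): 2 * joint / (count_a + count_b)}.
    """
    count = defaultdict(int)  # number of windows containing each term
    joint = defaultdict(int)  # number of windows containing both terms
    for window in windows:
        terms = set(window)
        for t in terms:
            count[t] += 1
        for a, b in combinations(sorted(terms), 2):
            joint[(a, b)] += 1
    return {pair: 2 * n / (count[pair[0]] + count[pair[1]])
            for pair, n in joint.items()}

windows = [["thesaurus", "term"], ["thesaurus", "term", "query"], ["query"]]
print(dice_associations(windows))
# {('term', 'thesaurus'): 1.0, ('query', 'term'): 0.5, ('query', 'thesaurus'): 0.5}
```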
  3. Crouch, C.J.: An approach to the automatic construction of global thesauri (1990) 0.08
    0.08215908 = product of:
      0.10954544 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 4042) [ClassicSimilarity], result of:
              0.03295912 = score(doc=4042,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 4042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4042)
          0.25 = coord(1/4)
        0.079044946 = weight(_text_:term in 4042) [ClassicSimilarity], result of:
          0.079044946 = score(doc=4042,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 4042, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4042)
        0.022260714 = product of:
          0.04452143 = sum of:
            0.04452143 = weight(_text_:22 in 4042) [ClassicSimilarity], result of:
              0.04452143 = score(doc=4042,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2708308 = fieldWeight in 4042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4042)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The benefits of a well constructed thesaurus to an information retrieval system have long been recognised by both researchers and practitioners in the field. Examines both early and current approaches to automatic thesaurus construction and describes an approach to the automatic generation of global thesauri based on the term discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to 2 document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15% in the test collections, is viable and worthy of continued investigation.
    Date
    22. 4.1996 3:39:53
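    For readers unfamiliar with the discrimination value model cited above, a minimal sketch under simplifying assumptions (cosine similarity over raw term-frequency vectors): a term's discrimination value is the change in average pairwise document similarity when the term is removed, so terms whose removal packs the collection closer together are the good discriminators.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(u.get(t, 0) * v.get(t, 0) for t in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def space_density(docs):
    """Average pairwise similarity of the collection."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def discrimination_value(docs, term):
    """DV > 0: removing the term makes documents more alike, i.e. the
    term was helping to tell them apart (a good discriminator)."""
    without = [{t: f for t, f in d.items() if t != term} for d in docs]
    return space_density(without) - space_density(docs)

docs = [{"thesaurus": 3, "automatic": 1},
        {"thesaurus": 2, "cluster": 2},
        {"thesaurus": 1, "retrieval": 4}]
print(discrimination_value(docs, "thesaurus"))
# negative: a term occurring in every document discriminates poorly
```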
  4. Eastman, C.M.: Overlaps in postings to thesaurus terms : a preliminary study (1988) 0.08
    0.07958529 = product of:
      0.15917058 = sum of:
        0.13690987 = weight(_text_:term in 3555) [ClassicSimilarity], result of:
          0.13690987 = score(doc=3555,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.62504494 = fieldWeight in 3555, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3555)
        0.022260714 = product of:
          0.04452143 = sum of:
            0.04452143 = weight(_text_:22 in 3555) [ClassicSimilarity], result of:
              0.04452143 = score(doc=3555,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2708308 = fieldWeight in 3555, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3555)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The patterns of overlap between terms which are closely related in a thesaurus are considered. The relationships considered are parent/child, in which one term is a broader term of the other, and sibling, in which 2 terms share the same broader term. The patterns of overlap observed in the MeSH thesaurus with respect to selected MEDLINE postings are examined. The implications of the overlap patterns are discussed; in particular, the impact of the overlap patterns on the potential effectiveness of a proposed algorithm for handling negation is considered.
    Date
    25.12.1995 22:52:34
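    A minimal sketch of one way to measure the overlaps studied above, assuming overlap is taken as the share of the narrower term's postings that also appear under its broader term (the paper's exact measure may differ):

```python
def overlap(narrow_postings: set, broad_postings: set) -> float:
    """Fraction of documents posted to the narrower term that are also
    posted to the broader term (1.0 = the narrower term is redundant)."""
    if not narrow_postings:
        return 0.0
    return len(narrow_postings & broad_postings) / len(narrow_postings)

# Hypothetical MeSH-style posting lists (document identifiers):
liver_neoplasms = {101, 102, 103, 104}
neoplasms = {101, 103, 104, 200, 201}
print(overlap(liver_neoplasms, neoplasms))  # 0.75
```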
  5. Hudon, M.: Term definitions in subject thesauri : the Canadian Literacy Thesaurus experience (1992) 0.07
    0.06858641 = product of:
      0.13717282 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 2107) [ClassicSimilarity], result of:
              0.037667565 = score(doc=2107,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 2107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2107)
          0.25 = coord(1/4)
        0.12775593 = weight(_text_:term in 2107) [ClassicSimilarity], result of:
          0.12775593 = score(doc=2107,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.58325374 = fieldWeight in 2107, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=2107)
      0.5 = coord(2/4)
    
    Abstract
    Suggests that complex thesauri are not entirely appropriate in community-based/oriented resource centres and information systems. Describes a proposal to create and integrate term definitions in the Canadian Literacy Thesaurus, currently under development. Discusses major terminological problems arising in the process
  6. Mooers, C.N.: The indexing language of an information retrieval system (1985) 0.05
    0.05397012 = product of:
      0.10794024 = sum of:
        0.09680989 = weight(_text_:term in 3644) [ClassicSimilarity], result of:
          0.09680989 = score(doc=3644,freq=12.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.44197345 = fieldWeight in 3644, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3644)
        0.011130357 = product of:
          0.022260714 = sum of:
            0.022260714 = weight(_text_:22 in 3644) [ClassicSimilarity], result of:
              0.022260714 = score(doc=3644,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.1354154 = fieldWeight in 3644, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3644)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Calvin Mooers' work toward the resolution of the problem of ambiguity in indexing went unrecognized for years. At the time he introduced the "descriptor" (a term with a very distinct meaning), indexers were, for the most part, taking index terms directly from the document, without either rationalizing them with context or normalizing them with some kind of classification. It is ironic that Mooers' term came to be attached to the popular but unsophisticated indexing methods which he was trying to root out. Simply expressed, what Mooers did was to take the dictionary definitions of terms and redefine them so clearly that they could not be used in any context except that provided by the new definition. He did, at great pains, construct such meanings for over four hundred words; disambiguation and specificity were sought after and found for these words. He proposed that all indexers adopt this method so that when the index supplied a term, it also supplied the exact meaning for that term as used in the indexed document. The same term used differently in another document would be defined differently and possibly renamed to avoid ambiguity. The disambiguation was achieved by using unabridged dictionaries and other sources of defining terminology. In practice, this tends to produce circularity in definition, that is, word A refers to word B which refers to word C which refers to word A. It was necessary, therefore, to break this chain by creating a new, definitive meaning for each word. Eventually, means such as those used by Austin (q.v.) for PRECIS achieved the same purpose, but by much more complex means than just creating a unique definition of each term. Mooers, however, was probably the first to realize how confusing undefined terminology could be. Early automatic indexers dealt with distinct disciplines and, as long as they did not stray beyond disciplinary boundaries, a quick and dirty keyword approach was satisfactory. The trouble came when attempts were made to make a combined index for two or more distinct disciplines. A number of processes have since been developed, mostly involving tagging of some kind or use of strings. Mooers' solution has rarely been considered seriously and probably would be extremely difficult to apply now because of so much interdisciplinarity. But for a specific, well defined field, it is still well worth considering. Mooers received training in mathematics and physics from the University of Minnesota and the Massachusetts Institute of Technology. He was the founder of Zator Company, which developed and marketed a coded card information retrieval system, and of Rockford Research, Inc., which engages in research in information science. He is the inventor of the TRAC computer language.
    Footnote
    Original in: Information retrieval today: papers presented at an Institute conducted by the Library School and the Center for Continuation Study, University of Minnesota, Sept. 19-22, 1962. Ed. by Wesley Simonton. Minneapolis, Minn.: The Center, 1963. S.21-36.
  7. Amirhosseini, M.: Theoretical base of quantitative evaluation of unity in a thesaurus term network based on Kant's epistemology (2010) 0.05
    0.0530581 = product of:
      0.1061162 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 5854) [ClassicSimilarity], result of:
              0.033293735 = score(doc=5854,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 5854, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5854)
          0.25 = coord(1/4)
        0.09779277 = weight(_text_:term in 5854) [ClassicSimilarity], result of:
          0.09779277 = score(doc=5854,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.44646066 = fieldWeight in 5854, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5854)
      0.5 = coord(2/4)
    
    Abstract
    The quantitative evaluation of thesauri has been carried forward considerably since 1976. This type of evaluation is based on counting special factors in thesaurus structure, such as preferred terms, non-preferred terms, cross-reference terms and so on. Various statistical tests have accordingly been proposed and applied for the evaluation of thesauri. In this article, we try to explain some ratios for the quantitative evaluation of unity in a thesaurus term network. The theoretical basis of the construction of the ratios' indicators and indices, and the epistemological thought behind this type of quantitative evaluation, are discussed. That theoretical basis is the epistemological thought of Immanuel Kant's Critique of Pure Reason. The cognition states of transcendental understanding are divided into three steps: the first is perception, the second combination, and the third relation making. Term relation domains and conceptual relation domains can be analyzed with ratios. The use of quantitative evaluations in current research in the field of thesaurus construction prepares a basis for a restoration period. In modern thesaurus construction, traditional term relations are analyzed in detail in the form of new conceptual relations. Hence, the new domains of hierarchical and associative relations are constructed in the form of relations between concepts. The newly formed conceptual domains can be a suitable basis for quantitative evaluation analysis of conceptual relations.
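    A hypothetical sketch of this style of evaluation: simple structural ratios computed from counts of preferred terms, non-preferred terms, and relation links (these are generic counts, not the paper's specific unity indices):

```python
def thesaurus_ratios(preferred, non_preferred, hierarchical_links,
                     associative_links):
    """Generic structural ratios for quantitative thesaurus evaluation."""
    return {
        # entry (lead-in) vocabulary per preferred term
        "equivalence_ratio": non_preferred / preferred,
        # average hierarchical connectivity of a preferred term
        "hierarchy_ratio": hierarchical_links / preferred,
        # share of associative links among all relation links
        "associative_share": associative_links
                             / (hierarchical_links + associative_links),
    }

# Hypothetical counts for a mid-sized thesaurus:
print(thesaurus_ratios(preferred=1200, non_preferred=800,
                       hierarchical_links=1500, associative_links=500))
```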
  8. Losee, R.M.: Decisions in thesaurus construction and use (2007) 0.05
    0.051439807 = product of:
      0.10287961 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 924) [ClassicSimilarity], result of:
              0.028250674 = score(doc=924,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 924, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=924)
          0.25 = coord(1/4)
        0.09581695 = weight(_text_:term in 924) [ClassicSimilarity], result of:
          0.09581695 = score(doc=924,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.4374403 = fieldWeight in 924, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=924)
      0.5 = coord(2/4)
    
    Abstract
    A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in "breadcrumb navigation".
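    A minimal sketch of the decision rule described above, assuming each candidate design choice (include a term, subdivide it, pick a subclass set) comes with an estimated retrieval performance figure; the options and numbers below are hypothetical:

```python
def choose_indexing_option(options):
    """Pick the thesaurus design choice with the highest estimated
    performance, e.g. expected precision of the document ordering."""
    return max(options, key=options.get)

options = {  # hypothetical performance estimates
    "keep 'vehicles' undivided": 0.61,
    "subdivide into {cars, trucks}": 0.68,
    "subdivide into {road, rail, air}": 0.64,
}
print(choose_indexing_option(options))  # subdivide into {cars, trucks}
```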
  9. Jones, S.; Gatford, M.; Robertson, S.; Hancock-Beaulieu, M.; Secker, J.; Walker, S.: Interactive thesaurus navigation : intelligence rules OK? (1995) 0.05
    0.045348875 = product of:
      0.09069775 = sum of:
        0.011652809 = product of:
          0.046611235 = sum of:
            0.046611235 = weight(_text_:based in 180) [ClassicSimilarity], result of:
              0.046611235 = score(doc=180,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.3295462 = fieldWeight in 180, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=180)
          0.25 = coord(1/4)
        0.079044946 = weight(_text_:term in 180) [ClassicSimilarity], result of:
          0.079044946 = score(doc=180,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=180)
      0.5 = coord(2/4)
    
    Abstract
    We discuss whether it is feasible to build intelligent rule- or weight-based algorithms into general-purpose software for interactive thesaurus navigation. We survey some approaches to the problem reported in the literature, particularly those involving the assignment of 'link weights' in a thesaurus network, and point out some problems of both principle and practice. We then describe investigations which entailed logging the behavior of thesaurus users and testing the effect of thesaurus-based query enhancement in an IR system using term weighting, in an attempt to identify successful strategies to incorporate into automatic procedures. The results cause us to question many of the assumptions made by previous researchers in this area
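    A hypothetical sketch of thesaurus-based query enhancement with term weighting: terms reached over thesaurus links join the query at a discounted link weight (the weight here is illustrative, and the paper's point is precisely that such fixed weights are questionable):

```python
def expand_query(query_terms, thesaurus, link_weight=0.5):
    """Add related thesaurus terms to a weighted query; original terms
    keep weight 1.0, expansion terms get the discounted link weight."""
    weights = {t: 1.0 for t in query_terms}
    for t in query_terms:
        for related in thesaurus.get(t, []):
            # keep the best weight if a term is reachable more than once
            weights[related] = max(weights.get(related, 0.0), link_weight)
    return weights

thesaurus = {"thesaurus": ["controlled vocabulary", "taxonomy"]}
print(expand_query(["thesaurus", "navigation"], thesaurus))
# {'thesaurus': 1.0, 'navigation': 1.0,
#  'controlled vocabulary': 0.5, 'taxonomy': 0.5}
```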
  10. Mu, X.; Lu, K.; Ryu, H.: Explicitly integrating MeSH thesaurus help into health information retrieval systems : an empirical user study (2014) 0.04
    0.04286651 = product of:
      0.08573302 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 2703) [ClassicSimilarity], result of:
              0.023542227 = score(doc=2703,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 2703, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2703)
          0.25 = coord(1/4)
        0.07984746 = weight(_text_:term in 2703) [ClassicSimilarity], result of:
          0.07984746 = score(doc=2703,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.3645336 = fieldWeight in 2703, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2703)
      0.5 = coord(2/4)
    
    Abstract
    When consumers search for health information, a major obstacle is their unfamiliarity with the medical terminology. Even though medical thesauri such as the Medical Subject Headings (MeSH) and related tools (e.g., the MeSH Browser) were created to help consumers find medical term definitions, the lack of direct and explicit integration of these help tools into a health retrieval system prevented them from effectively achieving their objectives. To explore this issue, we conducted an empirical study with two systems: one is a simple interface system supporting query-based searching; the other is an augmented system with two new components supporting MeSH term searching and MeSH tree browsing. A total of 45 subjects were recruited to participate in the study. The results indicated that the augmented system is more effective than the simple system in terms of improving user-perceived topic familiarity and question-answer performance, even though we did not find that users spent more time on the augmented system. The two new MeSH help components played a critical role in participants' health information retrieval and were found to allow them to develop new search strategies. The findings of the study enhanced our understanding of consumers' search behaviors and shed light on the design of future health information retrieval systems.
  11. Andrade, J. de; Lopes Ginez de Lara, M.: Interoperability and mapping between knowledge organization systems : metathesaurus - Unified Medical Language System of the National Library of Medicine (2016) 0.04
    0.038870465 = product of:
      0.07774093 = sum of:
        0.009988121 = product of:
          0.039952483 = sum of:
            0.039952483 = weight(_text_:based in 2826) [ClassicSimilarity], result of:
              0.039952483 = score(doc=2826,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28246817 = fieldWeight in 2826, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2826)
          0.25 = coord(1/4)
        0.06775281 = weight(_text_:term in 2826) [ClassicSimilarity], result of:
          0.06775281 = score(doc=2826,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.309317 = fieldWeight in 2826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2826)
      0.5 = coord(2/4)
    
    Abstract
    This paper is aimed at assessing the potential of interoperable knowledge organization systems to respond to search strategies in order to retrieve information from databases in the areas of health and biomedicine. An analysis was done of the semantic consistency of the synonym grouping of a term selected from the Metathesaurus of the Unified Medical Language System of the National Library of Medicine, based on the characteristics of equivalence proposed in ISO 25964-2:2011 and on the following categories: semantic, morphological, syntactic and typographical variations. This paper highlights the importance of understanding the results of automatic mapping as well as the need for characterization, evaluation and selection of equivalences for the preparation of consistent search strategies and the presentation of search results in scientific work methodologies.
  12. McCulloch, E.: Thesauri: practical guidance for construction (2005) 0.04
    0.037407737 = product of:
      0.074815474 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 4724) [ClassicSimilarity], result of:
              0.028250674 = score(doc=4724,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 4724, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4724)
          0.25 = coord(1/4)
        0.06775281 = weight(_text_:term in 4724) [ClassicSimilarity], result of:
          0.06775281 = score(doc=4724,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.309317 = fieldWeight in 4724, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=4724)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - With the growing recognition that thesauri aid information retrieval, organisations are beginning to adopt, and in many cases create, thesauri. This paper offers some guidance on the construction process. Design/methodology/approach - An opinion piece with a practical focus, based on recent experiences gleaned from consultancy work. Findings - A number of steps can be taken to ensure any thesaurus under construction is fit for purpose. Due consideration is therefore given to aspects such as term selection, structure and notation, thesaurus standards, software and Web display issues, and thesaurus evaluation and maintenance. This paper also notes that creating new subject schemes from scratch, however attractive, contributes to the plethora of terminologies currently in existence and can limit user searching within particular contexts. The decision to create a "new" thesaurus should therefore be taken carefully, and observance of standards is paramount. Practical implications - This paper offers advice to assist practitioners in the development of thesauri. Originality/value - Useful guidance for those practitioners new to the area of thesaurus construction is provided, together with an overview of selected key processes involved in the construction of a thesaurus.
  13. Milstead, J.L.: Thesauri in a full-text world (1998) 0.04
    0.036180593 = product of:
      0.072361186 = sum of:
        0.056460675 = weight(_text_:term in 2337) [ClassicSimilarity], result of:
          0.056460675 = score(doc=2337,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.015900511 = product of:
          0.031801023 = sum of:
            0.031801023 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.031801023 = score(doc=2337,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future
    Date
    22. 9.1997 19:16:05
  14. Li, K.W.; Yang, C.C.: Automatic crosslingual thesaurus generated from the Hong Kong SAR Police Department Web Corpus for Crime Analysis (2005) 0.04
    0.036016613 = product of:
      0.07203323 = sum of:
        0.008155267 = product of:
          0.032621067 = sum of:
            0.032621067 = weight(_text_:based in 3391) [ClassicSimilarity], result of:
              0.032621067 = score(doc=3391,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2306343 = fieldWeight in 3391, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3391)
          0.25 = coord(1/4)
        0.06387796 = weight(_text_:term in 3391) [ClassicSimilarity], result of:
          0.06387796 = score(doc=3391,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.29162687 = fieldWeight in 3391, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=3391)
      0.5 = coord(2/4)
    
    Abstract
    For the sake of national security, very large volumes of data and information are generated and gathered daily. Much of this data and information is written in different languages, stored in different locations, and may be seemingly unconnected. Crosslingual semantic interoperability is a major challenge to generating an overview of this disparate data and information so that it can be analyzed, shared, searched, and summarized. The recent terrorist attacks and the tragic events of September 11, 2001 have prompted increased attention on national security and criminal analysis. Many Asian countries and cities, such as Japan, Taiwan, and Singapore, have been advised that they may become the next targets of terrorist attacks. Semantic interoperability has been a focus in digital library research. Traditional information retrieval (IR) approaches normally require a document to share some common keywords with the query. Generating the associations for the related terms between the two term spaces of users and documents is an important issue. The problem can be viewed as the creation of a thesaurus. Apart from this, terrorists and criminals may communicate through letters, e-mails, and faxes in languages other than English. The translation ambiguity significantly exacerbates the retrieval problem, which is thereby expanded to crosslingual semantic interoperability. In this paper, we focus on the English/Chinese crosslingual semantic interoperability problem. However, the developed techniques are not limited to English and Chinese but can be applied to many other languages. English and Chinese are popular languages in the Asian region, and much information about national security or crime is communicated in these languages. An efficient automatically generated thesaurus between these languages is important for crosslingual information retrieval between English and Chinese. To facilitate crosslingual information retrieval, a corpus-based approach uses the term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model to cross the language boundary. In this paper, the text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web is first presented. We also introduce an algorithmic approach to generate a robust knowledge base based on statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consisted of a thesaurus-like, semantic network knowledge base, which can aid in semantics-based crosslingual information management and retrieval.
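    A minimal sketch of the corpus-based step, assuming aligned English/Chinese document pairs and using a Dice-style association in place of the paper's statistical correlation analysis:

```python
from collections import defaultdict

def crosslingual_associations(aligned_pairs):
    """aligned_pairs: (english_terms, chinese_terms) tuples drawn from
    parallel documents. Returns Dice-style association scores."""
    en_count, zh_count = defaultdict(int), defaultdict(int)
    joint = defaultdict(int)
    for en_terms, zh_terms in aligned_pairs:
        en, zh = set(en_terms), set(zh_terms)
        for e in en:
            en_count[e] += 1
        for z in zh:
            zh_count[z] += 1
        for e in en:
            for z in zh:
                joint[(e, z)] += 1
    return {(e, z): 2 * n / (en_count[e] + zh_count[z])
            for (e, z), n in joint.items()}

pairs = [(["police", "arrest"], ["警方", "拘捕"]),
         (["police", "traffic"], ["警方", "交通"])]
scores = crosslingual_associations(pairs)
print(max(scores, key=scores.get))  # ('police', '警方'), score 1.0
```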
  15. Spiteri, L.F.: Word association testing and thesaurus construction : a pilot study (2005) 0.03
    0.034227468 = product of:
      0.13690987 = sum of:
        0.13690987 = weight(_text_:term in 5216) [ClassicSimilarity], result of:
          0.13690987 = score(doc=5216,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.62504494 = fieldWeight in 5216, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5216)
      0.25 = coord(1/4)
    
    Abstract
    This pilot study examines the use of word association testing in the derivation of user-derived descriptors, descriptor hierarchies, and categories of inter-term relationships for the purpose of thesaurus construction. Ten participants, who were students, were presented with a test-bed of 15 domain-specific stimulus terms and were asked to provide as many response words as they could for each stimulus term and to describe how the response and stimulus terms are inter-related. The word association test was successful in generating a significant number of word pairs and facet indicators that could be used to display inter-term relationships in thesauri.
  16. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.03
    0.03332738 = product of:
      0.06665476 = sum of:
        0.010194084 = product of:
          0.040776335 = sum of:
            0.040776335 = weight(_text_:based in 2106) [ClassicSimilarity], result of:
              0.040776335 = score(doc=2106,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28829288 = fieldWeight in 2106, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2106)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 2106) [ClassicSimilarity], result of:
          0.056460675 = score(doc=2106,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 2106, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2106)
      0.5 = coord(2/4)
    
    Abstract
    The use of Knowledge Organization Systems (KOSs) in aggregated metadata collections facilitates the implementation of search mechanisms operating on the same term or keyphrase space, thus preparing the ground for improved browsing, more accurate retrieval and better user profiling. Automatic thesaurus-based keyphrase extraction appears to be an inexpensive tool to obtain this information, but the studies on its effectiveness are scattered and do not consider the practical applicability of these techniques compared to the quality obtained by involving human experts. This paper presents an evaluation of keyphrase extraction using the KEA software and the AGROVOC vocabulary on a sample of a large collection of metadata in the field of agriculture from the AGRIS database. This effort includes a double evaluation, the classical automatic evaluation based on precision and recall measures, plus a blind evaluation aimed to contrast the quality of the keyphrases extracted against expert-provided samples and against the keyphrases originally recorded in the metadata. Results show not only that KEA outperforms humans in matching the original keyphrases, but also that the quality of the keyphrases extracted was similar to those provided by humans.
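    A minimal sketch of the classical half of such an evaluation: exact-match precision and recall of machine-extracted keyphrases against the keyphrases recorded in the metadata (real evaluations usually normalise terms against the vocabulary first; the sample phrases are hypothetical):

```python
def precision_recall(extracted, reference):
    """Exact-match precision/recall of extracted keyphrases against a
    reference set, e.g. the keyphrases originally in the metadata."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

machine = ["soil fertility", "maize", "irrigation"]   # hypothetical output
metadata = ["soil fertility", "maize", "crop yield"]  # AGRIS-style record
print(precision_recall(machine, metadata))  # (0.666..., 0.666...)
```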
  17. Wang, J.: Automatic thesaurus development : term extraction from title metadata (2006) 0.03
    0.031173116 = product of:
      0.06234623 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 5063) [ClassicSimilarity], result of:
              0.023542227 = score(doc=5063,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 5063, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5063)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 5063) [ClassicSimilarity], result of:
          0.056460675 = score(doc=5063,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 5063, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5063)
      0.5 = coord(2/4)
    
    Abstract
    The application of thesauri in networked environments is seriously hampered by the challenges of introducing new concepts and terminology into the formal controlled vocabulary, which is critical for enhancing its retrieval capability. The author describes an automated process of adding new terms to thesauri as entry vocabulary by analyzing the association between words/phrases extracted from bibliographic titles and subject descriptors in the metadata record (subject descriptors are terms assigned from controlled vocabularies of thesauri to describe the subjects of the objects [e.g., books, articles] represented by the metadata records). The investigated approach uses a corpus of metadata for scientific and technical (S&T) publications in which the titles contain substantive words for key topics. The three steps of the method are (a) extracting words and phrases from the title field of the metadata; (b) applying a method to identify and select the specific and meaningful keywords based on the associated controlled vocabulary terms from the thesaurus used to catalog the objects; and (c) inserting selected keywords into the thesaurus as new terms (most of them are in hierarchical relationships with the existing concepts), thereby updating the thesaurus with new terminology that is being used in the literature. The effectiveness of the method was demonstrated by an experiment with the Chinese Classification Thesaurus (CCT) and bibliographic data in China Machine-Readable Cataloging Record (MARC) format (CNMARC) provided by Peking University Library. This approach is equally effective in large-scale collections and in other languages.
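    A hypothetical sketch of step (b): rank title words as candidate entry vocabulary by how strongly they associate with an assigned subject descriptor across the metadata corpus (a crude conditional probability stands in for the paper's selection method):

```python
from collections import defaultdict

def descriptor_associations(records, min_count=2):
    """records: (title_words, descriptors) pairs from metadata records.
    Returns {(word, descriptor): P(descriptor | word)} for pairs seen
    at least min_count times."""
    word_count = defaultdict(int)
    pair_count = defaultdict(int)
    for words, descriptors in records:
        for w in set(words):
            word_count[w] += 1
            for d in set(descriptors):
                pair_count[(w, d)] += 1
    return {(w, d): n / word_count[w]
            for (w, d), n in pair_count.items() if n >= min_count}

records = [
    (["bayesian", "networks", "thesaurus"], ["Thesauri", "Probability"]),
    (["bayesian", "inference"], ["Probability"]),
]
print(descriptor_associations(records))  # {('bayesian', 'Probability'): 1.0}
```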
  18. Shiri, A.A.; Revie, C.; Chowdhury, G.: Thesaurus-assisted search term selection and query expansion : a review of user-centred studies (2002) 0.03
    0.029337829 = product of:
      0.117351316 = sum of:
        0.117351316 = weight(_text_:term in 1330) [ClassicSimilarity], result of:
          0.117351316 = score(doc=1330,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.5357528 = fieldWeight in 1330, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=1330)
      0.25 = coord(1/4)
    
    Abstract
    This paper provides a review of the literature related to the application of domain-specific thesauri in the search and retrieval process. Focusing on studies that adopt a user-centred approach, the review presents a survey of the methodologies and results from empirical studies undertaken on the use of thesauri as sources of term selection for query formulation and expansion during the search process. It summarises the ways in which domain-specific thesauri from different disciplines have been used by various types of users and how these tools aid users in the selection of search terms. The review consists of two main sections: first, studies on thesaurus-aided search term selection; and second, studies dealing with query expansion using thesauri. Both sections are illustrated with case studies that have adopted a user-centred approach.
  19. Ma, X.; Carranza, E.J.M.; Wu, C.; Meer, F.D. van der; Liu, G.: A SKOS-based multilingual thesaurus of geological time scale for interoperability of online geological maps (2011) 0.03
    0.027292717 = product of:
      0.054585434 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 4800) [ClassicSimilarity], result of:
              0.037667565 = score(doc=4800,freq=8.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 4800, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4800)
          0.25 = coord(1/4)
        0.04516854 = weight(_text_:term in 4800) [ClassicSimilarity], result of:
          0.04516854 = score(doc=4800,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.20621133 = fieldWeight in 4800, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=4800)
      0.5 = coord(2/4)
    
    Abstract
    The usefulness of online geological maps is hindered by linguistic barriers. Multilingual geoscience thesauri alleviate linguistic barriers of geological maps. However, the benefits of multilingual geoscience thesauri for online geological maps are less studied. In this regard, we developed a multilingual thesaurus of geological time scale (GTS) to alleviate linguistic barriers of GTS records among online geological maps. We extended the Simple Knowledge Organization System (SKOS) model to represent the ordinal hierarchical structure of GTS terms. We collected GTS terms in seven languages and encoded them into a thesaurus by using the extended SKOS model. We implemented methods of characteristic-oriented term retrieval in JavaScript programs for accessing Web Map Services (WMS), recognizing GTS terms, and making translations. With the developed thesaurus and programs, we set up a pilot system to test recognitions and translations of GTS terms in online geological maps. Results of this pilot system proved the accuracy of the developed thesaurus and the functionality of the developed programs. Therefore, with proper deployments, SKOS-based multilingual geoscience thesauri can be functional for alleviating linguistic barriers among online geological maps and, thus, improving their interoperability.
    Content
    Article outline: 1. Introduction - 2. SKOS-based multilingual thesaurus of geological time scale (2.1. Addressing the insufficiency of SKOS in the context of the Semantic Web; 2.2. Addressing semantics and syntax/lexicon in multilingual GTS terms; 2.3. Extending SKOS model to capture GTS structure; 2.4. Summary of building the SKOS-based MLTGTS) - 3. Recognizing and translating GTS terms retrieved from WMS - 4. Pilot system, results, and evaluation - 5. Discussion - 6. Conclusions
    Cf.: http://www.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271720&_user=3865853&_pii=S0098300411000744&_check=y&_origin=&_coverDate=31-Oct-2011&view=c&wchp=dGLbVlt-zSkzS&_valck=1&md5=e2c1daf53df72d034d22278212578f42&ie=/sdarticle.pdf
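    As a minimal sketch of the kind of SKOS encoding described above (using the rdflib library; the namespace and terms are made up for illustration, and the paper's ordinal extension of SKOS is not reproduced):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

GTS = Namespace("http://example.org/gts/")  # hypothetical namespace
g = Graph()

jurassic = GTS.Jurassic
g.add((jurassic, RDF.type, SKOS.Concept))
g.add((jurassic, SKOS.prefLabel, Literal("Jurassic", lang="en")))
g.add((jurassic, SKOS.prefLabel, Literal("Jura", lang="de")))
g.add((jurassic, SKOS.prefLabel, Literal("侏罗纪", lang="zh")))
g.add((jurassic, SKOS.broader, GTS.Mesozoic))  # GTS hierarchy

print(g.serialize(format="turtle"))
```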
  20. Tudhope, D.; Alani, H.; Jones, C.: Augmenting thesaurus relationships : possibilities for retrieval (2001) 0.02
    0.024448192 = product of:
      0.09779277 = sum of:
        0.09779277 = weight(_text_:term in 1520) [ClassicSimilarity], result of:
          0.09779277 = score(doc=1520,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.44646066 = fieldWeight in 1520, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1520)
      0.25 = coord(1/4)
    
    Abstract
    This paper discusses issues concerning the augmentation of thesaurus relationships, in light of new application possibilities for retrieval. We first discuss a case study that explored the retrieval potential of an augmented set of thesaurus relationships by specialising standard relationships into richer subtypes, in particular hierarchical geographical containment and the associative relationship. We then locate this work in a broader context by reviewing various attempts to build taxonomies of thesaurus relationships, and conclude by discussing the feasibility of hierarchically augmenting the core set of thesaurus relationships, particularly the associative relationship. We discuss the possibility of enriching the specification and semantics of Related Term (RT) relationships, while maintaining compatibility with traditional thesauri via a limited hierarchical extension of the associative (and hierarchical) relationships. This would be facilitated by distinguishing the type of term from the (sub)type of relationship and explicitly specifying semantic categories for terms following a faceted approach. We first illustrate how hierarchical spatial relationships can be used to provide more flexible retrieval for queries incorporating place names in applications employing online gazetteers and geographical thesauri. We then employ a set of experimental scenarios to investigate key issues affecting the use of associative (RT) thesaurus relationships in semantic distance measures. Previous work has noted the potential of RTs in thesaurus search aids but also the problem of uncontrolled expansion of query term sets. Results presented in this paper suggest the potential for taking account of the hierarchical context of an RT link and specialisations of the RT relationship
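    A hypothetical sketch of a semantic distance measure over typed thesaurus links, with subtypes of the associative (RT) relationship costed differently (the relationship costs and example terms are illustrative, not taken from the paper):

```python
import heapq

# Traversal cost per relationship (sub)type -- illustrative values only.
COSTS = {"BT": 1.0, "NT": 1.0, "RT": 2.0, "RT/causal": 1.5}

def semantic_distance(graph, start, goal):
    """Cheapest-path distance over typed thesaurus links (Dijkstra).
    graph: {term: [(neighbour, relationship_type), ...]}."""
    queue, seen = [(0.0, start)], set()
    while queue:
        dist, term = heapq.heappop(queue)
        if term == goal:
            return dist
        if term in seen:
            continue
        seen.add(term)
        for neighbour, rel in graph.get(term, []):
            heapq.heappush(queue, (dist + COSTS[rel], neighbour))
    return float("inf")

graph = {
    "castle": [("fortification", "BT"), ("siege", "RT/causal")],
    "fortification": [("castle", "NT"), ("moat", "RT")],
}
print(semantic_distance(graph, "castle", "moat"))  # 1.0 + 2.0 = 3.0
```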

Languages

  • e 101
  • d 8
  • f 4
  • sp 1

Types

  • a 94
  • el 12
  • m 7
  • n 3
  • x 3
  • s 2
  • r 1