Search (4 results, page 1 of 1)

Losee, R.M.: Decisions in thesaurus construction and use (2007) 0.04
```
0.035616852 = product of:
  0.17808425 = sum of:
    0.17808425 = weight(_text_:thesaurus in 924) [ClassicSimilarity], result of:
      0.17808425 = score(doc=924,freq=12.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.7503696 = fieldWeight in 924, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.046875 = fieldNorm(doc=924)
  0.2 = coord(1/5)
```
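The explain trees in this result list follow Lucene's classic TF-IDF scoring (ClassicSimilarity). Each printed score can be recomputed from the printed inputs: idf = 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the idf values shown; tf = √freq; queryWeight = idf · queryNorm; fieldWeight = tf · idf · fieldNorm. The final score is queryWeight · fieldWeight · coord. The sketch below assumes a single matching query clause and a query boost of 1. The function names are hypothetical, and queryNorm and fieldNorm are copied from the output rather than derived, because fieldNorm is a lossy one-byte encoding of a length normalization. Lucene also computes in 32-bit floats, so the last digits can differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), the form matching the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = square root of the raw term frequency in the field."""
    return math.sqrt(freq)


def explain_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float,
                  matched: int, clauses: int) -> float:
    """Recompute a single-term ClassicSimilarity score, assuming a query boost of 1.

    query_norm and field_norm are copied from the explain output, not derived here.
    With one matching clause, the "sum of" node equals that clause's weight.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight
    coord = matched / clauses                            # coord(matched/clauses)
    return query_weight * field_weight * coord


# Inputs copied from the explain trees in this result list: (freq, docFreq, fieldNorm)
results = {
    924: explain_score(12.0, 1182, 44218, 0.051357865, 0.046875, 1, 5),
    1016: explain_score(14.0, 1182, 44218, 0.051357865, 0.03125, 1, 5),
    8525: explain_score(2.0, 1182, 44218, 0.051357865, 0.046875, 1, 5),
    3368: explain_score(2.0, 3622, 44218, 0.051357865, 0.0546875, 1, 5),
}
for doc, score in results.items():
    print(f"doc={doc}: {score:.9f}")
```

Only one of the five query clauses matched in every result, so coord(1/5) scales each score by 0.2. The scores printed next to the titles (0.04, 0.03, 0.01, 0.01) are these values rounded to two decimals.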
Abstract

A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often arranged hierarchically, that may be used to index, search, and mine documents. We describe the decisions that should be made about whether to include a term, whether to subdivide a term into its subclasses, and which of several possible sets of subclasses to use. These decisions are made so as to maximize performance, based on retrospective measurements or on estimates of future performance when thesaurus terms are used to order documents. They may also be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in "breadcrumb navigation".
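The basic decision rule picks, among the candidate ways of representing a concept, the option with the highest measured or estimated retrieval performance. This is a minimal sketch. It assumes per-query performance for each option has already been measured and that the mean is an adequate summary. The option names, the numbers, and `choose_option` are hypothetical and do not reproduce Losee's decision criteria.

```python
from statistics import mean


def choose_option(options: dict[str, list[float]]) -> str:
    """Return the option whose mean measured (or estimated) performance is highest."""
    return max(options, key=lambda name: mean(options[name]))


# Hypothetical per-query performance values, e.g. average precision when the
# candidate terms are used to order documents.
options = {
    "keep 'vehicles' undivided": [0.42, 0.35, 0.51],
    "subdivide into {cars, trucks}": [0.48, 0.40, 0.47],
    "subdivide into {electric, combustion}": [0.39, 0.44, 0.45],
}
print(choose_option(options))  # subdivide into {cars, trucks}
```

Replacing the mean with a forecast of future performance would turn the same rule from retrospective evaluation into a construction-time decision.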

Theme

Conception and application of the thesaurus principle

Willis, C.; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.03
```
0.025647065 = product of:
  0.12823533 = sum of:
    0.12823533 = weight(_text_:thesaurus in 1016) [ClassicSimilarity], result of:
      0.12823533 = score(doc=1016,freq=14.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.5403279 = fieldWeight in 1016, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.03125 = fieldNorm(doc=1016)
  0.2 = coord(1/5)
```
Abstract

Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is to analyze the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with four different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the four thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri, and the manual indexing associated with it, is characterized using the methods developed here.
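A random walk over thesaurus structure can promote concepts that are well connected to the candidates found by matching. The sketch below illustrates this idea with a generic random walk with restart (personalized PageRank). It is not the weighted random walk from Willis and Losee, whose weighting scheme the abstract does not specify. The graph, edge weights, seed scores, restart probability, and the function name are all hypothetical.

```python
def random_walk_with_restart(graph, seeds, restart=0.15, iters=100):
    """Rank concepts by the visit probability of a walk that restarts at seed concepts.

    graph: concept -> {neighbour: edge weight} (hypothetical BT/NT/RT links).
    seeds: candidate concept -> matching score, used as the restart distribution.
    """
    total = sum(seeds.values())
    restart_dist = {c: s / total for c, s in seeds.items()}
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs} | set(seeds)
    prob = {n: restart_dist.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        # Restart mass goes back to the seeds in proportion to their scores.
        nxt = {n: restart * restart_dist.get(n, 0.0) for n in nodes}
        for node, p in prob.items():
            nbrs = graph.get(node, {})
            out = sum(nbrs.values())
            if out == 0:
                # Dangling concept: return its mass to the seeds.
                for s, r in restart_dist.items():
                    nxt[s] += (1 - restart) * p * r
                continue
            # Follow each link with probability proportional to its weight.
            for nbr, w in nbrs.items():
                nxt[nbr] += (1 - restart) * p * w / out
        prob = nxt
    return sorted(prob.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy thesaurus and matched candidate concepts.
thesaurus = {
    "maize": {"cereals": 1.0, "corn oil": 0.5},
    "wheat": {"cereals": 1.0},
    "cereals": {"maize": 0.5, "wheat": 0.5, "crops": 1.0},
    "crops": {"cereals": 1.0},
    "corn oil": {"maize": 1.0},
}
seeds = {"maize": 2.0, "wheat": 1.0}
for concept, p in random_walk_with_restart(thesaurus, seeds):
    print(f"{concept}: {p:.3f}")
```

In this toy graph, the broader term "cereals" ranks first even though matching never proposed it. This mirrors, in a very simplified form, the idea that indexing partly consists of browsing the vocabulary structure.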

Theme

Conception and application of the thesaurus principle

Haas, S.W.; Losee, R.M.: Looking in text windows : their size and composition (1994) 0.01
```
0.014540519 = product of:
  0.072702594 = sum of:
    0.072702594 = weight(_text_:thesaurus in 8525) [ClassicSimilarity], result of:
      0.072702594 = score(doc=8525,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.30633712 = fieldWeight in 8525, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.046875 = fieldNorm(doc=8525)
  0.2 = coord(1/5)
```
Abstract

A text window is a group of words appearing in contiguous positions in text, used to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for its structure. This work supports the previously suggested idea that natural groupings of words are best treated as a unit of 7 to 11 words, that is, plus or minus 3 to 5 words. Text retrieval experiments varying the window size, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content-word phrases and few terms with a high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.
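A window of "plus or minus 3 to 5 words" can be extracted around each occurrence of a query term. This is a minimal sketch. It assumes whitespace tokenization, simple punctuation stripping, and optional stopword removal. The function name, tokenization, and stopword handling are assumptions, not the procedure used by Haas and Losee.

```python
def text_windows(text, term, radius=3, stopwords=frozenset()):
    """Return the words within +/- `radius` positions of each occurrence of `term`."""
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    # Optionally drop stopwords before positions are counted.
    words = [w for w in words if w and w not in stopwords]
    windows = []
    for i, w in enumerate(words):
        if w == term.lower():
            lo, hi = max(0, i - radius), min(len(words), i + radius + 1)
            windows.append(words[lo:hi])
    return windows


sample = ("Automatic indexing can exploit the thesaurus structure "
          "because a thesaurus links related terms.")
for window in text_windows(sample, "thesaurus"):
    print(" ".join(window))
```

With `radius=3`, each window spans 7 words, and with `radius=5` it spans 11 words, matching the range reported above. Windows near the start or end of a text are truncated.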

Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.01
```
0.009741595 = product of:
  0.048707973 = sum of:
    0.048707973 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
      0.048707973 = score(doc=3368,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.2708308 = fieldWeight in 3368, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3368)
  0.2 = coord(1/5)
```

Date: 22. 2.1996 13:14:10
