Search (13 results, page 1 of 1)

Haas, S.W.; Losee, R.M.: Looking in text windows : their size and composition (1994) 0.01
```
0.0052452656 = product of:
  0.036716856 = sum of:
    0.036716856 = weight(_text_:with in 8525) [ClassicSimilarity], result of:
      0.036716856 = score(doc=8525,freq=12.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.39129806 = fieldWeight in 8525, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=8525)
  0.14285715 = coord(1/7)
```
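The breakdown above can be reproduced by hand. The following is a minimal Python sketch, assuming the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The queryNorm and fieldNorm values are copied from the listing rather than derived, and the function names are illustrative, not part of any library.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord=1.0):
    """Score of a single matching term, mirroring the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=...)
    query_weight = idf * query_norm        # queryWeight
    field_weight = tf * idf * field_norm   # fieldWeight
    return query_weight * field_weight * coord


# Values taken from the listing for doc 8525 (term "with").
score = classic_term_score(
    freq=12.0,
    doc_freq=10797,
    max_docs=44218,
    query_norm=0.038938753,
    field_norm=0.046875,
    coord=1 / 7,  # coord(1/7): one of seven query clauses matched
)
print(f"{score:.10f}")
```

With the values listed for doc 8525, the result should agree with the reported 0.0052452656 up to single-precision rounding, since Lucene computes these factors as 32-bit floats.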
Abstract

A text window is a group of words appearing in contiguous positions in text, used to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for their structure. This supports the previously suggested idea that natural groupings of words are best treated as a unit of size 7 to 11 words, that is, plus or minus 3 to 5 words. The text retrieval experiments varying the size of windows, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content word phrases and few terms with high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.

Losee, R.M.; Paris, L.A.H.: Measuring search-engine quality and query difficulty : ranking with Target and Freestyle (1999) 0.00

```
0.004282741 = product of:
  0.029979186 = sum of:
    0.029979186 = weight(_text_:with in 4310) [ClassicSimilarity], result of:
      0.029979186 = score(doc=4310,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.3194935 = fieldWeight in 4310, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.09375 = fieldNorm(doc=4310)
  0.14285715 = coord(1/7)
```

Losee, R.M.: ¬The effect of assigning a metadata or indexing term on document ordering (2013) 0.00
```
0.004282741 = product of:
  0.029979186 = sum of:
    0.029979186 = weight(_text_:with in 1100) [ClassicSimilarity], result of:
      0.029979186 = score(doc=1100,freq=8.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.3194935 = fieldWeight in 1100, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=1100)
  0.14285715 = coord(1/7)
```
Abstract

The assignment of indexing terms and metadata to documents, data, and other information representations is considered useful, but the utility of including a single term is seldom discussed. The author discusses a simple model of document ordering and then shows how assigning index and metadata labels improves or decreases retrieval performance. The Indexing and Metadata Advantage (IMA) factor measures how indexing or assigning a metadata term helps (or hurts) ordering performance. Performance values and the associated IMA expressions are computed, consistent with several different assumptions. The economic value associated with various term assignment decisions is developed. The IMA term advantage model itself is empirically validated with computer software that shows that the analytic results obtained agree completely with the actual performance gains and losses found when ordering all sets of 14 or fewer documents. When the formulas in the software are changed to differ from this model, the predictions of the actual performance are erroneous.

Losee, R.M.: Improving collection browsing : small world networking and Gray code ordering (2017) 0.00
```
0.003568951 = product of:
  0.024982655 = sum of:
    0.024982655 = weight(_text_:with in 5148) [ClassicSimilarity], result of:
      0.024982655 = score(doc=5148,freq=8.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.2662446 = fieldWeight in 5148, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5148)
  0.14285715 = coord(1/7)
```
Abstract

Documents in digital and paper libraries may be arranged, based on their topics, in order to facilitate browsing. It may seem intuitively obvious that ordering documents by their subject should improve browsing performance; the results presented in this article suggest that ordering library materials by their Gray code values and through using links consistent with the small world model of document relationships is consistent with improving browsing performance. Below, library circulation data, including ordering with Library of Congress Classification numbers and Library of Congress Subject Headings, are used to provide information useful in generating user-centered document arrangements, as well as user-independent arrangements. Documents may be linearly arranged so they can be placed in a line by topic, such as on a library shelf, or in a list on a computer display. Crossover links, jumps between a document and another document to which it is not adjacent, can be used in library databases to allow additional paths that one might take when browsing. The improvement that is obtained with different combinations of document orderings and different crossovers is examined and applications suggested.

Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.00
```
0.0028551605 = product of:
  0.019986123 = sum of:
    0.019986123 = weight(_text_:with in 2650) [ClassicSimilarity], result of:
      0.019986123 = score(doc=2650,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.21299566 = fieldWeight in 2650, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0625 = fieldNorm(doc=2650)
  0.14285715 = coord(1/7)
```
Abstract

The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems.

Losee, R.M.: ¬A discipline independent definition of information (1997) 0.00
```
0.0028551605 = product of:
  0.019986123 = sum of:
    0.019986123 = weight(_text_:with in 380) [ClassicSimilarity], result of:
      0.019986123 = score(doc=380,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.21299566 = fieldWeight in 380, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0625 = fieldNorm(doc=380)
  0.14285715 = coord(1/7)
```
Abstract

Information may be defined as the characteristics of the output of a process, these being informative about the process and the input. This discipline independent definition may be applied to all domains, from physics to epistemology. Hierarchies of processes linked together provide a communication channel between each of the corresponding functions and layers in the hierarchies. Models of communication, perception, observation, belief, and knowledge are suggested that are consistent with this conceptual framework of information as the value of the output of any process in a hierarchy of processes. Misinformation and errors are considered.

Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.00

```
0.002637832 = product of:
  0.018464822 = sum of:
    0.018464822 = product of:
      0.036929645 = sum of:
        0.036929645 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
          0.036929645 = score(doc=3368,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.2708308 = fieldWeight in 3368, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3368)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```
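This entry matched on the term `22` rather than `with`, and its breakdown carries two coordination factors: an inner `coord(1/2)` and the outer `coord(1/7)`. A plausible reading is that one of two clauses in a nested sub-query matched, and that sub-query was in turn one of seven top-level clauses. The sketch below, which assumes the standard ClassicSimilarity tf and idf definitions and uses an illustrative helper name, multiplies the factors in the same order as the listing.

```python
import math


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight for one term (ClassicSimilarity-style)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Values copied from the listing for doc 3368 (term "22").
inner = term_score(
    freq=2.0,
    doc_freq=3622,
    max_docs=44218,
    query_norm=0.038938753,
    field_norm=0.0546875,
)
nested = inner * (1 / 2)   # inner coord(1/2)
total = nested * (1 / 7)   # outer coord(1/7)

print(f"inner={inner:.9f} nested={nested:.9f} total={total:.9f}")
```

The three printed values correspond to the 0.036929645, 0.018464822, and 0.002637832 lines of the listing, up to floating-point rounding.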

Date: 22. 2.1996 13:14:10

Losee, R.M.; Church Jr., L.: Are two document clusters better than one? : the cluster performance question for information retrieval (2005) 0.00
```
0.0024982654 = product of:
  0.017487857 = sum of:
    0.017487857 = weight(_text_:with in 3270) [ClassicSimilarity], result of:
      0.017487857 = score(doc=3270,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1863712 = fieldWeight in 3270, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3270)
  0.14285715 = coord(1/7)
```
Abstract

When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how this may be studied with other assumptions. The "Cluster Hypothesis" asks an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters.

Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.00
```
0.0024726419 = product of:
  0.017308492 = sum of:
    0.017308492 = weight(_text_:with in 1016) [ClassicSimilarity], result of:
      0.017308492 = score(doc=1016,freq=6.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.18445967 = fieldWeight in 1016, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.03125 = fieldNorm(doc=1016)
  0.14285715 = coord(1/7)
```
Abstract

Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.

Losee, R.M.: Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering : an empirical basis for grammatical rules (1996) 0.00

```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 4068) [ClassicSimilarity], result of:
      0.014989593 = score(doc=4068,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 4068, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=4068)
  0.14285715 = coord(1/7)
```

Losee, R.M.: Browsing document collections : automatically organizing digital libraries and hypermedia using the Gray code (1997) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 146) [ClassicSimilarity], result of:
      0.014989593 = score(doc=146,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 146, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=146)
  0.14285715 = coord(1/7)
```
Abstract

Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Examines systems organizing documents based on weighted and unweighted Gray codes. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user supplied economic preferences. Applies Bayesian learning theory to estimating the characteristics of documents of interest to the user and supplying an analytic model of browsing performance, based on minimising the Expected Browsing Distance. Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library may order any and all available documents, not just those examined, based on the information provided by the searcher or people with similar interests.

Losee, R.M.: Term dependence : a basis for Luhn and Zipf models (2001) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 6976) [ClassicSimilarity], result of:
      0.014989593 = score(doc=6976,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 6976, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=6976)
  0.14285715 = coord(1/7)
```
Abstract

There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We suggest reasons for this principle based on the empirical relationships shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse Representation-EMIM principle. We also suggest an Inverse EMIM term weight for indexing or retrieval applications that is consistent with Luhn's distribution. An information theoretic interpretation of Zipf's Law is provided. Using the regularity noted here, we suggest that Zipf's Law is a consequence of the statistical dependencies that exist between terms, described here using information theoretic concepts.

Losee, R.M.: Decisions in thesaurus construction and use (2007) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 924) [ClassicSimilarity], result of:
      0.014989593 = score(doc=924,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 924, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=924)
  0.14285715 = coord(1/7)
```
Abstract

A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in "breadcrumb navigation".
