Search (14 results, page 1 of 1)

Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.04

0.043804124 = product of:
  0.08760825 = sum of:
    0.08760825 = sum of:
      0.038116705 = weight(_text_:systems in 3368) [ClassicSimilarity], result of:
        0.038116705 = score(doc=3368,freq=2.0), product of:
          0.16037072 = queryWeight, product of:
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.052184064 = queryNorm
          0.23767869 = fieldWeight in 3368, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.0731742 = idf(docFreq=5561, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
      0.049491543 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
        0.049491543 = score(doc=3368,freq=2.0), product of:
          0.1827397 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052184064 = queryNorm
          0.2708308 = fieldWeight in 3368, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
  0.5 = coord(1/2)

Abstract: The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance. They can thus determine which of 2 similarly performing systems is superior. For both a single query term and a multiple query term retrieval model, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used in computing the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single term model performs, and sample performance predictions are given for single term and multiple term problems
Date: 22. 2.1996 13:14:10
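
Note on the score breakdown: the explanation tree above can be reproduced by hand. The sketch below is an illustration, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is inferred from the numbers listed (it reproduces both idf values shown). The function names are hypothetical, and queryNorm, fieldNorm, and coord are copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it reproduces 3.0731742 and 3.5018296 from the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm         # "queryWeight" line
    field_weight = tf * idf * field_norm    # "fieldWeight" line
    return query_weight * field_weight      # "score(doc=..., freq=...)" line

# Values copied from the explanation for doc 3368.
QUERY_NORM = 0.052184064
MAX_DOCS = 44218
FIELD_NORM = 0.0546875
COORD = 0.5  # coord(1/2)

systems_score = term_score(2.0, 5561, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22_score = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = COORD * (systems_score + term_22_score)

print(f"systems: {systems_score:.6f}")
print(f"22:      {term_22_score:.6f}")
print(f"total:   {total:.6f}")
```

The total should land close to the 0.043804124 listed for this record. Small differences in the last digits are expected because the listing appears to use single-precision arithmetic.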

Losee, R.M.: Term dependence : truncating the Bahadur Lazarsfeld expansion (1994) 0.02

0.01633573 = product of:
  0.03267146 = sum of:
    0.03267146 = product of:
      0.06534292 = sum of:
        0.06534292 = weight(_text_:systems in 7390) [ClassicSimilarity], result of:
          0.06534292 = score(doc=7390,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.4074492 = fieldWeight in 7390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.09375 = fieldNorm(doc=7390)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Studies the performance of probabilistic information retrieval systems where differing statistical dependence assumptions are used when estimating the probabilities inherent in the retrieval model. Uses the Bahadur Lazarsfeld expansion model

Losee, R.M.: Seven fundamental questions for the science of library classification (1993) 0.02

0.015401474 = product of:
  0.030802948 = sum of:
    0.030802948 = product of:
      0.061605897 = sum of:
        0.061605897 = weight(_text_:systems in 4508) [ClassicSimilarity], result of:
          0.061605897 = score(doc=4508,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.38414678 = fieldWeight in 4508, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0625 = fieldNorm(doc=4508)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: For classification to advance to the point where optimal systems may be developed for manual or automated use, it will be necessary for a science of document or library classification to be developed. Seven questions are posed which the author feels must be answered before such optimal systems can be developed. Suggestions are made as to the forms that answers to these questions might take

Losee, R.M.: How to study classification systems and their appropriateness for individual institutions (1995) 0.02

0.015401474 = product of:
  0.030802948 = sum of:
    0.030802948 = product of:
      0.061605897 = sum of:
        0.061605897 = weight(_text_:systems in 5545) [ClassicSimilarity], result of:
          0.061605897 = score(doc=5545,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.38414678 = fieldWeight in 5545, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0625 = fieldNorm(doc=5545)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Answers to questions concerning individual library decisions to adopt classification systems are important in understanding the effectiveness of libraries but are difficult to provide. Measures of classification system performance are discussed, as are different methodologies that may be used to seek answers, ranging from formal or philosophical models to quantitative experimental techniques and qualitative methods

Losee, R.M.: Browsing mixed structured and unstructured data (2006) 0.01
```
0.014147157 = product of:
  0.028294314 = sum of:
    0.028294314 = product of:
      0.056588627 = sum of:
        0.056588627 = weight(_text_:systems in 173) [ClassicSimilarity], result of:
          0.056588627 = score(doc=173,freq=6.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.35286134 = fieldWeight in 173, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=173)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Both structured and unstructured data, as well as structured data representing several different types of tuples, may be integrated into a single list for browsing or retrieval. Data may be arranged in the Gray code order of the features and metadata, producing optimal ordering for browsing. We provide several metrics for evaluating the performance of systems supporting browsing, given some constraints. Metadata and indexing terms are used for sorting keys and attributes for structured data, as well as for semi-structured or unstructured documents, images, media, etc. Economic and information theoretic models are suggested that enable the ordering to adapt to user preferences. Different relational structures and unstructured data may be integrated into a single, optimal ordering for browsing or for displaying tables in digital libraries, database management systems, or information retrieval systems. Adaptive displays of data are discussed.
Losee, R.M.: ¬A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 0.01
```
0.013476291 = product of:
  0.026952581 = sum of:
    0.026952581 = product of:
      0.053905163 = sum of:
        0.053905163 = weight(_text_:systems in 2335) [ClassicSimilarity], result of:
          0.053905163 = score(doc=2335,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.33612844 = fieldWeight in 2335, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2335)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. A requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases can often not be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed, and used to evaluate experimental results comparing the distance between subject headings assigned to documents given classifications from the proposed system and the Library of Congress Classification (LCC) system
Losee, R.M.: Comparing Boolean and probabilistic information retrieval systems across queries and disciplines (1997) 0.01
```
0.013476291 = product of:
  0.026952581 = sum of:
    0.026952581 = product of:
      0.053905163 = sum of:
        0.053905163 = weight(_text_:systems in 7709) [ClassicSimilarity], result of:
          0.053905163 = score(doc=7709,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.33612844 = fieldWeight in 7709, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7709)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Suggests a method for comparison of the use of Boolean queries and ranking documents using document and term weights, and examines their relative merits. The performance of information retrieval may be determined either by using experimental simulation, or through the application of analytic techniques that estimate the retrieval performance, given values for query and database characteristics. Using these performance predicting techniques, sample performance figures are provided for queries using the Boolean operators AND and OR, as well as for probabilistic systems assuming statistical term independence or term dependence. Examines the performance of models failing to meet statistical and other assumptions
Losee, R.M.; Church Jr., L.: Are two document clusters better than one? : the cluster performance question for information retrieval (2005) 0.01
```
0.013476291 = product of:
  0.026952581 = sum of:
    0.026952581 = product of:
      0.053905163 = sum of:
        0.053905163 = weight(_text_:systems in 3270) [ClassicSimilarity], result of:
          0.053905163 = score(doc=3270,freq=4.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.33612844 = fieldWeight in 3270, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3270)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how this may be studied with other assumptions. The "Cluster Hypothesis" asks an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters.
Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
```
0.010890487 = product of:
  0.021780973 = sum of:
    0.021780973 = product of:
      0.043561947 = sum of:
        0.043561947 = weight(_text_:systems in 2650) [ClassicSimilarity], result of:
          0.043561947 = score(doc=2650,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2716328 = fieldWeight in 2650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems
Losee, R.M.: ¬The relative shelf location of circulated books : a study of classification, users, and browsing (1993) 0.01
```
0.009529176 = product of:
  0.019058352 = sum of:
    0.019058352 = product of:
      0.038116705 = sum of:
        0.038116705 = weight(_text_:systems in 4485) [ClassicSimilarity], result of:
          0.038116705 = score(doc=4485,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.23767869 = fieldWeight in 4485, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4485)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Patrons often browse through books organized by a library classification system, looking for books to use and possibly circulate. This research is an examination of the clustering of similar books provided by a classification system and ways in which the books that patrons circulate are clustered. Measures of classification system performance are suggested and used to evaluate two test collections. Regression formulas are derived describing the relationships among the number of areas in which books were found (the number of stops a patron makes when browsing), the distances across a cluster, and the average number of books a patron circulates. Patrons were found usually to make more stops than there were books found at their average stop. Consequences for full-text document systems and online catalogs are suggested
Losee, R.M.: When information retrieval measures agree about the relative quality of document rankings (2000) 0.01
```
0.009529176 = product of:
  0.019058352 = sum of:
    0.019058352 = product of:
      0.038116705 = sum of:
        0.038116705 = weight(_text_:systems in 4860) [ClassicSimilarity], result of:
          0.038116705 = score(doc=4860,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.23767869 = fieldWeight in 4860, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4860)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The variety of performance measures available for information retrieval systems, search engines, and network filtering agents can be confusing to both practitioners and scholars. Most discussions about these measures address their theoretical foundations and the characteristics of a measure that make it desirable for a particular application. In this work, we consider how measures of performance at a point in a search may be formally compared. Criteria are developed that allow one to determine the percent of time or conditions under which 2 different performance measures suggest that one document ordering is superior to another ordering, or when the 2 measures disagree about the relative value of document orderings. As an example, graphs provide illustrations of the relationships between precision and F
Haas, S.W.; Losee, R.M.: Looking in text windows : their size and composition (1994) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 8525) [ClassicSimilarity], result of:
          0.03267146 = score(doc=8525,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 8525, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=8525)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A text window is a group of words appearing in contiguous positions in text used to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for their structure. This supports the previously suggested idea that natural groupings of words are best treated as a unit of size 7 to 11 words, that is, plus or minus 3 to 5 words. The text retrieval experiments varying the size of windows, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content word phrases and fewer terms with high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows
Losee, R.M.: Browsing document collections : automatically organizing digital libraries and hypermedia using the Gray code (1997) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 146) [ClassicSimilarity], result of:
          0.03267146 = score(doc=146,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 146, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=146)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Examines systems organizing documents based on weighted and unweighted Gray codes. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user supplied economic preferences. Applies Bayesian learning theory to estimating the characteristics of documents of interest to the user and supplying an analytic model of browsing performance, based on minimising the Expected Browsing Distance. Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library may order any and all available documents, not just those examined, based on the information provided by the searcher or people with similar interests
Losee, R.M.: Decisions in thesaurus construction and use (2007) 0.01
```
0.008167865 = product of:
  0.01633573 = sum of:
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = weight(_text_:systems in 924) [ClassicSimilarity], result of:
          0.03267146 = score(doc=924,freq=2.0), product of:
            0.16037072 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.052184064 = queryNorm
            0.2037246 = fieldWeight in 924, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=924)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in "breadcrumb navigation".
