Search (5 results, page 1 of 1)

  • author_ss:"Schatz, B.R."
  1. Chen, H.; Ng, T.D.; Martinez, J.; Schatz, B.R.: A concept space approach to addressing the vocabulary problem in scientific information retrieval : an experiment on the Worm Community System (1997) 0.03
    0.02709466 = product of:
      0.1354733 = sum of:
        0.1354733 = weight(_text_:thesaurus in 6492) [ClassicSimilarity], result of:
          0.1354733 = score(doc=6492,freq=10.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5708255 = fieldWeight in 6492, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6492)
      0.2 = coord(1/5)
    
    Abstract
    This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving 4 biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus.
  2. Chen, H.; Martinez, J.; Kirchhoff, A.; Ng, T.D.; Schatz, B.R.: Alleviating search uncertainty through concept associations : automatic indexing, co-occurrence analysis, and parallel computing (1998) 0.03
    0.02518492 = product of:
      0.1259246 = sum of:
        0.1259246 = weight(_text_:thesaurus in 5202) [ClassicSimilarity], result of:
          0.1259246 = score(doc=5202,freq=6.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.5305915 = fieldWeight in 5202, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.046875 = fieldNorm(doc=5202)
      0.2 = coord(1/5)
    
    Abstract
    In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in 'concept recall', but in 'concept precision' the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase 'variety' in search terms and thereby reduce search uncertainty.
    Theme
    Conception and application of the thesaurus principle
  3. Ramsey, M.C.; Chen, H.; Zhu, B.; Schatz, B.R.: A collection of visual thesauri for browsing large collections of geographic images (1999) 0.02
    0.023990633 = product of:
      0.11995316 = sum of:
        0.11995316 = weight(_text_:thesaurus in 3922) [ClassicSimilarity], result of:
          0.11995316 = score(doc=3922,freq=4.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.50543046 = fieldWeight in 3922, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3922)
      0.2 = coord(1/5)
    
    Abstract
    Digital libraries of geo-spatial multimedia content are currently deficient in providing fuzzy, concept-based retrieval mechanisms to users. The main challenge is that indexing and thesaurus creation are extremely labor-intensive processes for text documents and especially for images. Recently, 800,000 declassified satellite photographs were made available by the US Geological Survey. Additionally, millions of satellite and aerial photographs are archived in national and local map libraries. Such enormous collections make human indexing and thesaurus generation methods impossible to utilize. In this article we propose a scalable method to automatically generate visual thesauri of large collections of geo-spatial media using fuzzy, unsupervised machine-learning techniques.
  4. Chen, H.; Houston, A.L.; Sewell, R.R.; Schatz, B.R.: Internet browsing and searching : user evaluations of category map and concept space techniques (1998) 0.02
    0.019387359 = product of:
      0.09693679 = sum of:
        0.09693679 = weight(_text_:thesaurus in 869) [ClassicSimilarity], result of:
          0.09693679 = score(doc=869,freq=8.0), product of:
            0.23732872 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.051357865 = queryNorm
            0.40844947 = fieldWeight in 869, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.03125 = fieldNorm(doc=869)
      0.2 = coord(1/5)
    
    Abstract
    The Internet provides an exceptional testbed for developing algorithms that can improve browsing and searching large information spaces. Browsing and searching tasks are susceptible to problems of information overload and vocabulary differences. Much of the current research is aimed at the development and refinement of algorithms to improve browsing and searching by addressing these problems. Our research was focused on discovering whether two of the algorithms our research group has developed, a Kohonen algorithm category map for browsing, and an automatically generated concept space algorithm for searching, can help improve browsing and/or searching the Internet. Our results indicate that a Kohonen self-organizing map (SOM)-based algorithm can successfully categorize a large and eclectic Internet information space (the Entertainment subcategory of Yahoo!) into manageable sub-spaces that users can successfully navigate to locate a homepage of interest to them. The SOM algorithm worked best with browsing tasks that were very broad, and in which subjects skipped around between categories. Subjects especially liked the visual and graphical aspects of the map. Subjects who tried to do a directed search, and those that wanted to use the more familiar mental models (alphabetic or hierarchical organization) for browsing, found that the map did not work well. The results from the concept space experiment were especially encouraging. There were no significant differences among the precision measures for the set of documents identified by subject-suggested terms, thesaurus-suggested terms, and the combination of subject- and thesaurus-suggested terms. The recall measures indicated that the combination of subject- and thesaurus-suggested terms exhibited significantly better recall than subject-suggested terms alone. Furthermore, analysis of the homepages indicated that there was limited overlap between the homepages retrieved by the subject-suggested and thesaurus-suggested terms. Since the retrieved homepages for the most part were different, this suggests that a user can enhance a keyword-based search by using an automatically generated concept space. Subjects especially liked the level of control that they could exert over the search, and the fact that the terms suggested by the thesaurus were 'real' (i.e., originating in the homepages) and therefore guaranteed to have retrieval success.
  5. Schatz, B.R.: Information analysis in the net : the interspace of the twenty-first century (1998) 0.01
    0.011133251 = product of:
      0.055666253 = sum of:
        0.055666253 = weight(_text_:22 in 2344) [ClassicSimilarity], result of:
          0.055666253 = score(doc=2344,freq=2.0), product of:
            0.1798465 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051357865 = queryNorm
            0.30952093 = fieldWeight in 2344, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=2344)
      0.2 = coord(1/5)
    
    Date
    22. 9.1997 19:16:05
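
The score explanations in the results above follow the [ClassicSimilarity] (TF-IDF) arithmetic shown in each tree. As a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the Python below recomputes the score of result 1 from the values listed in its explanation. The function names are hypothetical and chosen only for illustration; the search engine itself works in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float, coord: float) -> float:
    """Recompute one explanation tree: queryWeight * fieldWeight * coord."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight * coord


# Values copied from the explanation of result 1 (doc=6492, term "thesaurus").
score_doc_6492 = classic_score(
    freq=10.0,
    doc_freq=1182,
    max_docs=44218,
    query_norm=0.051357865,
    field_norm=0.0390625,
    coord=1 / 5,  # 1 of 5 query clauses matched
)
print(f"{score_doc_6492:.5f}")
```

The same function applies to the other four results by substituting their freq, docFreq and fieldNorm values. Within results 1 to 4 the idf and queryNorm are identical, so the ranking is driven by tf and fieldNorm alone.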