Search (18 results, page 1 of 1)

  • × theme_ss:"Volltextretrieval"
  1. Paijmans, H.: Gravity wells of meaning : detecting information rich passages in scientific texts (1997) 0.23
    0.22742893 = product of:
      0.30323857 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 7444) [ClassicSimilarity], result of:
              0.037667565 = score(doc=7444,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 7444, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7444)
          0.25 = coord(1/4)
        0.09033708 = weight(_text_:term in 7444) [ClassicSimilarity], result of:
          0.09033708 = score(doc=7444,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.41242266 = fieldWeight in 7444, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=7444)
        0.20348458 = weight(_text_:frequency in 7444) [ClassicSimilarity], result of:
          0.20348458 = score(doc=7444,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.7360931 = fieldWeight in 7444, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0625 = fieldNorm(doc=7444)
      0.75 = coord(3/4)
    
    Abstract
    Presents research in which 4 term weigthing schemes were used to detect information rich passages in texts and the results compared. Demonstrates that word categories and frequency derived weights have a close correlation but that weighting according to the first mention theory or the cue method shows no correlation with frequency based weights
  2. Magennis, M.: Expert rule-based query expansion (1995) 0.08
    0.07766729 = product of:
      0.15533458 = sum of:
        0.018424708 = product of:
          0.07369883 = sum of:
            0.07369883 = weight(_text_:based in 5181) [ClassicSimilarity], result of:
              0.07369883 = score(doc=5181,freq=10.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.5210583 = fieldWeight in 5181, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5181)
          0.25 = coord(1/4)
        0.13690987 = weight(_text_:term in 5181) [ClassicSimilarity], result of:
          0.13690987 = score(doc=5181,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.62504494 = fieldWeight in 5181, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5181)
      0.5 = coord(2/4)
    
    Abstract
    Examines how, for term based free text retrieval, Interactive Query Expansion (IQE) provides better retrieval performance tahn Automatic Query Expansion (AQE) but the performance of IQE depends on the strategy employed by the user to select expansion terms. The aim is to build an expert query expansion system using term selection rules based on expert users' strategies. It is expected that such a system will achieve better performance for novice or inexperienced users that either AQE or IQE. The procedure is to discover expert IQE users' term selection strategies through observation and interrogation, to construct a rule based query expansion (RQE) system based on these and to compare the resulting retrieval performance with that of comparable AQE and IQE systems
  3. Leppanen, E.: Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset (1996) 0.06
    0.057488333 = product of:
      0.11497667 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 27) [ClassicSimilarity], result of:
              0.028250674 = score(doc=27,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 27, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=27)
          0.25 = coord(1/4)
        0.107914 = weight(_text_:frequency in 27) [ClassicSimilarity], result of:
          0.107914 = score(doc=27,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.39037234 = fieldWeight in 27, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=27)
      0.5 = coord(2/4)
    
    Abstract
    Homonymy is known to often cause false drops in free text searching in a full text database. The problem is quite common and difficult to avoid in Finnish, but nobody has examined it before. Reports on a study that examined the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full text database containing about 55.000 newspaper articles. The results indicate that homonymy is not a very serious problem in full text searching, with only about 1 search result set out of 4 containing false drops caused by homonymy. Several other reasons for nonrelevance were much more common. However, in some set results there were a considerable number of homonymy errors, so the number seems to be very random. A study was also made into whether homonyms can be disambiguated by syntactic analysis. The result was that 75,2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than substantives. Although homonymy is not a very big problem it could perhaps easily be eliminated if there was a suitable syntactic analyzer in the IR system
  4. Kristensen, J.: Expanding end-users' query statements for free text searching with a search-aid thesaurus (1993) 0.03
    0.03193898 = product of:
      0.12775593 = sum of:
        0.12775593 = weight(_text_:term in 6621) [ClassicSimilarity], result of:
          0.12775593 = score(doc=6621,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.58325374 = fieldWeight in 6621, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=6621)
      0.25 = coord(1/4)
    
    Abstract
    Tests the effectiveness of a thesaurus as a search-aid in free text searching of a full text database. A set of queries was searched against a large full text database of newspaper articles. The thesaurus contained equivalence, hierarchical and associative relationships. Each query was searched in five modes: basic search, synonym search, narrower term search, related term search, and union of all previous searches. The searches were analyzed in terms of relative recall and precision
  5. Wildemuth, B.M.: Measures of success in searching a full-text fact base (1990) 0.02
    0.019761236 = product of:
      0.079044946 = sum of:
        0.079044946 = weight(_text_:term in 2050) [ClassicSimilarity], result of:
          0.079044946 = score(doc=2050,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 2050, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2050)
      0.25 = coord(1/4)
    
    Abstract
    The traditional measures of online searching proficiency (recall and precision) are less appropriate when applied to the searching of full text databases. The pilot study investigated and evaluated 5 measures of overall success in searching a full text data bank. Data was drawn from INQUIRER searches conducted by medical students at North Carolina Univ. at Chapel Hill. INQUIRER ia an online database of facts and concepts in microbiology. The 5 measures were: success/failure; precision; search term overlap; number of search cycles; and time per search. Concludes that the last 4 measures look promising for the evaluation of fact data bases such as ENQUIRER
  6. Laegreid, J.A.: SIFT: a Norwegian information retrieval system (1993) 0.02
    0.017428853 = product of:
      0.034857705 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 7701) [ClassicSimilarity], result of:
              0.037667565 = score(doc=7701,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 7701, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7701)
          0.25 = coord(1/4)
        0.025440816 = product of:
          0.05088163 = sum of:
            0.05088163 = weight(_text_:22 in 7701) [ClassicSimilarity], result of:
              0.05088163 = score(doc=7701,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30952093 = fieldWeight in 7701, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7701)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes SIFT (Search in Free Text) an information retrieval system originally developed for administering governmental documents in Norway but which is now being applied alsewhere. SIFT handles structured information well. A library system, SIFT-BIBL, is now available. SIFT's retrieval engine and search facilities are powerful. Its user interface is limited but being imporved. An application programmer interface has been released which will allow programmers to develop their own interface. A Windows-based- client-server version is now being beta tested
    Date
    23. 1.1999 19:22:09
  7. Reinisch, F.: Wer suchet - der findet? : oder Die Überwindung der sprachlichen Grenzen bei der Suche in Volltextdatenbanken (2000) 0.01
    0.006360204 = product of:
      0.025440816 = sum of:
        0.025440816 = product of:
          0.05088163 = sum of:
            0.05088163 = weight(_text_:22 in 4919) [ClassicSimilarity], result of:
              0.05088163 = score(doc=4919,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30952093 = fieldWeight in 4919, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4919)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2000 17:48:06
  8. Zillmann, H.: OSIRIS und eLib : Information Retrieval und Search Engines in Full-text Databases (2001) 0.01
    0.006360204 = product of:
      0.025440816 = sum of:
        0.025440816 = product of:
          0.05088163 = sum of:
            0.05088163 = weight(_text_:22 in 5937) [ClassicSimilarity], result of:
              0.05088163 = score(doc=5937,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30952093 = fieldWeight in 5937, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5937)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    14. 6.2001 12:22:31
  9. Dambeck, H.; Engler, T.: Gesucht und gefunden : Neun Volltext-Suchprogramme für den Desktop (2002) 0.01
    0.006360204 = product of:
      0.025440816 = sum of:
        0.025440816 = product of:
          0.05088163 = sum of:
            0.05088163 = weight(_text_:22 in 1169) [ClassicSimilarity], result of:
              0.05088163 = score(doc=1169,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30952093 = fieldWeight in 1169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1169)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    c't. 2002, H.22, S.190-197
  10. Sievert, M.E.; McKinin, E.J.: Why full-text misses some relevant documents : an analysis of documents not retrieved by CCML or MEDIS (1989) 0.00
    0.0047701527 = product of:
      0.019080611 = sum of:
        0.019080611 = product of:
          0.038161222 = sum of:
            0.038161222 = weight(_text_:22 in 3564) [ClassicSimilarity], result of:
              0.038161222 = score(doc=3564,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23214069 = fieldWeight in 3564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3564)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    9. 1.1996 10:22:31
  11. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 0.00
    0.0029132022 = product of:
      0.011652809 = sum of:
        0.011652809 = product of:
          0.046611235 = sum of:
            0.046611235 = weight(_text_:based in 1150) [ClassicSimilarity], result of:
              0.046611235 = score(doc=1150,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.3295462 = fieldWeight in 1150, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1150)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents a probabilistic technique to retrieve passages from texts having a large size or heterogeneous semantic content. The proposed technique is independent on any supporting auxiliary data, such as text structure, topic organization, or pre-defined text segments. A Bayesian framework implements the probabilistic technique. We carried out experiments to compare the probabilistique technique to one based on a text segmentation algorithm. In particular, the probabilistique technique is more effective than, or as effective as the one based on the text segmentation to retrieve small passages. Results show that passage size affects passage retrieval performance. Results do also suggest that text organization and query generality may have an impact on the difference in effectiveness between the two techniques
  12. Ashford, J.H.: Full text retrieval in document management : a review (1995) 0.00
    0.0023542228 = product of:
      0.009416891 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 2054) [ClassicSimilarity], result of:
              0.037667565 = score(doc=2054,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 2054, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2054)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Full text management which applied to document management tends to be centred on text storage and retrieval. Recent developments are concerned with integration with relational database management system products to deliver document management services offering both the flexibility of text retrieval and the ability to support process based funnctions. There has been a move towards client server architectures, more user friendly user interfaces and more flexible and easier to understand retrieval. Advocates caution in choosing tasks for full text methods. Identifies document management functions for which the combined use of database management systems or special purpose tools should be considered
  13. Dow Jones unveils knowledge indexing system (1997) 0.00
    0.0023542228 = product of:
      0.009416891 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 751) [ClassicSimilarity], result of:
              0.037667565 = score(doc=751,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=751)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Dow Jones Interactive Publishing has developed a sophisticated automatic knowledge indexing system that will allow searchers of the Dow Jones News / Retrieval service to get highly targeted results from a search in the service's Publications Library. Instead of relying on a thesaurus of company names, the new system uses a combination of that basic algorithm plus unique rules based on the editorial styles of individual publications in the Library. Dow Jones have also announced its acceptance of the definitions of 'selected full text' and 'full text' from Bibliodata's Fulltext Sources Online directory
  14. Hider, P.: ¬The search value added by professional indexing to a bibliographic database (2017) 0.00
    0.0020808585 = product of:
      0.008323434 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 3868) [ClassicSimilarity], result of:
              0.033293735 = score(doc=3868,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 3868, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3868)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This paper reports on an analysis of the loss levels that would result if a bibliographic database, namely the Australian Education Index (AEI), were missing the subject descriptors and identifiers assigned by its professional indexers, employing the methodology developed by Gross and Taylor (2005), and later by Gross et al. (2015). The results indicate that AEI users would lose a similar proportion of hits per query to that experienced by library catalog users: on average, 27% of the resources found by a sample of keyword queries on the AEI database would not have been found without the subject indexing, based on the Australian Thesaurus of Education Descriptors (ATED). The paper also discusses the methodological limitations of these studies, pointing out that real-life users might still find some of the resources missed by a particular query through follow-up searches, while additional resources might also be found through iterative searching on the subject vocabulary. The paper goes on to describe a new research design, based on a before - and - after experiment, which addresses some of these limitations. It is argued that this alternative design will provide a more realistic picture of the value that professionally assigned subject indexing and controlled subject vocabularies can add to literature searching of a more scholarly and thorough kind.
  15. Bernard, M.: Modelling the efficient access to full-text information (1996) 0.00
    0.002059945 = product of:
      0.00823978 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 5610) [ClassicSimilarity], result of:
              0.03295912 = score(doc=5610,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 5610, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5610)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Describes a research goal set by many offices within the US Department of Energy. Reviews efficient full text searching techniques being developed to better understand and meet this goal. Classical computer human interaction (CHI) approaches provided by commercial information retrieval engines fail to contextualize information in ways that facilitates timely decision making. Discusses the uses of advanced CHI techniques in combination with deductive database technology to augment the weaknesses found in the presentation capabilities of information retrieval engines. Presents various techniques employed in a WWW based prototype system currently under development
  16. Huang, Y.-L.: ¬A theoretic and empirical research of cluster indexing for Mandarine Chinese full text document (1998) 0.00
    0.002059945 = product of:
      0.00823978 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 513) [ClassicSimilarity], result of:
              0.03295912 = score(doc=513,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 513, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=513)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Since most popular commercialized systems for full text retrieval are designed with full text scaning and Boolean logic query mode, these systems use an oversimplified relationship between the indexing form and the content of document. Reports the use of Singular Value Decomposition (SVD) to develop a Cluster Indexing Model (CIM) based on a Vector Space Model (VSM) in orer to explore the index theory of cluster indexing for chinese full text documents. From a series of experiments, it was found that the indexing performance of CIM is better than traditional VSM, and has almost equivalent effectiveness of the authority control of index terms
  17. Pearce, C.; Nicholas, C.: TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data (1996) 0.00
    0.0017656671 = product of:
      0.0070626684 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 4071) [ClassicSimilarity], result of:
              0.028250674 = score(doc=4071,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 4071, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4071)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Methods and tools for finding documents relevant to a user's needs in a document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static copora, their algorithms are dependent on the language for which they are written, e.g. English, and they do not perform well when presented with misspelled words or text that has been degraded by OCR techniques. In this article, we present experimentation results for the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search from a hypertext-style user interface for text corpora that may be garbled by OCR or transmission errors, and that may contain languages other than English. TELLTALE uses several techniques based on n-grams (n character sequences of text). With these results we show that the dynamic linkage mechanisms in TELLTALE are tolerant of garbles in up to 30% of the characters in the body of the texts
  18. Pirkola, A.; Jarvelin, K.: ¬The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 0.00
    0.0014713892 = product of:
      0.005885557 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 4088) [ClassicSimilarity], result of:
              0.023542227 = score(doc=4088,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 4088, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4088)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    So far, methods for ellipsis and anaphor resolution have been developed and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval of scientific abstracts. No significant improvements has been observed. Analyzes the effects of ellipsis and anaphor resolution on proximity searching in a full text database. Anaphora and ellipsis are classified on the basis of the type of their correlates / antecedents rather than, as traditional, on the basis of their own linguistic type. The classification differentiates proper names and common nouns of basic words, compound words, and phrases. The study was carried out in a newspaper article database containing 55.000 full text articles. A set of 154 keyword pairs in different categories was created. Human resolution of keyword ellipsis and anaphora was performed to identify sentences and paragraphs which would match proximity searches after resolution. Findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore the recall effect of restricted resolution of proper name phrases only was analyzed for keyword pairs containing at least 1 proper name phrase. Findings indicate a recall increase of 38.2% in sentence searches, and 28.8% in paragraph searches when proper name ellipsis were resolved. The recall increase was 17.6% sentence searches, and 19.8% in paragraph searches when proper name anaphora were resolved. Some simple and computationally justifiable resolution method might be developed only for proper name phrases to support keyword based full text information retrieval. Discusses elements of such a method