Search (305 results, page 3 of 16)

  • theme_ss:"Retrievalalgorithmen"
  1. Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (1972) 0.00
    0.004463867 = 0.0267832 (weight(_text_:of in 5187) [ClassicSimilarity]: tf=2.0, freq=4.0, idf=1.5637573 (docFreq=25162, maxDocs=44218), queryNorm=0.043811057, fieldNorm=0.125) × coord(1/2) × coord(1/3)
    
    Source
    Journal of documentation. 28(1972), S.11-21
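For readers puzzled by the score figures, the breakdowns shown under each entry come from Lucene's ClassicSimilarity explanation. The minimal Python sketch below reproduces the value reported for entry 1 from the constants in that breakdown; the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the standard ClassicSimilarity definitions and are assumed here, while queryNorm is simply taken as given.

```python
import math

# Reconstruct the ClassicSimilarity breakdown shown for entry 1 (doc 5187).
# All constants are copied from the explanation above; the tf and idf
# formulas are the standard Lucene ClassicSimilarity definitions.
def classic_score(freq, doc_freq, max_docs, field_norm, query_norm,
                  query_coord, overall_coord):
    tf = math.sqrt(freq)                             # 2.0 for freq = 4.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # ~1.5638
    query_weight = idf * query_norm                  # ~0.0685099
    field_weight = tf * idf * field_norm             # ~0.3909393
    return query_weight * field_weight * query_coord * overall_coord

score = classic_score(freq=4.0, doc_freq=25162, max_docs=44218,
                      field_norm=0.125, query_norm=0.043811057,
                      query_coord=0.5, overall_coord=1.0 / 3.0)
print(f"{score:.9f}")   # ~0.004463867
```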
  2. Wilbur, W.J.: ¬A retrieval system based on automatic relevance weighting of search terms (1992) 0.00
    0.004463867 = 0.0267832 (weight(_text_:of in 5269) [ClassicSimilarity]: tf=4.0, freq=16.0, idf=1.5637573, fieldNorm=0.0625) × coord(1/2) × coord(1/3)
    
    Abstract
    Describes the development of a retrieval system based on automatic relevance weighting of search terms, founded on the Bayesian formulation of the probability of relevance as a function of term occurrence, in which the contributions of individual terms are assumed to be independent. The relevance pair (RP) model and the vector cosine (VC) model were compared; in the test environment, the RP model yielded better retrieval than the VC model.
    Source
    Proceedings of the 55th Annual Meeting of the American Society for Information Science, Pittsburgh, 26.-29.10.92. Ed.: D. Shaw
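Entry 2's abstract does not spell out Wilbur's weighting formula. As an illustration of Bayesian relevance weighting under a term-independence assumption, here is a sketch using the classic Robertson/Sparck Jones relevance weight; the function names and toy statistics are hypothetical, and this is a stand-in for the general idea, not Wilbur's RP model.

```python
import math

def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight for one term, with the usual
    0.5 smoothing: r = relevant docs containing the term, n = docs containing
    the term, R = known relevant docs, N = docs in the collection."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def score(doc_terms, query_terms, term_stats, R, N):
    # Independence assumption: the document score is simply the sum of the
    # per-term weights of the query terms that occur in the document.
    return sum(rsj_weight(*term_stats[t], R, N)
               for t in query_terms if t in doc_terms)

term_stats = {"weighting": (8, 40), "relevance": (5, 200)}   # toy (r, n) per term
print(score({"weighting", "relevance", "search"},
            ["weighting", "relevance"], term_stats, R=10, N=1000))
```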
  3. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.00
    0.004463867 = 0.0267832 (weight(_text_:of in 2564) [ClassicSimilarity]: tf=4.0, freq=16.0, idf=1.5637573, fieldNorm=0.0625) × coord(1/2) × coord(1/3)
    
    Abstract
    The classification of documents from a bibliographic database is a task linked to information retrieval processes based on partial matching. A method is described for vectorizing reference documents from LISA, which permits their topological organization using Kohonen's algorithm. As an example, a map of 202 documents from LISA is generated, and the possibilities of this type of neural network for the development of information retrieval systems based on graphical browsing are analysed.
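As a companion to entry 3, the following is a generic self-organizing map trainer of the kind Kohonen's algorithm describes. It is a toy stand-in, not the implementation used in the paper: document vectorization is assumed to be done already, and the grid size, learning rate, and neighbourhood schedule are arbitrary choices.

```python
import numpy as np

def train_som(docs, grid=(10, 10), epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Tiny self-organizing map for document vectors (rows of `docs`)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, docs.shape[1]))
    coords = np.dstack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 0.5
        for x in docs[rng.permutation(len(docs))]:
            # best-matching unit for this document vector
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # pull the BMU's neighbourhood towards the input vector
            d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
            influence = np.exp(-d2 / (2 * sigma ** 2))[:, :, None]
            weights += lr * influence * (x - weights)
    return weights

def project(docs, weights):
    """Map each document to its cell on the trained grid."""
    return [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)),
                             weights.shape[:2]) for x in docs]

rng = np.random.default_rng(1)
doc_vectors = rng.random((50, 20))        # stand-in for vectorized documents
grid_weights = train_som(doc_vectors, grid=(6, 6), epochs=20)
print(project(doc_vectors[:3], grid_weights))
```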
  4. Stock, W.G.: On relevance distributions (2006) 0.00
    0.004463867 = 0.0267832 (weight(_text_:of in 5116) [ClassicSimilarity]: tf=4.0, freq=16.0, idf=1.5637573, fieldNorm=0.0625) × coord(1/2) × coord(1/3)
    
    Abstract
    There are at least three possible ways that documents are distributed by relevance: informetric (power law), inverse logistic, and dichotomous. The type of distribution has implications for the construction of relevance ranking algorithms for search engines, for automated (blind) relevance feedback, for user behavior when using Web search engines, for combining the outputs of search engines in metasearch, for topic detection and tracking, and for the methodology of evaluating information retrieval systems.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.8, S.1126-1129
  5. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.00
    0.004428855 = 0.02657313 (weight(_text_:of in 92) [ClassicSimilarity]: tf=5.2915025, freq=28.0, idf=1.5637573, fieldNorm=0.046875) × coord(1/2) × coord(1/3)
    
    Abstract
    In this paper the authors present research on the combination of two data-mining methods: text classification and maximal association rules. Text classification has long been a focus of research, but its results take the form of lists of words (classes) that users often do not know how to exploit. Using maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors show how this combination can improve the information retrieval process.
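To make the idea in entry 5 concrete, here is a plain support/confidence sketch of mining single-antecedent rules from document term sets and using them to expand a query. It is not the maximal association rules formalism the authors use; the thresholds, function names, and toy collection are invented.

```python
from itertools import combinations
from collections import Counter

def mine_rules(term_sets, min_support=2, min_conf=0.6):
    """Mine simple rules t1 -> t2 from the term sets of a collection."""
    pair_counts, term_counts = Counter(), Counter()
    for terms in term_sets:
        term_counts.update(set(terms))
        pair_counts.update(combinations(sorted(set(terms)), 2))
    rules = {}
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            conf = support / term_counts[ante]
            if conf >= min_conf:
                rules.setdefault(ante, []).append((cons, conf))
    return rules

def reformulate(query_terms, rules):
    """Expand the query with the consequents of rules fired by its terms."""
    expanded = set(query_terms)
    for t in query_terms:
        expanded.update(c for c, _ in rules.get(t, []))
    return expanded

docs = [{"text", "classification", "mining"},
        {"text", "classification", "retrieval"},
        {"classification", "retrieval", "query"}]
print(reformulate({"classification"}, mine_rules(docs)))
```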
  6. Agosti, M.; Pretto, L.: ¬A theoretical study of a generalized version of Kleinberg's HITS algorithm (2005) 0.00
    0.0044112457 = 0.026467472 (weight(_text_:of in 4) [ClassicSimilarity]: tf=6.3245554, freq=40.0, idf=1.5637573, fieldNorm=0.0390625) × coord(1/2) × coord(1/3)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
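Entry 6 studies a generalized HITS. For reference, this is a minimal sketch of the classic HITS iteration that the paper generalizes (the generalized version itself is not reproduced here); the four-node link graph is invented for illustration.

```python
import numpy as np

def hits(adj, iterations=100):
    """Classic HITS: adj[i, j] = 1 if page i links to page j.
    Returns (hub, authority) score vectors, L2-normalized each step."""
    n = adj.shape[0]
    hub = np.ones(n)
    auth = np.ones(n)
    for _ in range(iterations):
        auth = adj.T @ hub          # authorities are pointed to by good hubs
        auth /= np.linalg.norm(auth)
        hub = adj @ auth            # hubs point to good authorities
        hub /= np.linalg.norm(hub)
    return hub, auth

# a tiny link graph: nodes 0 and 1 both point to nodes 2 and 3
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 0]], dtype=float)
hub, auth = hits(adj)
print(np.round(hub, 3), np.round(auth, 3))  # 0, 1 are hubs; 2, 3 are authorities
```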
  7. Srinivasan, P.: Intelligent information retrieval using rough set approximations (1989) 0.00
    0.00436691 = 0.02620146 (weight(_text_:of in 2526) [ClassicSimilarity]: tf=4.472136, freq=20.0, idf=1.5637573, fieldNorm=0.0546875) × coord(1/2) × coord(1/3)
    
    Abstract
    The theory of rough sets was introduced in 1982. It allows the classification of objects into sets of equivalent members based on their attributes. Any combination of the same objects (or even their attributes) may be examined using the resultant classification. The theory has direct applications in the design and evaluation of classification schemes and the selection of discriminating attributes. Introductory papers discuss its application in the domain of medical diagnostic systems and the design of information retrieval systems accessing collections of documents. Advantages offered by the theory are: the implicit inclusion of Boolean logic; term weighting; and the ability to rank retrieved documents.
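For entry 7, a minimal sketch of the rough-set lower and upper approximations the abstract refers to: objects are grouped into equivalence classes by attribute values, and a target set (here a set of relevant documents) is approximated from below and above. The toy attributes and document ids are invented.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Lower and upper rough-set approximations of `target` (a set of object
    ids), where objects are grouped into equivalence classes by the values
    of the chosen attributes."""
    classes = defaultdict(set)
    for oid, attrs in objects.items():
        classes[tuple(attrs[a] for a in attributes)].add(oid)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:      # class lies entirely inside the target set
            lower |= members
        if members & target:       # class overlaps the target set
            upper |= members
    return lower, upper

docs = {1: {"topic": "ir", "lang": "en"}, 2: {"topic": "ir", "lang": "en"},
        3: {"topic": "ir", "lang": "de"}, 4: {"topic": "ai", "lang": "en"}}
relevant = {1, 3}
print(approximations(docs, ["topic"], relevant))   # (set(), {1, 2, 3})
```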
  8. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.00
    0.00436691 = 0.02620146 (weight(_text_:of in 1678) [ClassicSimilarity]: tf=4.472136, freq=20.0, idf=1.5637573, fieldNorm=0.0546875) × coord(1/2) × coord(1/3)
    
    Abstract
    Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.8, S.713-729
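Entry 8 describes single-pass, segment-based index construction. The sketch below is a highly simplified illustration of that idea: postings accumulate in memory, are flushed as sorted segments when a toy budget is exceeded, and the segments are merged at the end. Keeping segments in a list rather than on disk, the budget, and the corpus are simplifications, not the authors' actual method.

```python
from collections import defaultdict
import heapq

def build_index(docs, budget=1000):
    """Simplified single-pass, segment-based inversion of a list of texts."""
    segments, inmem, count = [], defaultdict(list), 0
    for doc_id, text in enumerate(docs):
        for term in text.split():
            inmem[term].append(doc_id)
            count += 1
        if count >= budget:                      # flush a sorted segment
            segments.append(sorted(inmem.items()))
            inmem, count = defaultdict(list), 0
    if inmem:
        segments.append(sorted(inmem.items()))
    # k-way merge of the sorted segments into the final inverted file
    merged = defaultdict(list)
    for term, postings in heapq.merge(*segments):
        merged[term].extend(postings)
    return dict(merged)

index = build_index(["the cat sat", "the dog sat", "a cat ran"], budget=4)
print(index["cat"], index["sat"])   # [0, 2] [0, 1]
```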
  9. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 0.00
    0.00436691 = 0.02620146 (weight(_text_:of in 966) [ClassicSimilarity]: tf=4.472136, freq=20.0, idf=1.5637573, fieldNorm=0.0546875) × coord(1/2) × coord(1/3)
    
    Abstract
    The paper presents two approaches to interactively refining user search formulations and their evaluation in the new High Accuracy Retrieval from Documents (HARD) track of TREC-12. The first method consists of asking the user to select a number of sentences that represent documents. The second method consists of showing to the user a list of noun phrases extracted from the initial document set. Both methods then expand the query based on the user feedback. The TREC results show that one of the methods is an effective means of interactive query expansion and yields significant performance improvements. The paper presents a comparison of the methods and detailed analysis of the evaluation results.
  10. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster (2008) 0.00
    0.0042995503 = 0.025797302 (weight(_text_:of in 5586) [ClassicSimilarity]: tf=6.164414, freq=38.0, idf=1.5637573, fieldNorm=0.0390625) × coord(1/2) × coord(1/3)
    
    Abstract
    This paper focuses on the practical limitations in the content and software of the databases that are used to calculate the h-index for assessing the publishing productivity and impact of researchers. To celebrate F. W. Lancaster's biological age of seventy-five and "scientific age" of forty-five, this paper discusses the related features of Google Scholar, Scopus, and Web of Science (WoS), and demonstrates in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically. The cited reference index of the 1945-2007 edition of WoS contains, by my estimate, over a hundred million "orphan references" that have no counterpart master records to be attached to, as well as "stray references" that cite papers which do have master records but cannot be identified by the matching algorithm because of errors of omission and commission in the references of the citing works. Browsing and searching this index can bring up hundreds of additional cited references to the works of an accomplished author that are ignored in the automatic process of calculating the h-index. The partially manual process doubled the h-index value for F. W. Lancaster from 13 to 26, which is a much more realistic value for an information scientist and professor of his stature.
    Content
    Contribution to a special issue, 'The Influence of F. W. Lancaster on Information Science and on Libraries', which is declared a Festschrift for F. W. Lancaster.
    Object
    Web of Science
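Entry 10 revolves around the h-index. The computation itself is simple; the sketch below shows it, with invented citation counts illustrating the paper's point that recovering "stray" and "orphan" citations can raise the automatically computed value (the numbers are not Lancaster's actual counts).

```python
def h_index(citation_counts):
    """h = the largest h such that at least h publications have >= h citations."""
    h = 0
    for i, c in enumerate(sorted(citation_counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# invented counts: adding recovered citations raises the index
print(h_index([30, 18, 12, 9, 7, 6, 2]))          # 6
print(h_index([30, 18, 12, 9, 8, 8, 7, 6, 2]))    # 7 after recovered citations
```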
  11. Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondence analysis instead of latent semantic analysis (2023) 0.00
    0.004267752 = 0.025606511 (weight(_text_:of in 1045) [ClassicSimilarity]: tf=5.0990195, freq=26.0, idf=1.5637573, fieldNorm=0.046875) × coord(1/2) × coord(1/3)
    
    Abstract
    The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
    Source
    Journal of intelligent information systems [https://doi.org/10.1007/s10844-023-00815-y]
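For entry 11, here are minimal textbook formulations of the two techniques being compared: LSA as a truncated SVD of the document-term matrix, and CA as an SVD of standardized residuals, which removes the marginal (row/column mass) effects the abstract mentions. The weightings studied in the paper are omitted and the small matrix is invented; this is not the authors' experimental code.

```python
import numpy as np

def lsa(X, k):
    """Truncated SVD of the document-term matrix; rows of the result are
    the k-dimensional document representations."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def ca(X, k):
    """Correspondence analysis: SVD of the standardized residuals, which
    strips out the marginal effects. Returns principal row coordinates."""
    P = X / X.sum()
    r = P.sum(axis=1, keepdims=True)            # row masses
    c = P.sum(axis=0, keepdims=True)            # column masses
    S = (P - r @ c) / np.sqrt(r) / np.sqrt(c)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] * s[:k]) / np.sqrt(r)

X = np.array([[4., 1, 0, 0], [3, 2, 0, 1], [0, 0, 5, 2], [0, 1, 4, 3]])
print(lsa(X, 2))
print(ca(X, 2))
```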
  12. Can, F.: Incremental clustering for dynamic information processing (1993) 0.00
    0.004175565 = 0.02505339 (weight(_text_:of in 6627) [ClassicSimilarity]: tf=3.7416575, freq=14.0, idf=1.5637573, fieldNorm=0.0625) × coord(1/2) × coord(1/3)
    
    Abstract
    Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. Introduces an algorithm for incremental clustering and discusses the complexity and cost analysis of the algorithm, together with an investigation of its expected behaviour. Shows through empirical testing that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
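Entry 12 concerns incremental clustering. Below is a generic leader-style sketch in which each new document joins the nearest existing centroid if it is similar enough and otherwise starts a new cluster; it illustrates the incremental idea only and is not the specific algorithm analysed in the paper. The similarity threshold and vectors are invented.

```python
import numpy as np

def incremental_cluster(vectors, threshold=0.5):
    """Assign each document, in arrival order, to the most similar existing
    cluster (cosine similarity) or start a new cluster."""
    centroids, members = [], []
    for i, v in enumerate(vectors):
        v = v / np.linalg.norm(v)
        if centroids:
            sims = [float(c @ v) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(i)
                c = np.mean([vectors[j] / np.linalg.norm(vectors[j])
                             for j in members[best]], axis=0)
                centroids[best] = c / np.linalg.norm(c)   # refresh centroid
                continue
        centroids.append(v)
        members.append([i])
    return members

docs = np.array([[1., 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2]])
print(incremental_cluster(docs, threshold=0.8))   # [[0, 1], [2, 3]]
```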
  13. Nakkouzi, Z.S.; Eastman, C.M.: Query formulation for handling negation in information retrieval systems (1990) 0.00
    0.004175565 = 0.02505339 (weight(_text_:of in 3531) [ClassicSimilarity]: tf=3.7416575, freq=14.0, idf=1.5637573, fieldNorm=0.0625) × coord(1/2) × coord(1/3)
    
    Abstract
    Queries containing negation are widely recognised as presenting problems for both users and systems. In information retrieval systems such problems usually manifest themselves in the use of the NOT operator. Describes an algorithm to transform Boolean queries with negated terms into queries without negation; the transformation process is based on the use of a hierarchical thesaurus. Examines a set of user requests submitted to the Thomas Cooper Library at the University of South Carolina to determine the pattern and frequency of use of negation.
    Source
    Journal of the American Society for Information Science. 41(1990) no.3, S.171-182
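Entry 13 describes transforming negated Boolean queries using a hierarchical thesaurus. The sketch below shows one plausible reading of such a transformation, replacing "A NOT B" by A combined with B's siblings under its broader term; the tiny thesaurus, the function name, and this particular reading are assumptions, not the authors' exact algorithm.

```python
def remove_negation(query_term, negated_term, thesaurus):
    """Rewrite 'query_term NOT negated_term' as a positive query over the
    siblings of the negated term in a hierarchical thesaurus."""
    parent = thesaurus["broader"][negated_term]
    siblings = [t for t in thesaurus["narrower"][parent] if t != negated_term]
    return f"{query_term} AND ({' OR '.join(siblings)})"

thesaurus = {
    "broader": {"dogs": "pets", "cats": "pets", "birds": "pets"},
    "narrower": {"pets": ["dogs", "cats", "birds"]},
}
# "pets NOT dogs" becomes a positive query over the remaining siblings
print(remove_negation("pets", "dogs", thesaurus))   # pets AND (cats OR birds)
```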
  14. Torra, V.; Miyamoto, S.; Lanau, S.: Exploration of textual document archives using a fuzzy hierarchical clustering algorithm in the GAMBAL system (2005) 0.00
    0.004142815 = 0.024856888 (weight(_text_:of in 1028) [ClassicSimilarity]: tf=4.2426405, freq=18.0, idf=1.5637573, fieldNorm=0.0546875) × coord(1/2) × coord(1/3)
    
    Abstract
    The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool makes it possible to structure the set of documents hierarchically (using a fuzzy hierarchical structure) and to represent this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal supports the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also using a dictionary (WordNet 1.7) and latent semantic analysis.
  15. Tseng, Y.H.; Lin, Y.I.: Evaluation of fuzzy search, term suggestion, and term relevance feedback in an OPAC system (1998) 0.00
    0.0041003237 = 0.02460194 (weight(_text_:of in 6430) [ClassicSimilarity]: tf=2.4494898, freq=6.0, idf=1.5637573, fieldNorm=0.09375) × coord(1/2) × coord(1/3)
    
    Source
    Bulletin of the Library Association of China. 1998, no.61, S.103-125
  16. Maron, M.E.; Kuhns, I.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.00
    0.003945538 = 0.023673227 (weight(_text_:of in 1928) [ClassicSimilarity]: tf=5.656854, freq=32.0, idf=1.5637573, fieldNorm=0.0390625) × coord(1/2) × coord(1/3)
    
    Abstract
    Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
    Source
    Journal of the Association for Computing Machinery. 7(1960) no.3, S.216-244
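Entry 16 introduces probabilistic indexing. As a toy rendering of the idea, with indexer-assigned probabilities per term combined under an independence assumption into a "relevance number" used for ranking, consider the sketch below; it is not the exact 1960 formulation, and the weights and document names are invented.

```python
def relevance_number(request_terms, doc_index_weights, prior=1.0):
    """Toy probabilistic indexing: multiply the indexer-assigned term weights
    for the request terms (treated independently) and an optional prior."""
    score = prior
    for term in request_terms:
        score *= doc_index_weights.get(term, 0.0)
    return score

docs = {
    "d1": {"indexing": 0.9, "probability": 0.7, "libraries": 0.2},
    "d2": {"indexing": 0.4, "probability": 0.1, "libraries": 0.8},
}
request = ["indexing", "probability"]
ranking = sorted(docs, key=lambda d: relevance_number(request, docs[d]),
                 reverse=True)
print(ranking)   # ['d1', 'd2'] -- ordered by probable relevance
```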
  17. Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.00
    0.003945538 = 0.023673227 (weight(_text_:of in 6690) [ClassicSimilarity]: tf=5.656854, freq=32.0, idf=1.5637573, fieldNorm=0.0390625) × coord(1/2) × coord(1/3)
    
    Abstract
    In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, provided that the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a relevant document and the chance of retrieving a given non-relevant document), we show that if there are three or more independent systems available it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model, and determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented.
    Series
    Proceedings of the American Society for Information Science; vol.36
    Source
    Knowledge: creation, organization and use. Proceedings of the 62nd Annual Meeting of the American Society for Information Science, 31.10.-4.11.1999. Ed.: L. Woods
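Entry 17 builds on capture-recapture estimation. The classic two-sample (Lincoln-Petersen) estimate that underlies the multi-system extension described in the abstract is sketched below; the document sets are invented, and the independence requirement is exactly the assumption the paper finds violated on the TREC data.

```python
def lincoln_petersen(relevant_from_a, relevant_from_b):
    """Two-sample capture-recapture estimate of the total number of relevant
    documents: N ~= (n_a * n_b) / overlap, valid only if the two systems
    retrieve relevant documents independently."""
    a, b = set(relevant_from_a), set(relevant_from_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap: the estimate is undefined")
    return len(a) * len(b) / overlap

# system A found 40 relevant documents, system B found 30, with 12 in common
a = {f"doc{i}" for i in range(40)}
b = {f"doc{i}" for i in range(28, 58)}
print(lincoln_petersen(a, b))   # 100.0 estimated relevant documents
```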
  18. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.00
    0.003945538 = 0.023673227 (weight(_text_:of in 4487) [ClassicSimilarity]: tf=5.656854, freq=32.0, idf=1.5637573, fieldNorm=0.0390625) × coord(1/2) × coord(1/3)
    
    Abstract
    The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit essentially more from the concept-based QE in ranking than marginally relevant documents.
    Source
    Journal of documentation. 57(2001) no.3, S.358-376
  19. Green, R.: Topical relevance relationships : 2: an exploratory study and preliminary typology (1995) 0.00
    0.003925761 = 0.023554565 (weight(_text_:of in 3724) [ClassicSimilarity]: tf=4.690416, freq=22.0, idf=1.5637573, fieldNorm=0.046875) × coord(1/2) × coord(1/3)
    
    Abstract
    The assumption of topic matching between user needs and texts topically relevant to those needs is often erroneous. Reports an empirical investigation of the question: what relationship types actually account for topical relevance? In order to avoid the bias toward topic-matching search strategies, user needs are back-generated from a randomly selected subset of the subject headings employed in a user-oriented topical concordance. The corresponding relevant texts are those indicated in the concordance under the subject heading. Compares the topics of the user needs with the topics of the relevant texts to determine the relationships between them. Topical relevance relationships include a large variety of relationships, only some of which are matching relationships. Others are examples of paradigmatic or syntagmatic relationships. There appear to be no constraints on the kinds of relationships that can function as topical relevance relationships. They are distinguishable from other types of relationships only on functional grounds.
    Source
    Journal of the American Society for Information Science. 46(1995) no.9, S.654-662
  20. Zhu, B.; Chen, H.: Validating a geographical image retrieval system (2000) 0.00
    0.003925761 = 0.023554565 (weight(_text_:of in 4769) [ClassicSimilarity]: tf=4.690416, freq=22.0, idf=1.5637573, fieldNorm=0.046875) × coord(1/2) × coord(1/3)
    
    Abstract
    This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment that validates the system's performance against that of human subjects, in an effort to address the scarcity of research evaluating the performance of an algorithm against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
    Source
    Journal of the American Society for Information Science. 51(2000) no.7, S.625-634

Languages

  • e 293
  • d 9
  • chi 2

Types

  • a 283
  • m 10
  • el 8
  • s 4
  • r 3
  • p 2
  • x 1