Search (308 results, page 1 of 16)

  • × theme_ss:"Retrievalalgorithmen"
  1. Xu, Y.; Wang, D.: Order effect in relevance judgment : mediation and causality (2008) 0.06
    0.06039112 = product of:
      0.2214341 = sum of:
        0.19265921 = weight(_text_:effect in 1877) [ClassicSimilarity], result of:
          0.19265921 = score(doc=1877,freq=18.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            1.0533608 = fieldWeight in 1877, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.046875 = fieldNorm(doc=1877)
        0.017701415 = weight(_text_:of in 1877) [ClassicSimilarity], result of:
          0.017701415 = score(doc=1877,freq=20.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.32781258 = fieldWeight in 1877, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1877)
        0.011073467 = weight(_text_:on in 1877) [ClassicSimilarity], result of:
          0.011073467 = score(doc=1877,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.14580199 = fieldWeight in 1877, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1877)
      0.27272728 = coord(3/11)
    
    Abstract
    The order effect of relevance judgment refers to the different relevance perceptions of a document when it appears in different positions in a list. Although the order effect of relevance judgment has significant theoretical and practical implications, the extant literature is inconclusive regarding its existence and forming mechanisms. This study proposes a set of order effect forming mechanisms, including the learning effect, the subneed scheduling effect, and the cursoriness effect based on the conceptualization of dynamic relevance and the psychology of cognitive elaboration. Our empirical study indicates that in an interactive information retrieval setting, when a document list is reasonably long, order effects demonstrate a curvilinear pattern that conforms to the combined effect of the three mechanisms. Moreover, the curvilinear pattern of order effect could differ for documents of different relevance levels.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.8, S.1264-1275
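
    The score tree for result 1 can be re-derived by hand. The following is a minimal sketch, not the catalog's own code: it assumes the classic TF-IDF form tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), chosen because it reproduces the idf and tf values printed in the listing. The queryNorm and fieldNorm values are copied from the listing rather than computed, and the function names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.034531306  # queryNorm, copied from the listing (not derived here)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency in the form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# (freq, docFreq, fieldNorm) for the terms "effect", "of", "on" in doc 1877
matched_terms = [
    (18.0, 601, 0.046875),
    (20.0, 25162, 0.046875),
    (2.0, 13325, 0.046875),
]

# coord(3/11): 3 of the 11 query clauses matched this document
coord = 3 / 11
score = coord * sum(term_score(*t) for t in matched_terms)
print(round(score, 6))
```

    Up to floating-point rounding, the result should agree with the 0.06039112 shown at the top of the tree for doc 1877. Note how the rare term "effect" (docFreq=601) dominates the sum, while the frequent terms "of" and "on" contribute little despite high term frequencies.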
  2. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.06
    0.05892224 = product of:
      0.16203615 = sum of:
        0.12109391 = weight(_text_:effect in 5108) [ClassicSimilarity], result of:
          0.12109391 = score(doc=5108,freq=4.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.66207874 = fieldWeight in 5108, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.007463572 = weight(_text_:of in 5108) [ClassicSimilarity], result of:
          0.007463572 = score(doc=5108,freq=2.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.13821793 = fieldWeight in 5108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.014764623 = weight(_text_:on in 5108) [ClassicSimilarity], result of:
          0.014764623 = score(doc=5108,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.19440265 = fieldWeight in 5108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.018714061 = product of:
          0.037428122 = sum of:
            0.037428122 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.037428122 = score(doc=5108,freq=2.0), product of:
                0.12092275 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034531306 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.36363637 = coord(4/11)
    
    Abstract
    In this paper methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
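
    Result 2 contains a nested clause: the term "22" sits inside its own sub-query with coord(1/2), and the result is then added to the other term scores before the outer coord(4/11) is applied. A hedged sketch of that recursive evaluation follows. The `Term` and `Bool` classes are hypothetical illustrations, not part of any library, and the idf and tf formulas are assumptions chosen because they reproduce the numbers in the listing.

```python
import math
from dataclasses import dataclass
from typing import List, Union

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.034531306  # queryNorm, copied from the listing


@dataclass
class Term:
    freq: float        # termFreq in the document
    doc_freq: int      # docFreq in the collection
    field_norm: float  # fieldNorm, copied from the listing


@dataclass
class Bool:
    clauses: List[Union["Term", "Bool"]]  # the clauses that matched
    total_clauses: int                    # all clauses in the (sub-)query


def evaluate(node):
    """Recursively score a node: leaf terms by TF-IDF, Boolean nodes by coord * sum."""
    if isinstance(node, Term):
        idf = 1.0 + math.log(MAX_DOCS / (node.doc_freq + 1))
        query_weight = idf * QUERY_NORM
        field_weight = math.sqrt(node.freq) * idf * node.field_norm
        return query_weight * field_weight
    coord = len(node.clauses) / node.total_clauses
    return coord * sum(evaluate(clause) for clause in node.clauses)


# Explain tree for doc 5108, transcribed from the listing
doc_5108 = Bool(
    clauses=[
        Term(4.0, 601, 0.0625),      # "effect"
        Term(2.0, 25162, 0.0625),    # "of"
        Term(2.0, 13325, 0.0625),    # "on"
        Bool(clauses=[Term(2.0, 3622, 0.0625)], total_clauses=2),  # "22", coord(1/2)
    ],
    total_clauses=11,                # outer coord(4/11)
)

score_5108 = evaluate(doc_5108)
print(round(score_5108, 6))
```

    Up to rounding, this should agree with the listed 0.05892224. The match on "22" comes from the timestamp in the Date field rather than from the abstract, which illustrates how incidental tokens can raise a document's rank.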
  3. Ruthven, T.; Lalmas, M.; Rijsbergen, K.van: Incorporating user research behavior into relevance feedback (2003) 0.05
    0.049799085 = product of:
      0.13694748 = sum of:
        0.05263353 = weight(_text_:higher in 5169) [ClassicSimilarity], result of:
          0.05263353 = score(doc=5169,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.2901765 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
        0.053516448 = weight(_text_:effect in 5169) [ClassicSimilarity], result of:
          0.053516448 = score(doc=5169,freq=2.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.2926002 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
        0.012341722 = weight(_text_:of in 5169) [ClassicSimilarity], result of:
          0.012341722 = score(doc=5169,freq=14.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.22855641 = fieldWeight in 5169, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
        0.018455777 = weight(_text_:on in 5169) [ClassicSimilarity], result of:
          0.018455777 = score(doc=5169,freq=8.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.24300331 = fieldWeight in 5169, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
      0.36363637 = coord(4/11)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen rank and select terms for query expansion using information gathered on searcher evaluation behavior. Using the TREC Financial Times and Los Angeles Times collections and search topics from TREC-6 placed in simulated work situations, six student subjects each performed three searches on an experimental system and three on a control system with instructions to search by natural language expression in any way they found comfortable. Searching was analyzed for behavior differences between experimental and control situations, and for effectiveness and perceptions. In three experiments paired t-tests were the analysis tool, with the controls being a no relevance feedback system, a standard ranking for automatic expansion system, and a standard ranking for interactive expansion, while the experimental systems based ranking upon user information on temporal relevance and partial relevance. Two further experiments compare using user behavior (number assessed relevant and similarity of relevant documents) to choose a query expansion technique against a non-selective technique, and finally examine the effect of providing the user with knowledge of the process. When partial relevance data and time of assessment data were incorporated in term ranking, more relevant documents were recovered in fewer iterations; however, retrieval effectiveness overall was not improved. The subjects, nonetheless, rated the suggested terms as more useful and used them more heavily. Explanations of what the feedback techniques were doing led to higher use of the techniques.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.6, S.528-548
  4. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.04
    0.039860837 = product of:
      0.1461564 = sum of:
        0.105957165 = weight(_text_:effect in 2417) [ClassicSimilarity], result of:
          0.105957165 = score(doc=2417,freq=4.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.5793189 = fieldWeight in 2417, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2417)
        0.011311376 = weight(_text_:of in 2417) [ClassicSimilarity], result of:
          0.011311376 = score(doc=2417,freq=6.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.20947541 = fieldWeight in 2417, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2417)
        0.028887864 = weight(_text_:on in 2417) [ClassicSimilarity], result of:
          0.028887864 = score(doc=2417,freq=10.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.38036036 = fieldWeight in 2417, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2417)
      0.27272728 = coord(3/11)
    
    Abstract
    Proposes the weight partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
    Source
    ACM transactions on information systems. 14(1996) no.2, S.109-137
  5. Na, S.-H.; Kang, I.-S.; Roh, J.-E.; Lee, J.-H.: An empirical study of query expansion and cluster-based retrieval in language modeling approach (2007) 0.04
    0.03936281 = product of:
      0.14433031 = sum of:
        0.105957165 = weight(_text_:effect in 906) [ClassicSimilarity], result of:
          0.105957165 = score(doc=906,freq=4.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.5793189 = fieldWeight in 906, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
        0.0159967 = weight(_text_:of in 906) [ClassicSimilarity], result of:
          0.0159967 = score(doc=906,freq=12.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.29624295 = fieldWeight in 906, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
        0.022376444 = weight(_text_:on in 906) [ClassicSimilarity], result of:
          0.022376444 = score(doc=906,freq=6.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.29462588 = fieldWeight in 906, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
      0.27272728 = coord(3/11)
    
    Abstract
    The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experimentations on seven test collections of NTCIR and TREC.
    Footnote
    Beitrag in: Special issue on AIRS2005: Information Retrieval Research in Asia
  6. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.03
    0.03382332 = product of:
      0.09301412 = sum of:
        0.053516448 = weight(_text_:effect in 56) [ClassicSimilarity], result of:
          0.053516448 = score(doc=56,freq=2.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.2926002 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.014751178 = weight(_text_:of in 56) [ClassicSimilarity], result of:
          0.014751178 = score(doc=56,freq=20.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.27317715 = fieldWeight in 56, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.013050207 = weight(_text_:on in 56) [ClassicSimilarity], result of:
          0.013050207 = score(doc=56,freq=4.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.1718293 = fieldWeight in 56, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.011696288 = product of:
          0.023392577 = sum of:
            0.023392577 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.023392577 = score(doc=56,freq=2.0), product of:
                0.12092275 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034531306 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.36363637 = coord(4/11)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.462-478
  7. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.03
    0.030359533 = product of:
      0.11131828 = sum of:
        0.02307982 = weight(_text_:of in 5764) [ClassicSimilarity], result of:
          0.02307982 = score(doc=5764,freq=34.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.4274153 = fieldWeight in 5764, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
        0.015660247 = weight(_text_:on in 5764) [ClassicSimilarity], result of:
          0.015660247 = score(doc=5764,freq=4.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.20619515 = fieldWeight in 5764, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
        0.072578214 = weight(_text_:great in 5764) [ClassicSimilarity], result of:
          0.072578214 = score(doc=5764,freq=2.0), product of:
            0.19443816 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.034531306 = queryNorm
            0.37327147 = fieldWeight in 5764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
      0.27272728 = coord(3/11)
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.344-364
  8. Bidoki, A.M.Z.; Yazdani, N.: An intelligent ranking algorithm for web pages : DistanceRank (2008) 0.03
    0.027181976 = product of:
      0.09966724 = sum of:
        0.07368694 = weight(_text_:higher in 2068) [ClassicSimilarity], result of:
          0.07368694 = score(doc=2068,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.4062471 = fieldWeight in 2068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2068)
        0.0130612515 = weight(_text_:of in 2068) [ClassicSimilarity], result of:
          0.0130612515 = score(doc=2068,freq=8.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.24188137 = fieldWeight in 2068, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2068)
        0.012919044 = weight(_text_:on in 2068) [ClassicSimilarity], result of:
          0.012919044 = score(doc=2068,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.17010231 = fieldWeight in 2068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2068)
      0.27272728 = coord(3/11)
    
    Abstract
    A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link-based ranking algorithms like PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning, called "DistanceRank", which treats the distance between pages as punishment in order to compute ranks of web pages. The distance is defined as the number of "average clicks" between two pages. The objective is to minimize punishment or distance so that a page with less distance has a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawling scheduling. Furthermore, the complexity of DistanceRank is low. We have used University of California at Berkeley's web for our experiments.
  9. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.03
    0.026704736 = product of:
      0.09791736 = sum of:
        0.07368694 = weight(_text_:higher in 4548) [ClassicSimilarity], result of:
          0.07368694 = score(doc=4548,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.4062471 = fieldWeight in 4548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
        0.011311376 = weight(_text_:of in 4548) [ClassicSimilarity], result of:
          0.011311376 = score(doc=4548,freq=6.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.20947541 = fieldWeight in 4548, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
        0.012919044 = weight(_text_:on in 4548) [ClassicSimilarity], result of:
          0.012919044 = score(doc=4548,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.17010231 = fieldWeight in 4548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
      0.27272728 = coord(3/11)
    
    Abstract
    I believe that Google Scholar is the most popular academic indexing service for researchers and citations. However, some other indexing institutions may be more professional than Google Scholar but not as popular. Other indexing websites like Scopus and Clarivate provide more statistical figures for scholars, institutions or even journals. In terms of publication citations, Google Scholar always shows higher citation counts for a paper than other indexing websites, since Google Scholar considers most publication platforms and can therefore easily count the citations, while other databases consider only the citations that come from journals already indexed in their database.
  10. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.03
    0.026324157 = product of:
      0.09652191 = sum of:
        0.06316024 = weight(_text_:higher in 3688) [ClassicSimilarity], result of:
          0.06316024 = score(doc=3688,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.34821182 = fieldWeight in 3688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
        0.017701415 = weight(_text_:of in 3688) [ClassicSimilarity], result of:
          0.017701415 = score(doc=3688,freq=20.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.32781258 = fieldWeight in 3688, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
        0.015660247 = weight(_text_:on in 3688) [ClassicSimilarity], result of:
          0.015660247 = score(doc=3688,freq=4.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.20619515 = fieldWeight in 3688, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
      0.27272728 = coord(3/11)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1299-1312
  11. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.03
    0.026227767 = product of:
      0.09616847 = sum of:
        0.012516791 = weight(_text_:of in 393) [ClassicSimilarity], result of:
          0.012516791 = score(doc=393,freq=10.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.23179851 = fieldWeight in 393, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
        0.011073467 = weight(_text_:on in 393) [ClassicSimilarity], result of:
          0.011073467 = score(doc=393,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.14580199 = fieldWeight in 393, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
        0.072578214 = weight(_text_:great in 393) [ClassicSimilarity], result of:
          0.072578214 = score(doc=393,freq=2.0), product of:
            0.19443816 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.034531306 = queryNorm
            0.37327147 = fieldWeight in 393, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
      0.27272728 = coord(3/11)
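
The score breakdown above follows the classic Lucene TF-IDF formulas: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor. The sketch below recomputes the score for doc 393 from the numbers shown in the tree. The function names are illustrative, and because Lucene works in single precision, the result should agree with the listed 0.026227767 only to within rounding.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explain tree for doc 393.
MAX_DOCS = 44218
QUERY_NORM = 0.034531306
FIELD_NORM = 0.046875

# (termFreq, docFreq) for the matching terms "of", "on" and "great".
matching_terms = [(10.0, 25162), (2.0, 13325), (2.0, 430)]

total = sum(
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for freq, doc_freq in matching_terms
)
coord = 3 / 11            # 3 of the 11 query terms matched
score = total * coord
```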
    
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically unfeasible, this article suggests a new ranking scheme especially adapted for hypertext environments in order to produce more effective retrieval results and yet maintain the effectiveness of the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents such as bibliographic references or hypertext links. We will demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving its efficiency. Moreover, through knowing the nearest neighbor and the hypertext links which constitute additional sources of evidence, our strategy will take them into account in order to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
    Source
    Journal of the American Society for Information Science. 48(1997) no.3, S.235-253
  12. Deerwester, S.C.; Dumais, S.T.; Landauer, T.K.; Furnas, G.W.; Harshman, R.A.: Indexing by latent semantic analysis (1990) 0.02
    0.024563547 = product of:
      0.090066336 = sum of:
        0.06316024 = weight(_text_:higher in 2399) [ClassicSimilarity], result of:
          0.06316024 = score(doc=2399,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.34821182 = fieldWeight in 2399, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.046875 = fieldNorm(doc=2399)
        0.015832627 = weight(_text_:of in 2399) [ClassicSimilarity], result of:
          0.015832627 = score(doc=2399,freq=16.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.2932045 = fieldWeight in 2399, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2399)
        0.011073467 = weight(_text_:on in 2399) [ClassicSimilarity], result of:
          0.011073467 = score(doc=2399,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.14580199 = fieldWeight in 2399, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2399)
      0.27272728 = coord(3/11)
    
    Abstract
    A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
    Source
    Journal of the American Society for Information Science. 41(1990) no.6, S.391-407
  13. Crouch, C.J.; Crouch, D.B.; Chen, Q.; Holtz, S.J.: Improving the retrieval effectiveness of very short queries (2002) 0.02
    0.024335822 = product of:
      0.08923134 = sum of:
        0.05263353 = weight(_text_:higher in 2572) [ClassicSimilarity], result of:
          0.05263353 = score(doc=2572,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.2901765 = fieldWeight in 2572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2572)
        0.0139941955 = weight(_text_:of in 2572) [ClassicSimilarity], result of:
          0.0139941955 = score(doc=2572,freq=18.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.25915858 = fieldWeight in 2572, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2572)
        0.02260362 = weight(_text_:on in 2572) [ClassicSimilarity], result of:
          0.02260362 = score(doc=2572,freq=12.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.29761705 = fieldWeight in 2572, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2572)
      0.27272728 = coord(3/11)
    
    Abstract
    This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
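
The reranking step described in the abstract above can be sketched as follows: the top-ranked documents from the initial (stemmed) retrieval are reordered by how many unstemmed query terms appear verbatim in their text. The whitespace tokenization, the count of distinct matching terms as the sort key, and the name `rerank_by_unstemmed_terms` are assumptions for illustration; this is not the Smart implementation used in the paper.

```python
def rerank_by_unstemmed_terms(ranked_docs, query_terms, top_k=20):
    """Reorder the top_k documents by presence of unstemmed query terms.

    ranked_docs: list of (doc_id, text) tuples in initial rank order.
    query_terms: the query words exactly as the user typed them.
    """
    wanted = {term.lower() for term in query_terms}
    head = ranked_docs[:top_k]      # pseudo-relevant candidates
    tail = ranked_docs[top_k:]      # left in their original order

    def matches(item):
        tokens = set(item[1].lower().split())
        return len(wanted & tokens)

    # sorted() is stable, so documents with equal counts keep their
    # initial relative order.
    return sorted(head, key=matches, reverse=True) + tail


# Hypothetical initial ranking for the query "runner runs".
initial = [
    ("d1", "running shoes review"),
    ("d2", "the runner runs daily"),
    ("d3", "runs and runner training"),
]
reranked_ids = [doc_id for doc_id, _ in
                rerank_by_unstemmed_terms(initial, ["runner", "runs"], top_k=3)]
```

The documents promoted to the top would then serve as the pseudo-relevant set for feedback.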
  14. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.02
    0.02291468 = product of:
      0.08402049 = sum of:
        0.053516448 = weight(_text_:effect in 278) [ClassicSimilarity], result of:
          0.053516448 = score(doc=278,freq=2.0), product of:
            0.18289955 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.034531306 = queryNorm
            0.2926002 = fieldWeight in 278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
        0.01745383 = weight(_text_:of in 278) [ClassicSimilarity], result of:
          0.01745383 = score(doc=278,freq=28.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.32322758 = fieldWeight in 278, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
        0.013050207 = weight(_text_:on in 278) [ClassicSimilarity], result of:
          0.013050207 = score(doc=278,freq=4.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.1718293 = fieldWeight in 278, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
      0.27272728 = coord(3/11)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.7, S.1383-1397
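
The query-time side of the approach described above can be sketched in a few lines: once every (term, document) pair carries a single precomputed impact, scoring a query reduces to adding up impacts. The index layout, the impact values, and the name `score_query` are hypothetical; the genetic-programming step that learns the impacts is not shown.

```python
from collections import defaultdict


def score_query(impact_index, query_terms):
    """Rank documents by the sum of precomputed impacts of the query terms.

    impact_index: dict mapping term -> {doc_id: impact}, built at indexing time.
    Returns (doc_id, score) pairs, highest score first, ties broken by doc_id.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, impact in impact_index.get(term, {}).items():
            scores[doc_id] += impact
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Made-up impacts standing in for the learned, fused evidence.
impact_index = {
    "ranking": {"d1": 3, "d2": 1},
    "fusion": {"d2": 4, "d3": 2},
}
ranking = score_query(impact_index, ["ranking", "fusion"])
```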
  15. Hoenkamp, E.: Unitary operators on the document space (2003) 0.02
    0.022828344 = product of:
      0.08370393 = sum of:
        0.0139941955 = weight(_text_:of in 3457) [ClassicSimilarity], result of:
          0.0139941955 = score(doc=3457,freq=18.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.25915858 = fieldWeight in 3457, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
        0.009227889 = weight(_text_:on in 3457) [ClassicSimilarity], result of:
          0.009227889 = score(doc=3457,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.121501654 = fieldWeight in 3457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
        0.060481843 = weight(_text_:great in 3457) [ClassicSimilarity], result of:
          0.060481843 = score(doc=3457,freq=2.0), product of:
            0.19443816 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.034531306 = queryNorm
            0.31105953 = fieldWeight in 3457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
      0.27272728 = coord(3/11)
    
    Abstract
    When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.314-320
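
For readers unfamiliar with the Haar transform named in the abstract above, the following is a minimal sketch of the standard orthonormal version applied to a single vector. Treating the input as a document's term-weight vector, and requiring a power-of-two length, are assumptions of this sketch rather than details taken from the article. Each pass halves the working length, so the total work is linear in the vector length, and the transform preserves the vector's norm, which is what makes it unitary.

```python
import math


def haar_transform(values):
    """Orthonormal Haar wavelet transform of a power-of-two-length vector."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    scale = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (coarse view) and differences (detail).
        averages = [(out[2 * i] + out[2 * i + 1]) / scale for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / scale for i in range(half)]
        out[:length] = averages + details
        length = half        # recurse on the coarse half only
    return out


# Hypothetical term weights for one document.
doc_vector = [4, 2, 5, 5]
coefficients = haar_transform(doc_vector)
```

Keeping only the leading coefficients gives a coarser, lower-dimensional view of the vector, which is one way to picture the multiresolution representation the abstract mentions.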
  16. Purpura, A.; Silvello, G.; Susto, G.A.: Learning to rank from relevance judgments distributions (2022) 0.02
    0.022722738 = product of:
      0.083316706 = sum of:
        0.05263353 = weight(_text_:higher in 645) [ClassicSimilarity], result of:
          0.05263353 = score(doc=645,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.2901765 = fieldWeight in 645, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0390625 = fieldNorm(doc=645)
        0.008079554 = weight(_text_:of in 645) [ClassicSimilarity], result of:
          0.008079554 = score(doc=645,freq=6.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.1496253 = fieldWeight in 645, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=645)
        0.02260362 = weight(_text_:on in 645) [ClassicSimilarity], result of:
          0.02260362 = score(doc=645,freq=12.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.29761705 = fieldWeight in 645, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=645)
      0.27272728 = coord(3/11)
    
    Abstract
    LEarning TO Rank (LETOR) algorithms are usually trained on annotated corpora where a single relevance label is assigned to each available document-topic pair. Within the Cranfield framework, relevance labels result from merging either multiple expertly curated or crowdsourced human assessments. In this paper, we explore how to train LETOR models with relevance judgments distributions (either real or synthetically generated) assigned to document-topic pairs instead of single-valued relevance labels. We propose five new probabilistic loss functions to deal with the higher expressive power provided by relevance judgments distributions and show how they can be applied both to neural and gradient boosting machine (GBM) architectures. Moreover, we show how training a LETOR model on a sampled version of the relevance judgments from certain probability distributions can improve its performance when relying either on traditional or probabilistic loss functions. Finally, we validate our hypothesis on real-world crowdsourced relevance judgments distributions. Overall, we observe that relying on relevance judgments distributions to train different LETOR models can boost their performance and even outperform strong baselines such as LambdaMART on several test collections.
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.9, S.1236-1252
  17. Hubert, G.; Pitarch, Y.; Pinel-Sauvagnat, K.; Tournier, R.; Laporte, L.: TournaRank : when retrieval becomes document competition (2018) 0.02
    0.02261007 = product of:
      0.08290359 = sum of:
        0.013193856 = weight(_text_:of in 5087) [ClassicSimilarity], result of:
          0.013193856 = score(doc=5087,freq=16.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.24433708 = fieldWeight in 5087, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5087)
        0.009227889 = weight(_text_:on in 5087) [ClassicSimilarity], result of:
          0.009227889 = score(doc=5087,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.121501654 = fieldWeight in 5087, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5087)
        0.060481843 = weight(_text_:great in 5087) [ClassicSimilarity], result of:
          0.060481843 = score(doc=5087,freq=2.0), product of:
            0.19443816 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.034531306 = queryNorm
            0.31105953 = fieldWeight in 5087, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5087)
      0.27272728 = coord(3/11)
    
    Abstract
    Numerous feature-based models have been recently proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised, thus requiring a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments using features as evidences of relevance. Tournaments are modeled as a sequence of matches, which involve pairs of documents playing in turn their features. Once a tournament is ended, documents are ranked according to their number of won matches during the tournament. This principle is generic since it can be applied to any collection type. It also provides great flexibility since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was experimented on several collections to evaluate our model in different contexts and to compare it with related approaches such as Learning To Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
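
The tournament principle described above can be sketched as a round-robin in which every pair of documents plays one match and documents are ranked by matches won. The match rule used here (the document that wins more feature-by-feature comparisons wins the match), the round-robin format, and all names are assumptions for illustration; the abstract notes that tournament type, match rules, and strategies are all configurable in TournaRank.

```python
from itertools import combinations


def play_match(features_a, features_b):
    """Return 1 if a wins, -1 if b wins, 0 for a draw (assumed rule)."""
    a_points = sum(fa > fb for fa, fb in zip(features_a, features_b))
    b_points = sum(fb > fa for fa, fb in zip(features_a, features_b))
    return (a_points > b_points) - (a_points < b_points)


def round_robin_rank(docs):
    """Rank documents by number of matches won in a round-robin tournament.

    docs: dict mapping doc_id -> list of feature values (higher is better).
    """
    wins = {doc_id: 0 for doc_id in docs}
    for a, b in combinations(docs, 2):
        result = play_match(docs[a], docs[b])
        if result > 0:
            wins[a] += 1
        elif result < 0:
            wins[b] += 1
    # Most wins first; ties broken by doc_id for a deterministic order.
    return sorted(wins, key=lambda doc_id: (-wins[doc_id], doc_id))


# Hypothetical feature vectors (e.g. three relevance signals per document).
docs = {
    "d1": [0.9, 0.2, 0.5],
    "d2": [0.4, 0.8, 0.7],
    "d3": [0.1, 0.1, 0.6],
}
tournament_ranking = round_robin_rank(docs)
```

No training data is involved, which matches the unsupervised framing in the abstract.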
  18. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.02
    0.022404553 = product of:
      0.08215003 = sum of:
        0.06316024 = weight(_text_:higher in 6112) [ClassicSimilarity], result of:
          0.06316024 = score(doc=6112,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.34821182 = fieldWeight in 6112, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
        0.007916314 = weight(_text_:of in 6112) [ClassicSimilarity], result of:
          0.007916314 = score(doc=6112,freq=4.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.14660224 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
        0.011073467 = weight(_text_:on in 6112) [ClassicSimilarity], result of:
          0.011073467 = score(doc=6112,freq=2.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.14580199 = fieldWeight in 6112, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.27272728 = coord(3/11)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.12, S.1606-1615
  19. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Order-based fitness functions for genetic algorithms applied to relevance feedback (2003) 0.02
    0.021936797 = product of:
      0.08043492 = sum of:
        0.05263353 = weight(_text_:higher in 5154) [ClassicSimilarity], result of:
          0.05263353 = score(doc=5154,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.2901765 = fieldWeight in 5154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
        0.014751178 = weight(_text_:of in 5154) [ClassicSimilarity], result of:
          0.014751178 = score(doc=5154,freq=20.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.27317715 = fieldWeight in 5154, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
        0.013050207 = weight(_text_:on in 5154) [ClassicSimilarity], result of:
          0.013050207 = score(doc=5154,freq=4.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.1718293 = fieldWeight in 5154, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
      0.27272728 = coord(3/11)
    
    Abstract
    Lopez-Pujalte and Guerrero-Bote test a relevance feedback genetic algorithm while varying its order-based fitness functions and generating a function based upon the Ide dec-hi method as a baseline. Using the non-zero weighted term types assigned to the query, and to the initially retrieved set of documents, as genes, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection of all possible genes, but gives greater likelihood to those with higher fitness values. When the fittest chromosome of a previous population is eliminated it is restored while the least fit of the new population is eliminated in its stead. A crossover probability of .8 and a mutation probability of .2 were used with 20 generations. Three fitness functions were utilized; the Horng and Yeh function which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity for retrieved documents, the other on stored fixed-recall-interval precisions. The Cranfield collection was used with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method which removes viewed documents. One of the three functions improved the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.2, S.152-160
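
Two mechanisms mentioned in the abstract above, fitness-proportional random selection and restoring the fittest chromosome of the previous population, can be sketched as below. Representing chromosomes as tuples of term weights, using the weight sum as a stand-in fitness function, and the function names are all assumptions; the fitness functions studied in the article are not reproduced. The selection helper also assumes fitness values are non-negative and not all zero.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    weights = [fitness(chromosome) for chromosome in population]
    return rng.choices(population, weights=weights, k=1)[0]


def apply_elitism(previous, current, fitness):
    """Restore the previous best chromosome if the new population lost it.

    The least fit member of the new population is dropped in its place.
    """
    best_previous = max(previous, key=fitness)
    if best_previous in current:
        return list(current)
    worst_index = min(range(len(current)), key=lambda i: fitness(current[i]))
    restored = list(current)
    restored[worst_index] = best_previous
    return restored


# Hypothetical chromosomes: one weight per query term.
previous_population = [(1, 1, 1), (1, 0, 0)]
new_population = [(0, 0, 0), (1, 1, 0)]

next_population = apply_elitism(previous_population, new_population, fitness=sum)
parent = roulette_select(previous_population, sum, random.Random(0))
```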
  20. Keen, E.M.: Designing and testing an interactive ranked retrieval system for professional searchers (1994) 0.02
    0.021258047 = product of:
      0.07794617 = sum of:
        0.05263353 = weight(_text_:higher in 1066) [ClassicSimilarity], result of:
          0.05263353 = score(doc=1066,freq=2.0), product of:
            0.18138453 = queryWeight, product of:
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.034531306 = queryNorm
            0.2901765 = fieldWeight in 1066, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.252756 = idf(docFreq=628, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1066)
        0.009329465 = weight(_text_:of in 1066) [ClassicSimilarity], result of:
          0.009329465 = score(doc=1066,freq=8.0), product of:
            0.053998582 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.034531306 = queryNorm
            0.17277241 = fieldWeight in 1066, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1066)
        0.015983174 = weight(_text_:on in 1066) [ClassicSimilarity], result of:
          0.015983174 = score(doc=1066,freq=6.0), product of:
            0.07594867 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.034531306 = queryNorm
            0.21044704 = fieldWeight in 1066, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1066)
      0.27272728 = coord(3/11)
    
    Abstract
    Reports 3 explorations of ranked system design. 2 tests used a 'cystic fibrosis' test collection with 100 queries. Experiment 1 compared a Boolean with a ranked interactive system using a subject qualified trained searcher, and reporting recall and precision results. Experiment 2 compared 15 different ranked match algorithms in a batch mode using 2 test collections, and included some new proximate pairs and term weighting approaches. Experiment 3 is a design plan for an interactive ranked prototype offering mid search algorithm choices plus other manual search devices (such as obligatory and unwanted terms), as influenced by thinking aloud comments from experiment 1. Concludes that, in Boolean versus ranked using inverse collection frequency, the searcher inspected more records on ranked than Boolean and so achieved a higher recall but lower precision; however, the presentation order of the relevant records was, on average, very similar in both systems. Concludes also that: query reformulation was quite strongly practised in ranked searching but does not appear to have been effective; the term pairs proximate weighting methods in experiment 2 enhanced precision on both test collections when used with inverse collection frequency weighting (ICF); and the design plan for an interactive prototype adds to a selection of match algorithms other devices, such as obligatory and unwanted term marking, evidence for this being found from think aloud comments.
    Source
    Journal of information science. 20(1994) no.6, S.389-398

Languages

  • e 296
  • d 9
  • chi 2

Types

  • a 286
  • m 10
  • el 8
  • s 4
  • r 3
  • p 2
  • x 1