Search (5 results, page 1 of 1)

  • author_ss:"Levene, M."
  • language_ss:"e"
  1. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Analysis of change in users' assessment of search results over time (2017) 0.03
    0.031401675 = product of:
      0.09420502 = sum of:
        0.085665286 = weight(_text_:ranking in 3593) [ClassicSimilarity], result of:
          0.085665286 = score(doc=3593,freq=4.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.42258036 = fieldWeight in 3593, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3593)
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 3593) [ClassicSimilarity], result of:
              0.025619213 = score(doc=3593,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 3593, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3593)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
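    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) output: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = sqrt(freq) × idf × fieldNorm, and partially matched queries are scaled by a coord factor. A minimal Python sketch (the function name is ours) that reproduces the displayed score of result 1 from the numbers in the tree:

    ```python
    import math

    def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
        """Per-term weight under Lucene's ClassicSimilarity:
        queryWeight * fieldWeight."""
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))    # 5.4090285 for "ranking"
        query_weight = idf * query_norm                    # 0.20271951
        field_weight = math.sqrt(freq) * idf * field_norm  # 0.42258036
        return query_weight * field_weight

    # Doc 3593: term "ranking" plus term "29" (the latter scaled by coord(1/3)),
    # summed and scaled by coord(2/6) because 2 of 6 query clauses matched.
    w_ranking = term_weight(4.0, 537, 44218, 0.03747799, 0.0390625)  # 0.085665...
    w_29 = term_weight(2.0, 3565, 44218, 0.03747799, 0.0390625) / 3  # 0.008539...
    print((w_ranking + w_29) * 2 / 6)  # ~0.031401675, the displayed score
    ```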
    
    Abstract
    We present the first systematic study of the influence of time on user judgements of the rankings and relevance grades of web search engine results. The goal of this study is to evaluate how users' assessments of search results change over time. To this end, we conducted a large-scale user study with 86 participants who evaluated 2 different queries and 4 diverse result sets twice, with an interval of 2 months. To analyze the results, we investigate whether 2 patterns of user behavior from the theory of categorical thinking hold for the evaluation of search results: (a) coarseness and (b) locality. To quantify these patterns, we devised 2 new measures of change in user judgements that distinguish between local changes (when users swap between close ranks or relevance values) and nonlocal ones. Two types of judgements were considered in this study: (a) relevance on a 4-point scale, and (b) ranking on a 10-point scale without ties. We found that users changed their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking; however, the majority of these changes were local.
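
    The abstract does not define the two change measures, so the sketch below is only an illustrative assumption: it treats a change as local when an item moves by at most one rank or one relevance grade between sessions (the window size and the function name are ours, not the paper's).

    ```python
    def change_profile(first, second, local_window=1):
        """Share of items whose judgement changed between two sessions,
        and the share of those changes that were local (moved by at most
        `local_window` ranks or relevance grades). Illustrative only."""
        deltas = [abs(first[item] - second[item]) for item in first]
        changed = [d for d in deltas if d > 0]
        local = [d for d in changed if d <= local_window]
        return {
            "changed": len(changed) / len(deltas),
            "local_share": len(local) / len(changed) if changed else 0.0,
        }

    # Ranks of 10 results judged twice, two months apart (toy data):
    session_1 = {i: i for i in range(1, 11)}
    session_2 = {1: 2, 2: 1, 3: 3, 4: 5, 5: 4, 6: 6, 7: 10, 8: 8, 9: 9, 10: 7}
    print(change_profile(session_1, session_2))  # changed=0.6, local_share~0.67
    ```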
    Date
    16.11.2017 13:33:29
  2. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Testing the stability of "wisdom of crowds" judgments of search results over time and their similarity with the search engine rankings (2016) 0.03
    0.025100855 = product of:
      0.07530256 = sum of:
        0.06853223 = weight(_text_:ranking in 3071) [ClassicSimilarity], result of:
          0.06853223 = score(doc=3071,freq=4.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.33806428 = fieldWeight in 3071, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03125 = fieldNorm(doc=3071)
        0.0067703333 = product of:
          0.020311 = sum of:
            0.020311 = weight(_text_:22 in 3071) [ClassicSimilarity], result of:
              0.020311 = score(doc=3071,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.15476047 = fieldWeight in 3071, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3071)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Purpose - One of the under-explored aspects of user information-seeking behaviour is the influence of time on relevance evaluation. Previous studies have shown that individual users may change their assessment of search results over time. It is also known that aggregated judgements of multiple individual users can lead to correct and reliable decisions, a phenomenon known as the "wisdom of crowds". The purpose of this paper is to examine whether aggregated judgements are more stable, and thus more reliable, over time than individual user judgements. Design/methodology/approach - Two simple measures are proposed to calculate aggregated judgements of search results and to compare their reliability and stability with those of individual user judgements. In addition, the aggregated "wisdom of crowds" judgements were used to compare human assessments of search results with the search engine's rankings. A large-scale user study was conducted with 87 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. Two types of judgements were considered in this study: relevance on a four-point scale, and ranking on a ten-point scale without ties. Findings - Aggregated judgements were found to be much more stable than individual user judgements, yet quite different from the search engine's rankings. Practical implications - The proposed "wisdom of crowds"-based approach provides a reliable reference point for the evaluation of search engines. This is also important for exploring the need for personalisation and for adapting a search engine's ranking over time to changes in users' preferences. Originality/value - This is the first study to apply the notion of the "wisdom of crowds" to the phenomenon of change over time in user evaluation of relevance, which is under-explored in the literature.
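
    The paper's two aggregation measures are not spelled out in the abstract; a plausible reading (an assumption here, not the authors' definition) aggregates relevance grades by their median and rankings by mean rank:

    ```python
    from statistics import mean, median

    def crowd_relevance(grades_per_user):
        """Median 4-point relevance grade per result across assessors."""
        n = len(grades_per_user[0])
        return [median(u[i] for u in grades_per_user) for i in range(n)]

    def crowd_ranking(ranks_per_user):
        """Re-rank results by their mean rank across assessors."""
        n = len(ranks_per_user[0])
        mean_ranks = [mean(u[i] for u in ranks_per_user) for i in range(n)]
        order = sorted(range(n), key=lambda i: mean_ranks[i])
        return [order.index(i) + 1 for i in range(n)]

    # Three assessors ranking four results:
    print(crowd_ranking([[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]))  # [1, 2, 3, 4]
    ```

    Stability over time could then be checked by comparing the crowd ranking computed from the first session against the one from the second.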
    Date
    20. 1.2015 18:30:22
  3. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Categorical relevance judgment (2018) 0.02
    0.02303808 = product of:
      0.06911424 = sum of:
        0.0605745 = weight(_text_:ranking in 4457) [ClassicSimilarity], result of:
          0.0605745 = score(doc=4457,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 4457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4457)
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 4457) [ClassicSimilarity], result of:
              0.025619213 = score(doc=4457,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 4457, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4457)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    In this study we explore users' behavior when assessing the relevance of search results, based on the hypothesis of categorical thinking. To investigate how users categorize search engine results, we performed several experiments in which users were asked to group a list of 20 search results into several categories, attaching a relevance judgment to each category they formed. Moreover, to determine how users change their minds over time, each experiment was repeated three times under the same conditions, with a gap of one month between rounds. The results show that on average users form 4-5 categories, and that within each round the size of a category decreases as its relevance increases. To measure the agreement between the search engine's ranking and the users' relevance judgments, we defined two novel similarity measures, the average concordance and the MinMax swap ratio. Similarity is highest in the third round, as the users' opinions stabilize. Qualitative analysis uncovered some interesting points: users tended to categorize results by the type and reliability of their source; in particular, they found commercial sites less trustworthy and attached high relevance to Wikipedia when their prior domain knowledge was limited.
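
    Neither the average concordance nor the MinMax swap ratio is defined in the abstract. As a stand-in with the same flavour, the sketch below counts discordant pairs Kendall-style and normalises by the maximum possible number of swaps; this is our assumption for illustration, not the paper's measure.

    ```python
    from itertools import combinations

    def swap_ratio(ranking_a, ranking_b):
        """Fraction of result pairs ordered differently by two rankings
        (0 = identical order, 1 = fully reversed). A Kendall-style
        stand-in, not the paper's MinMax swap ratio."""
        items = list(ranking_a)
        swaps = sum(
            1 for x, y in combinations(items, 2)
            if (ranking_a[x] - ranking_a[y]) * (ranking_b[x] - ranking_b[y]) < 0
        )
        max_swaps = len(items) * (len(items) - 1) // 2
        return swaps / max_swaps

    engine = {"a": 1, "b": 2, "c": 3, "d": 4}
    user   = {"a": 2, "b": 1, "c": 3, "d": 4}
    print(swap_ratio(engine, user))  # 1/6 ~ 0.167
    ```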
    Date
    29. 9.2018 11:35:30
  4. Bar-Ilan, J.; Levene, M.: The hw-rank : an h-index variant for ranking web pages (2015) 0.02
    0.020191502 = product of:
      0.121149 = sum of:
        0.121149 = weight(_text_:ranking in 1694) [ClassicSimilarity], result of:
          0.121149 = score(doc=1694,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.5976189 = fieldWeight in 1694, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.078125 = fieldNorm(doc=1694)
      0.16666667 = coord(1/6)
    
  5. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.01
    0.011422038 = product of:
      0.06853223 = sum of:
        0.06853223 = weight(_text_:ranking in 616) [ClassicSimilarity], result of:
          0.06853223 = score(doc=616,freq=4.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.33806428 = fieldWeight in 616, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - The objective of this paper is to characterize how the rankings of the top ten results of major search engines change over time and to compare the rankings between engines. Design/methodology/approach - The paper compares the top-ten rankings of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was carried out twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries over the three-month interval. To assess the changes in the rankings, three measures were computed for each data-collection point and each search engine. Findings - The rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap; with such a small overlap, comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper argues that, given the abundance of information on the web, ranking search results is of extreme importance. It compares several measures for computing the similarity between the rankings of search tools, shows that none of them is fully satisfactory as a standalone measure, and demonstrates the apparent differences between the ranking algorithms of two widely used search engines.
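
    The three measures computed at each data-collection point are not named in the abstract; two simple stand-ins (assumptions for illustration only) are the size of the top-10 overlap and a footrule-style sum of rank displacements over the shared URLs:

    ```python
    def overlap(top_a, top_b):
        """Number of URLs shared by two top-10 result lists."""
        return len(set(top_a) & set(top_b))

    def footrule_on_overlap(top_a, top_b):
        """Sum of absolute rank differences over the shared URLs --
        a simple proxy for ranking change between two crawls."""
        shared = set(top_a) & set(top_b)
        return sum(abs(top_a.index(u) - top_b.index(u)) for u in shared)

    # Toy top-5 lists from two collection points:
    oct_2003 = ["u1", "u2", "u3", "u4", "u5"]
    jan_2004 = ["u2", "u1", "u3", "u6", "u4"]
    print(overlap(oct_2003, jan_2004))              # 4 shared URLs
    print(footrule_on_overlap(oct_2003, jan_2004))  # 1 + 1 + 0 + 1 = 3
    ```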