Search (3 results, page 1 of 1)

  • author_ss:"Bar-Ilan, J."
  • author_ss:"Levene, M."
  • author_ss:"Zhitomirsky-Geffet, M."
  1. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Testing the stability of "wisdom of crowds" judgments of search results over time and their similarity with the search engine rankings (2016)
    Abstract
    Purpose - One of the under-explored aspects of the user information seeking process is the influence of time on relevance evaluation. Previous studies have shown that individual users might change their assessment of search results over time. It is also known that aggregated judgements of multiple individual users can lead to correct and reliable decisions; this phenomenon is known as the "wisdom of crowds". The purpose of this paper is to examine whether aggregated judgements are more stable, and thus more reliable, over time than individual user judgements. Design/methodology/approach - In this study two simple measures are proposed to calculate the aggregated judgements of search results and to compare their reliability and stability to those of individual user judgements. In addition, the aggregated "wisdom of crowds" judgements were used to compare human assessments of search results with the search engine's rankings. A large-scale user study was conducted with 87 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. Two types of judgements were considered in this study: relevance on a four-point scale, and ranking on a ten-point scale without ties. Findings - Aggregated judgements were found to be much more stable than individual user judgements, yet quite different from search engine rankings. Practical implications - The proposed "wisdom of crowds"-based approach provides a reliable reference point for the evaluation of search engines. It is also important for exploring the need for personalisation and for adapting a search engine's ranking over time to changes in users' preferences. Originality/value - This is the first study to apply the notion of the "wisdom of crowds" to the phenomenon of change over time in user evaluation of relevance, which is under-explored in the literature.
    Date
    20. 1.2015 18:30:22
    Type
    a
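The aggregation idea in the first abstract can be sketched in a few lines. The abstract does not specify the paper's two measures, so the following is a minimal illustration under assumed definitions: mean relevance grade per result, and a consensus ranking built from median per-result ranks.

```python
from statistics import mean, median

def aggregate_relevance(grades_per_user):
    """Aggregate per-user relevance grades (four-point scale) into one
    'wisdom of crowds' grade per result by averaging across users.
    Assumed definition; the paper's actual measure may differ."""
    n_results = len(grades_per_user[0])
    return [mean(user[i] for user in grades_per_user) for i in range(n_results)]

def aggregate_ranking(ranks_per_user):
    """Build a consensus ranking (1 = best, no ties) by computing each
    result's median rank across users, then re-ranking by that median
    (ties broken by result index)."""
    n_results = len(ranks_per_user[0])
    med = [median(user[i] for user in ranks_per_user) for i in range(n_results)]
    order = sorted(range(n_results), key=lambda i: (med[i], i))
    consensus = [0] * n_results
    for rank, i in enumerate(order, start=1):
        consensus[i] = rank
    return consensus
```

Stability over time would then be measured by comparing the aggregates from the two evaluation sessions, rather than each individual's pair of judgements.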
  2. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Analysis of change in users' assessment of search results over time (2017)
    Abstract
    We present the first systematic study of the influence of time on user judgements for rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and explore how users' judgements change. To this end, we conducted a large-scale user study with 86 participants who evaluated 2 different queries and 4 diverse result sets twice with an interval of 2 months. To analyze the results we investigate whether 2 types of patterns of user behavior from the theory of categorical thinking hold for the case of evaluation of search results: (a) coarseness and (b) locality. To quantify these patterns we devised 2 new measures of change in user judgements and distinguish between local (when users swap between close ranks and relevance values) and nonlocal changes. Two types of judgements were considered in this study: (a) relevance on a 4-point scale, and (b) ranking on a 10-point scale without ties. We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local.
    Type
    a
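The local/nonlocal distinction in the second abstract can be made concrete with a small sketch. The abstract does not define the two change measures, so this assumes a simple window: a change is "local" when the user swaps between adjacent grades or ranks (absolute difference within a small threshold), and "nonlocal" otherwise.

```python
def change_stats(first_session, second_session, local_window=1):
    """Classify per-result judgement changes between two sessions as
    unchanged, local, or nonlocal, and report the overall change rate
    and the share of changes that are local. Illustrative definitions,
    assumed rather than taken from the paper."""
    unchanged = local = nonlocal_ = 0
    for a, b in zip(first_session, second_session):
        d = abs(a - b)
        if d == 0:
            unchanged += 1
        elif d <= local_window:
            local += 1
        else:
            nonlocal_ += 1
    total_changed = local + nonlocal_
    return {
        "changed": total_changed / len(first_session),
        "local_share": local / total_changed if total_changed else 0.0,
    }
```

Under these assumed definitions, the paper's finding would read as: `changed` is about 0.5 for relevance grades and 0.85 for rankings, while `local_share` is high in both cases.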
  3. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Categorical relevance judgment (2018)
    Abstract
    In this study we aim to explore users' behavior when assessing the relevance of search results, based on the hypothesis of categorical thinking. To investigate how users categorize search engine results, we performed several experiments in which users were asked to group a list of 20 search results into several categories, attaching a relevance judgment to each formed category. Moreover, to determine how users change their minds over time, each experiment was repeated three times under the same conditions, with a gap of one month between rounds. The results show that on average users form 4-5 categories, and that within each round the size of a category decreases with its relevance. To measure the agreement between the search engine's ranking and the users' relevance judgments, we defined two novel similarity measures, the average concordance and the MinMax swap ratio. Similarity is shown to be highest in the third round, as the users' opinions stabilize. Qualitative analysis uncovered some interesting points: users tended to categorize results by the type and reliability of their source, and in particular found commercial sites less trustworthy and attached high relevance to Wikipedia when their prior domain knowledge was limited.
    Type
    a
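The third abstract names two similarity measures, "average concordance" and the "MinMax swap ratio", without defining them. As a stand-in illustration only, a plain concordant-pair fraction (Kendall-tau-style) captures the same intuition of agreement between an engine's ranking and a user's category-level relevance grades; it is not the paper's actual definition of either measure.

```python
from itertools import combinations

def concordant_fraction(engine_ranks, user_grades):
    """Fraction of result pairs on which the engine's ranking (lower
    rank = better) and the user's relevance grades (higher = better)
    agree in order. Pairs with tied grades carry no order information
    and are skipped. Illustrative sketch, not the paper's measure."""
    concordant = comparable = 0
    for i, j in combinations(range(len(engine_ranks)), 2):
        grade_diff = user_grades[i] - user_grades[j]
        if grade_diff == 0:
            continue  # tied grades: no preferred order
        comparable += 1
        rank_diff = engine_ranks[i] - engine_ranks[j]
        if (rank_diff < 0 and grade_diff > 0) or (rank_diff > 0 and grade_diff < 0):
            concordant += 1
    return concordant / comparable if comparable else 1.0
```

With a measure of this kind, the reported result would correspond to the fraction rising across rounds and peaking in round three as users' categorizations stabilize.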