Document (#40326)

Author
Aletras, N.
Baldwin, T.
Lau, J.H.
Stevenson, M.
Title
Evaluating topic representations for exploring document collections
Source
Journal of the Association for Information Science and Technology. 68(2017) no.1, pp. 154-167
Year
2017
Abstract
Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; that is, the top-n words with the highest conditional probability within the topic. Other topic representations, such as textual and image labels, have also been proposed. However, there has been no comparison of these alternative representations. In this article, we compare three different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, with topics presented in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23574/full.
Theme
Visualization

Similar documents (author)

  1. Stevenson, G.: The Mainzer Sachkatalog and his background (1970) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:stevenson in 754) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 754, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=754)
    
  2. Stevenson, G.: The historical context: traditional classification since 1950 (1974) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:stevenson in 1259) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 1259, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=1259)
    
  3. Stevenson, G.: Andreas Schleiermacher's bibliographic classification and its relationship to the Dewey Decimal Classification (1978) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:stevenson in 3550) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 3550, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=3550)
    
  4. McDonald, S.; Stevenson, R.J.: Navigation in hyperspace : an evaluation of the effects of navigational tools and subject matter expertise on browsing and information retrieval in hypertext (1998) 4.65
    4.649496 = sum of:
      4.649496 = weight(author_txt:stevenson in 3760) [ClassicSimilarity], result of:
        4.649496 = fieldWeight in 3760, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.5 = fieldNorm(doc=3760)
    
  5. Cole, C.; Mandelblatt, B.; Stevenson, J.: Visualizing a high recall search strategy output for undergraduates in an exploration stage of researching a term paper (2002) 3.49
    3.487122 = sum of:
      3.487122 = weight(author_txt:stevenson in 2575) [ClassicSimilarity], result of:
        3.487122 = fieldWeight in 2575, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.375 = fieldNorm(doc=2575)
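
The author scores above can be checked by hand. The following is a minimal Python sketch, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values listed here. The function names are illustrative only and are not part of any Lucene API; fieldNorm is copied from the listing rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency in the form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the "author_txt:stevenson" entries above.
MAX_DOCS = 44218
DOC_FREQ = 10

# fieldNorm differs per hit (0.625, 0.5, 0.375), which is the only
# reason the five author matches receive different scores.
for field_norm, listed in [(0.625, 5.81187), (0.5, 4.649496), (0.375, 3.487122)]:
    computed = field_weight(1.0, DOC_FREQ, MAX_DOCS, field_norm)
    print(f"fieldNorm={field_norm}: computed={computed:.5f}, listed={listed}")
```

Because the author query has a single term, the document score equals the fieldWeight; no queryWeight or coord factor appears in these trees.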
    

Similar documents (content)

  1. Alkhodair, S.A.; Fung, B.C.M.; Patrick, O.R.; Hung, C.K.: Improving interpretations of topic modeling in microblogs (2018) 0.25
    0.24808182 = sum of:
      0.24808182 = product of:
        0.8860065 = sum of:
          0.057466786 = weight(abstract_txt:labeling in 4181) [ClassicSimilarity], result of:
            0.057466786 = score(doc=4181,freq=1.0), product of:
              0.117493674 = queryWeight, product of:
                1.0856566 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.013829281 = queryNorm
              0.48910537 = fieldWeight in 4181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.029076613 = weight(abstract_txt:documents in 4181) [ClassicSimilarity], result of:
            0.029076613 = score(doc=4181,freq=3.0), product of:
              0.06517315 = queryWeight, product of:
                1.1434959 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.013829281 = queryNorm
              0.44614407 = fieldWeight in 4181, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.028453438 = weight(abstract_txt:document in 4181) [ClassicSimilarity], result of:
            0.028453438 = score(doc=4181,freq=1.0), product of:
              0.106055565 = queryWeight, product of:
                1.7865394 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.013829281 = queryNorm
              0.26828802 = fieldWeight in 4181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.037200667 = weight(abstract_txt:collections in 4181) [ClassicSimilarity], result of:
            0.037200667 = score(doc=4181,freq=1.0), product of:
              0.12680726 = queryWeight, product of:
                1.9535204 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.013829281 = queryNorm
              0.29336387 = fieldWeight in 4181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.10583615 = weight(abstract_txt:topics in 4181) [ClassicSimilarity], result of:
            0.10583615 = score(doc=4181,freq=5.0), product of:
              0.1488937 = queryWeight, product of:
                2.1168206 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.013829281 = queryNorm
              0.71081686 = fieldWeight in 4181, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.35197875 = weight(abstract_txt:topic in 4181) [ClassicSimilarity], result of:
            0.35197875 = score(doc=4181,freq=8.0), product of:
              0.39332134 = queryWeight, product of:
                5.618288 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013829281 = queryNorm
              0.8948885 = fieldWeight in 4181, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
          0.27599406 = weight(abstract_txt:labels in 4181) [ClassicSimilarity], result of:
            0.27599406 = score(doc=4181,freq=1.0), product of:
              0.6077432 = queryWeight, product of:
                6.048127 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013829281 = queryNorm
              0.4541294 = fieldWeight in 4181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=4181)
        0.28 = coord(7/25)
    
  2. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.17
    0.16874148 = sum of:
      0.16874148 = product of:
        0.70308954 = sum of:
          0.016787391 = weight(abstract_txt:documents in 3459) [ClassicSimilarity], result of:
            0.016787391 = score(doc=3459,freq=1.0), product of:
              0.06517315 = queryWeight, product of:
                1.1434959 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.013829281 = queryNorm
              0.2575814 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
          0.017030409 = weight(abstract_txt:been in 3459) [ClassicSimilarity], result of:
            0.017030409 = score(doc=3459,freq=1.0), product of:
              0.075322896 = queryWeight, product of:
                1.5055993 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.013829281 = queryNorm
              0.22609869 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
          0.028453438 = weight(abstract_txt:document in 3459) [ClassicSimilarity], result of:
            0.028453438 = score(doc=3459,freq=1.0), product of:
              0.106055565 = queryWeight, product of:
                1.7865394 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.013829281 = queryNorm
              0.26828802 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
          0.115937695 = weight(abstract_txt:topics in 3459) [ClassicSimilarity], result of:
            0.115937695 = score(doc=3459,freq=6.0), product of:
              0.1488937 = queryWeight, product of:
                2.1168206 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.013829281 = queryNorm
              0.77866083 = fieldWeight in 3459, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
          0.24888656 = weight(abstract_txt:topic in 3459) [ClassicSimilarity], result of:
            0.24888656 = score(doc=3459,freq=4.0), product of:
              0.39332134 = queryWeight, product of:
                5.618288 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013829281 = queryNorm
              0.63278174 = fieldWeight in 3459, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
          0.27599406 = weight(abstract_txt:labels in 3459) [ClassicSimilarity], result of:
            0.27599406 = score(doc=3459,freq=1.0), product of:
              0.6077432 = queryWeight, product of:
                6.048127 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013829281 = queryNorm
              0.4541294 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=3459)
        0.24 = coord(6/25)
    
  3. Soricut, R.; Marcu, D.: Abstractive headline generation using WIDL-expressions (2007) 0.16
    0.16370496 = sum of:
      0.16370496 = product of:
        0.682104 = sum of:
          0.08591546 = weight(abstract_txt:representing in 943) [ClassicSimilarity], result of:
            0.08591546 = score(doc=943,freq=2.0), product of:
              0.13238662 = queryWeight, product of:
                1.6297549 = boost
                5.8738413 = idf(docFreq=337, maxDocs=44218)
                0.013829281 = queryNorm
              0.6489739 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8738413 = idf(docFreq=337, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.07113359 = weight(abstract_txt:document in 943) [ClassicSimilarity], result of:
            0.07113359 = score(doc=943,freq=4.0), product of:
              0.106055565 = queryWeight, product of:
                1.7865394 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.013829281 = queryNorm
              0.67072004 = fieldWeight in 943, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.059164204 = weight(abstract_txt:topics in 943) [ClassicSimilarity], result of:
            0.059164204 = score(doc=943,freq=1.0), product of:
              0.1488937 = queryWeight, product of:
                2.1168206 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.013829281 = queryNorm
              0.3973587 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.18041235 = weight(abstract_txt:textual in 943) [ClassicSimilarity], result of:
            0.18041235 = score(doc=943,freq=2.0), product of:
              0.27351683 = queryWeight, product of:
                3.312892 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.013829281 = queryNorm
              0.65960234 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.12992425 = weight(abstract_txt:representations in 943) [ClassicSimilarity], result of:
            0.12992425 = score(doc=943,freq=1.0), product of:
              0.27687052 = queryWeight, product of:
                3.3331404 = boost
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.013829281 = queryNorm
              0.46925998 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.1555541 = weight(abstract_txt:topic in 943) [ClassicSimilarity], result of:
            0.1555541 = score(doc=943,freq=1.0), product of:
              0.39332134 = queryWeight, product of:
                5.618288 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013829281 = queryNorm
              0.3954886 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
        0.24 = coord(6/25)
    
  4. Xu, J.; Croft, W.B.: Topic-based language models for distributed retrieval (2000) 0.15
    0.15341663 = sum of:
      0.15341663 = product of:
        0.639236 = sum of:
          0.071833484 = weight(abstract_txt:labeling in 38) [ClassicSimilarity], result of:
            0.071833484 = score(doc=38,freq=1.0), product of:
              0.117493674 = queryWeight, product of:
                1.0856566 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.013829281 = queryNorm
              0.6113817 = fieldWeight in 38, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
          0.036345765 = weight(abstract_txt:documents in 38) [ClassicSimilarity], result of:
            0.036345765 = score(doc=38,freq=3.0), product of:
              0.06517315 = queryWeight, product of:
                1.1434959 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.013829281 = queryNorm
              0.5576801 = fieldWeight in 38, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
          0.05029905 = weight(abstract_txt:document in 38) [ClassicSimilarity], result of:
            0.05029905 = score(doc=38,freq=2.0), product of:
              0.106055565 = queryWeight, product of:
                1.7865394 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.013829281 = queryNorm
              0.4742707 = fieldWeight in 38, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
          0.09300166 = weight(abstract_txt:collections in 38) [ClassicSimilarity], result of:
            0.09300166 = score(doc=38,freq=4.0), product of:
              0.12680726 = queryWeight, product of:
                1.9535204 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.013829281 = queryNorm
              0.73340964 = fieldWeight in 38, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
          0.11832841 = weight(abstract_txt:topics in 38) [ClassicSimilarity], result of:
            0.11832841 = score(doc=38,freq=4.0), product of:
              0.1488937 = queryWeight, product of:
                2.1168206 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.013829281 = queryNorm
              0.7947174 = fieldWeight in 38, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
          0.2694276 = weight(abstract_txt:topic in 38) [ClassicSimilarity], result of:
            0.2694276 = score(doc=38,freq=3.0), product of:
              0.39332134 = queryWeight, product of:
                5.618288 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013829281 = queryNorm
              0.6850063 = fieldWeight in 38, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.078125 = fieldNorm(doc=38)
        0.24 = coord(6/25)
    
  5. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.14
    0.14306551 = sum of:
      0.14306551 = product of:
        0.5961063 = sum of:
          0.06612285 = weight(abstract_txt:conditional in 5045) [ClassicSimilarity], result of:
            0.06612285 = score(doc=5045,freq=1.0), product of:
              0.12901422 = queryWeight, product of:
                1.1376379 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.013829281 = queryNorm
              0.5125237 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.028453438 = weight(abstract_txt:document in 5045) [ClassicSimilarity], result of:
            0.028453438 = score(doc=5045,freq=1.0), product of:
              0.106055565 = queryWeight, product of:
                1.7865394 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.013829281 = queryNorm
              0.26828802 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.06895762 = weight(abstract_txt:term in 5045) [ClassicSimilarity], result of:
            0.06895762 = score(doc=5045,freq=3.0), product of:
              0.1326757 = queryWeight, product of:
                1.9982121 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.013829281 = queryNorm
              0.51974565 = fieldWeight in 5045, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.06693666 = weight(abstract_txt:topics in 5045) [ClassicSimilarity], result of:
            0.06693666 = score(doc=5045,freq=2.0), product of:
              0.1488937 = queryWeight, product of:
                2.1168206 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.013829281 = queryNorm
              0.44956002 = fieldWeight in 5045, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.060813233 = weight(abstract_txt:lists in 5045) [ClassicSimilarity], result of:
            0.060813233 = score(doc=5045,freq=1.0), product of:
              0.17597151 = queryWeight, product of:
                2.3012671 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.013829281 = queryNorm
              0.34558567 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.30482253 = weight(abstract_txt:topic in 5045) [ClassicSimilarity], result of:
            0.30482253 = score(doc=5045,freq=6.0), product of:
              0.39332134 = queryWeight, product of:
                5.618288 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013829281 = queryNorm
              0.7749962 = fieldWeight in 5045, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
        0.24 = coord(6/25)
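
The content scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matched terms / query terms). The following is a minimal Python sketch that recomputes the score of the first content hit (doc 4181), assuming the same classic TF-IDF definitions that match the listing (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper names are hypothetical; the boost, queryNorm, fieldNorm, and frequency values are copied from the explain tree, not derived.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.013829281  # copied from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=4181), copied from the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency in the form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    fld_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * fld_weight


# (term, boost, termFreq in doc 4181, docFreq), all from the explain tree.
TERMS_4181 = [
    ("labeling", 1.0856566, 1.0, 47),
    ("documents", 1.1434959, 3.0, 1949),
    ("document", 1.7865394, 1.0, 1642),
    ("collections", 1.9535204, 1.0, 1099),
    ("topics", 2.1168206, 5.0, 742),
    ("topic", 5.618288, 8.0, 760),
    ("labels", 6.048127, 1.0, 83),
]

# coord(7/25): 7 of the 25 query terms occur in the document.
coord = len(TERMS_4181) / 25

total = sum(term_score(boost, freq, df) for _, boost, freq, df in TERMS_4181)
score = coord * total

print(f"sum of term scores: {total:.6f}")  # listing reports 0.8860065
print(f"final score:        {score:.6f}")  # listing reports 0.24808182
```

Small differences in the last digits are expected, since the listing was produced with single-precision floats while Python uses double precision.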