Document (#40327)

Author
Aletras, N.
Baldwin, T.
Lau, J.H.
Stevenson, M.
Title
Evaluating topic representations for exploring document collections
Source
Journal of the Association for Information Science and Technology. 68(2017) no.1, pp.154-167
Year
2017
Abstract
Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; that is the top-n words with highest conditional probability within the topic. Other topic representations such as textual and image labels also have been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, presenting topics in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23574/full.
Theme
Visualization

Similar documents (author)

  1. Stevenson, G.: The Mainzer Sachkatalog and his background (1970) 5.79
    5.790622 = sum of:
      5.790622 = weight(author_txt:stevenson in 754) [ClassicSimilarity], result of:
        5.790622 = fieldWeight in 754, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.264996 = idf(docFreq=10, maxDocs=42740)
          0.625 = fieldNorm(doc=754)
    
  2. Stevenson, G.: The historical context: traditional classification since 1950 (1974) 5.79
    5.790622 = sum of:
      5.790622 = weight(author_txt:stevenson in 1259) [ClassicSimilarity], result of:
        5.790622 = fieldWeight in 1259, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.264996 = idf(docFreq=10, maxDocs=42740)
          0.625 = fieldNorm(doc=1259)
    
  3. Stevenson, G.: Andreas Schleiermacher's bibliographic classification and its relationship to the Dewey Decimal Classification (1978) 5.79
    5.790622 = sum of:
      5.790622 = weight(author_txt:stevenson in 3550) [ClassicSimilarity], result of:
        5.790622 = fieldWeight in 3550, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.264996 = idf(docFreq=10, maxDocs=42740)
          0.625 = fieldNorm(doc=3550)
    
  4. McDonald, S.; Stevenson, R.J.: Navigation in hyperspace : an evaluation of the effects of navigational tools and subject matter expertise on browsing and information retrieval in hypertext (1998) 4.63
    4.632498 = sum of:
      4.632498 = weight(author_txt:stevenson in 4761) [ClassicSimilarity], result of:
        4.632498 = fieldWeight in 4761, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.264996 = idf(docFreq=10, maxDocs=42740)
          0.5 = fieldNorm(doc=4761)
    
  5. Cole, C.; Mandelblatt, B.; Stevenson, J.: Visualizing a high recall search strategy output for undergraduates in an exploration stage of researching a term paper (2002) 3.47
    3.4743733 = sum of:
      3.4743733 = weight(author_txt:stevenson in 3576) [ClassicSimilarity], result of:
        3.4743733 = fieldWeight in 3576, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.264996 = idf(docFreq=10, maxDocs=42740)
          0.375 = fieldNorm(doc=3576)
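
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and are not part of the catalog software.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the 'product of' lines above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # Values copied from the listing: freq=1.0, docFreq=10, maxDocs=42740.
    # Only fieldNorm differs between the five entries.
    for doc_id, norm in [(754, 0.625), (4761, 0.5), (3576, 0.375)]:
        print(doc_id, round(field_weight(1.0, 10, 42740, norm), 6))
```

Since every entry matches the single term "stevenson" exactly once, the ranking in this list depends only on fieldNorm. Assuming the usual length normalization, a lower fieldNorm indicates a longer author field.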
    

Similar documents (content)

  1. Alkhodair, S.A.; Fung, B.C.M.; Patrick, O.R.; Hung, C.K.: Improving interpretations of topic modeling in microblogs (2018) 0.25
    0.25091046 = sum of:
      0.25091046 = product of:
        0.8961088 = sum of:
          0.057543803 = weight(abstract_txt:labeling in 182) [ClassicSimilarity], result of:
            0.057543803 = score(doc=182,freq=1.0), product of:
              0.11685948 = queryWeight, product of:
                1.0947227 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.013548936 = queryNorm
              0.4924188 = fieldWeight in 182, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.028409136 = weight(abstract_txt:documents in 182) [ClassicSimilarity], result of:
            0.028409136 = score(doc=182,freq=3.0), product of:
              0.06376854 = queryWeight, product of:
                1.1436429 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.013548936 = queryNorm
              0.44550392 = fieldWeight in 182, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.027689844 = weight(abstract_txt:document in 182) [ClassicSimilarity], result of:
            0.027689844 = score(doc=182,freq=1.0), product of:
              0.10349491 = queryWeight, product of:
                1.7844005 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.013548936 = queryNorm
              0.26754788 = fieldWeight in 182, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.036663078 = weight(abstract_txt:collections in 182) [ClassicSimilarity], result of:
            0.036663078 = score(doc=182,freq=1.0), product of:
              0.1247933 = queryWeight, product of:
                1.959424 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.013548936 = queryNorm
              0.29379043 = fieldWeight in 182, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.105963774 = weight(abstract_txt:topics in 182) [ClassicSimilarity], result of:
            0.105963774 = score(doc=182,freq=5.0), product of:
              0.14807677 = queryWeight, product of:
                2.1344023 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.013548936 = queryNorm
              0.71560025 = fieldWeight in 182, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.35209054 = weight(abstract_txt:topic in 182) [ClassicSimilarity], result of:
            0.35209054 = score(doc=182,freq=8.0), product of:
              0.39093193 = queryWeight, product of:
                5.663276 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013548936 = queryNorm
              0.90064406 = fieldWeight in 182, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
          0.28774863 = weight(abstract_txt:labels in 182) [ClassicSimilarity], result of:
            0.28774863 = score(doc=182,freq=1.0), product of:
              0.6209513 = queryWeight, product of:
                6.181254 = boost
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.013548936 = queryNorm
              0.4633997 = fieldWeight in 182, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.0625 = fieldNorm(doc=182)
        0.28 = coord(7/25)
    
  2. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.17
    0.17132677 = sum of:
      0.17132677 = product of:
        0.7138616 = sum of:
          0.016402023 = weight(abstract_txt:documents in 460) [ClassicSimilarity], result of:
            0.016402023 = score(doc=460,freq=1.0), product of:
              0.06376854 = queryWeight, product of:
                1.1436429 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.013548936 = queryNorm
              0.2572118 = fieldWeight in 460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
          0.016977971 = weight(abstract_txt:been in 460) [ClassicSimilarity], result of:
            0.016977971 = score(doc=460,freq=1.0), product of:
              0.07469574 = queryWeight, product of:
                1.5159355 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.013548936 = queryNorm
              0.22729503 = fieldWeight in 460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
          0.027689844 = weight(abstract_txt:document in 460) [ClassicSimilarity], result of:
            0.027689844 = score(doc=460,freq=1.0), product of:
              0.10349491 = queryWeight, product of:
                1.7844005 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.013548936 = queryNorm
              0.26754788 = fieldWeight in 460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
          0.1160775 = weight(abstract_txt:topics in 460) [ClassicSimilarity], result of:
            0.1160775 = score(doc=460,freq=6.0), product of:
              0.14807677 = queryWeight, product of:
                2.1344023 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.013548936 = queryNorm
              0.7839008 = fieldWeight in 460, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
          0.2489656 = weight(abstract_txt:topic in 460) [ClassicSimilarity], result of:
            0.2489656 = score(doc=460,freq=4.0), product of:
              0.39093193 = queryWeight, product of:
                5.663276 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013548936 = queryNorm
              0.63685155 = fieldWeight in 460, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
          0.28774863 = weight(abstract_txt:labels in 460) [ClassicSimilarity], result of:
            0.28774863 = score(doc=460,freq=1.0), product of:
              0.6209513 = queryWeight, product of:
                6.181254 = boost
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.013548936 = queryNorm
              0.4633997 = fieldWeight in 460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.0625 = fieldNorm(doc=460)
        0.24 = coord(6/25)
    
  3. Soricut, R.; Marcu, D.: Abstractive headline generation using WIDL-expressions (2007) 0.16
    0.1625544 = sum of:
      0.1625544 = product of:
        0.67731 = sum of:
          0.08644432 = weight(abstract_txt:representing in 2944) [ClassicSimilarity], result of:
            0.08644432 = score(doc=2944,freq=2.0), product of:
              0.13209383 = queryWeight, product of:
                1.6459948 = boost
                5.9230976 = idf(docFreq=310, maxDocs=42740)
                0.013548936 = queryNorm
              0.654416 = fieldWeight in 2944, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9230976 = idf(docFreq=310, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
          0.06922461 = weight(abstract_txt:document in 2944) [ClassicSimilarity], result of:
            0.06922461 = score(doc=2944,freq=4.0), product of:
              0.10349491 = queryWeight, product of:
                1.7844005 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.013548936 = queryNorm
              0.6688697 = fieldWeight in 2944, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
          0.05923555 = weight(abstract_txt:topics in 2944) [ClassicSimilarity], result of:
            0.05923555 = score(doc=2944,freq=1.0), product of:
              0.14807677 = queryWeight, product of:
                2.1344023 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.013548936 = queryNorm
              0.4000327 = fieldWeight in 2944, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
          0.17816213 = weight(abstract_txt:textual in 2944) [ClassicSimilarity], result of:
            0.17816213 = score(doc=2944,freq=2.0), product of:
              0.26953292 = queryWeight, product of:
                3.325126 = boost
                5.982718 = idf(docFreq=292, maxDocs=42740)
                0.013548936 = queryNorm
              0.66100323 = fieldWeight in 2944, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.982718 = idf(docFreq=292, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
          0.12863986 = weight(abstract_txt:representations in 2944) [ClassicSimilarity], result of:
            0.12863986 = score(doc=2944,freq=1.0), product of:
              0.27331403 = queryWeight, product of:
                3.3483677 = boost
                6.0245357 = idf(docFreq=280, maxDocs=42740)
                0.013548936 = queryNorm
              0.47066686 = fieldWeight in 2944, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0245357 = idf(docFreq=280, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
          0.1556035 = weight(abstract_txt:topic in 2944) [ClassicSimilarity], result of:
            0.1556035 = score(doc=2944,freq=1.0), product of:
              0.39093193 = queryWeight, product of:
                5.663276 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013548936 = queryNorm
              0.39803222 = fieldWeight in 2944, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.078125 = fieldNorm(doc=2944)
        0.24 = coord(6/25)
    
  4. Xu, J.; Croft, W.B.: Topic-based language models for distributed retrieval (2000) 0.15
    0.15264776 = sum of:
      0.15264776 = product of:
        0.63603234 = sum of:
          0.07192976 = weight(abstract_txt:labeling in 1039) [ClassicSimilarity], result of:
            0.07192976 = score(doc=1039,freq=1.0), product of:
              0.11685948 = queryWeight, product of:
                1.0947227 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.013548936 = queryNorm
              0.6155235 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
          0.03551142 = weight(abstract_txt:documents in 1039) [ClassicSimilarity], result of:
            0.03551142 = score(doc=1039,freq=3.0), product of:
              0.06376854 = queryWeight, product of:
                1.1436429 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.013548936 = queryNorm
              0.5568799 = fieldWeight in 1039, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
          0.04894919 = weight(abstract_txt:document in 1039) [ClassicSimilarity], result of:
            0.04894919 = score(doc=1039,freq=2.0), product of:
              0.10349491 = queryWeight, product of:
                1.7844005 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.013548936 = queryNorm
              0.4729623 = fieldWeight in 1039, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
          0.09165769 = weight(abstract_txt:collections in 1039) [ClassicSimilarity], result of:
            0.09165769 = score(doc=1039,freq=4.0), product of:
              0.1247933 = queryWeight, product of:
                1.959424 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.013548936 = queryNorm
              0.7344761 = fieldWeight in 1039, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
          0.1184711 = weight(abstract_txt:topics in 1039) [ClassicSimilarity], result of:
            0.1184711 = score(doc=1039,freq=4.0), product of:
              0.14807677 = queryWeight, product of:
                2.1344023 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.013548936 = queryNorm
              0.8000654 = fieldWeight in 1039, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
          0.26951316 = weight(abstract_txt:topic in 1039) [ClassicSimilarity], result of:
            0.26951316 = score(doc=1039,freq=3.0), product of:
              0.39093193 = queryWeight, product of:
                5.663276 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013548936 = queryNorm
              0.689412 = fieldWeight in 1039, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.078125 = fieldNorm(doc=1039)
        0.24 = coord(6/25)
    
  5. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.14
    0.14232326 = sum of:
      0.14232326 = product of:
        0.5930136 = sum of:
          0.065563284 = weight(abstract_txt:conditional in 1046) [ClassicSimilarity], result of:
            0.065563284 = score(doc=1046,freq=1.0), product of:
              0.12747902 = queryWeight, product of:
                1.1433825 = boost
                8.228904 = idf(docFreq=30, maxDocs=42740)
                0.013548936 = queryNorm
              0.5143065 = fieldWeight in 1046, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.228904 = idf(docFreq=30, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
          0.027689844 = weight(abstract_txt:document in 1046) [ClassicSimilarity], result of:
            0.027689844 = score(doc=1046,freq=1.0), product of:
              0.10349491 = queryWeight, product of:
                1.7844005 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.013548936 = queryNorm
              0.26754788 = fieldWeight in 1046, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
          0.06883817 = weight(abstract_txt:term in 1046) [ClassicSimilarity], result of:
            0.06883817 = score(doc=1046,freq=3.0), product of:
              0.13168949 = queryWeight, product of:
                2.012836 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.013548936 = queryNorm
              0.52273095 = fieldWeight in 1046, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
          0.06701738 = weight(abstract_txt:topics in 1046) [ClassicSimilarity], result of:
            0.06701738 = score(doc=1046,freq=2.0), product of:
              0.14807677 = queryWeight, product of:
                2.1344023 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.013548936 = queryNorm
              0.45258534 = fieldWeight in 1046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
          0.05898553 = weight(abstract_txt:lists in 1046) [ClassicSimilarity], result of:
            0.05898553 = score(doc=1046,freq=1.0), product of:
              0.17134404 = queryWeight, product of:
                2.295976 = boost
                5.5080323 = idf(docFreq=470, maxDocs=42740)
                0.013548936 = queryNorm
              0.34425202 = fieldWeight in 1046, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5080323 = idf(docFreq=470, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
          0.30491936 = weight(abstract_txt:topic in 1046) [ClassicSimilarity], result of:
            0.30491936 = score(doc=1046,freq=6.0), product of:
              0.39093193 = queryWeight, product of:
                5.663276 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013548936 = queryNorm
              0.7799807 = fieldWeight in 1046, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.0625 = fieldNorm(doc=1046)
        0.24 = coord(6/25)
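
The content-similarity scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matched query terms divided by total query terms). The sketch below recomputes the first entry (doc 182) from the values in its listing. It assumes the standard ClassicSimilarity formulas, and the helper names are hypothetical.

```python
import math

MAX_DOCS = 42740          # maxDocs, as reported in the listing
QUERY_NORM = 0.013548936  # queryNorm, as reported in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total."""
    return (matched / total) * sum(term_score(*t) for t in terms)


# (boost, freq, docFreq, fieldNorm) for doc 182, copied from the listing.
DOC_182_TERMS = [
    (1.0947227, 1.0, 43, 0.0625),    # labeling
    (1.1436429, 3.0, 1895, 0.0625),  # documents
    (1.7844005, 1.0, 1606, 0.0625),  # document
    (1.959424, 1.0, 1055, 0.0625),   # collections
    (2.1344023, 5.0, 693, 0.0625),   # topics
    (5.663276, 8.0, 711, 0.0625),    # topic
    (6.181254, 1.0, 69, 0.0625),     # labels
]

if __name__ == "__main__":
    # coord(7/25): 7 of the 25 query terms occur in the abstract of doc 182.
    print(document_score(DOC_182_TERMS, matched=7, total=25))
```

Because idf enters both queryWeight and fieldWeight, it is effectively squared in each term score. This is why rare terms such as "labels" (docFreq=69) contribute far more than common ones such as "documents" (docFreq=1895).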