Document (#29486)

Author
Lehtokangas, R.
Järvelin, K.
Title
Consistency of textual expression in newspaper articles : an argument for semantically based query expansion
Source
Journal of documentation. 57(2001) no.4, pp. 535-548
Year
2001
Abstract
This article investigates how consistently different newspapers choose their words when reporting the same news events. Articles on the same events were taken from three Finnish newspapers and compared with regard to their central concepts and the words representing those concepts in the text. Consistency figures were calculated for each set of three articles (sixty sets in total). Words and concepts were found to be inconsistent between articles from different newspapers. Mean consistency calculated on the basis of words was 65 per cent, but this depended on article length: for short news wires it was 83 per cent, while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found with regard to length but not with regard to topic category. We argue that this inconsistency of expression is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
Footnote
See also: http://www.emeraldinsight.com/10.1108/EUM0000000007104.
Theme
Semantic environment in indexing and retrieval

Similar documents (author)

  1. Järvelin, K.: An analysis of two approaches in information retrieval : from frameworks to study designs (2007) 4.99
    4.989572 = sum of:
      4.989572 = weight(author_txt:järvelin in 326) [ClassicSimilarity], result of:
        4.989572 = fieldWeight in 326, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.625 = fieldNorm(doc=326)
    
  2. Järvelin, K.: Evaluation (2011) 4.99
    4.989572 = sum of:
      4.989572 = weight(author_txt:järvelin in 548) [ClassicSimilarity], result of:
        4.989572 = fieldWeight in 548, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.625 = fieldNorm(doc=548)
    
  3. Järvelin, K.; Vakkari, P.: The evolution of library and information science 1965-1985 : a content analysis of journal titles (1993) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:järvelin in 4649) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 4649, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=4649)
    
  4. Kristensen, J.; Järvelin, K.: The effectiveness of a searching thesaurus in free-text searching in a full-text database (1990) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:järvelin in 2043) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 2043, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=2043)
    
  5. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:järvelin in 5907) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 5907, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=5907)
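
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the listing follows the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical and not part of this catalogue. The fieldNorm values are taken from the listing as given.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity (assumed).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    # Term frequency factor: square root of the raw term count.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing: docFreq=40, maxDocs=44218, termFreq=1.0.
idf_author = classic_idf(40, 44218)
score_norm_0625 = field_weight(1.0, 40, 44218, 0.625)  # entries 1 and 2
score_norm_05 = field_weight(1.0, 40, 44218, 0.5)      # entries 3 to 5

print(round(idf_author, 4), round(score_norm_0625, 4), round(score_norm_05, 4))
```

Because every entry matches the single term "järvelin" exactly once, the five author scores differ only through fieldNorm (0.625 versus 0.5).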
    

Similar documents (content)

  1. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.16
    0.15790863 = sum of:
      0.15790863 = product of:
        0.5639594 = sum of:
          0.01820123 = weight(abstract_txt:found in 5206) [ClassicSimilarity], result of:
            0.01820123 = score(doc=5206,freq=1.0), product of:
              0.06504439 = queryWeight, product of:
                1.0102459 = boost
                4.4772453 = idf(docFreq=1365, maxDocs=44218)
                0.014380429 = queryNorm
              0.27982783 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4772453 = idf(docFreq=1365, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.030075314 = weight(abstract_txt:were in 5206) [ClassicSimilarity], result of:
            0.030075314 = score(doc=5206,freq=4.0), product of:
              0.06555813 = queryWeight, product of:
                1.2421701 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.014380429 = queryNorm
              0.45875797 = fieldWeight in 5206, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.04537987 = weight(abstract_txt:three in 5206) [ClassicSimilarity], result of:
            0.04537987 = score(doc=5206,freq=3.0), product of:
              0.09492374 = queryWeight, product of:
                1.4947041 = boost
                4.41619 = idf(docFreq=1451, maxDocs=44218)
                0.014380429 = queryNorm
              0.4780666 = fieldWeight in 5206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.41619 = idf(docFreq=1451, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.0774182 = weight(abstract_txt:calculated in 5206) [ClassicSimilarity], result of:
            0.0774182 = score(doc=5206,freq=1.0), product of:
              0.1707542 = queryWeight, product of:
                1.6368462 = boost
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.014380429 = queryNorm
              0.45338973 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.18664344 = weight(abstract_txt:words in 5206) [ClassicSimilarity], result of:
            0.18664344 = score(doc=5206,freq=9.0), product of:
              0.18595748 = queryWeight, product of:
                2.4157057 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014380429 = queryNorm
              1.0036888 = fieldWeight in 5206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.07762604 = weight(abstract_txt:articles in 5206) [ClassicSimilarity], result of:
            0.07762604 = score(doc=5206,freq=1.0), product of:
              0.25971895 = queryWeight, product of:
                3.7766612 = boost
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.014380429 = queryNorm
              0.29888478 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.12861533 = weight(abstract_txt:news in 5206) [ClassicSimilarity], result of:
            0.12861533 = score(doc=5206,freq=1.0), product of:
              0.34544447 = queryWeight, product of:
                4.0324774 = boost
                5.957094 = idf(docFreq=310, maxDocs=44218)
                0.014380429 = queryNorm
              0.3723184 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.957094 = idf(docFreq=310, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.28 = coord(7/25)
    
  2. Iivonen, M.: The impact of the indexing environment on interindexer consistency (1990) 0.14
    0.1400689 = sum of:
      0.1400689 = product of:
        0.87543064 = sum of:
          0.030075314 = weight(abstract_txt:were in 4779) [ClassicSimilarity], result of:
            0.030075314 = score(doc=4779,freq=1.0), product of:
              0.06555813 = queryWeight, product of:
                1.2421701 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.014380429 = queryNorm
              0.45875797 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.125 = fieldNorm(doc=4779)
          0.057432715 = weight(abstract_txt:concepts in 4779) [ClassicSimilarity], result of:
            0.057432715 = score(doc=4779,freq=1.0), product of:
              0.10090809 = queryWeight, product of:
                1.5411 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.014380429 = queryNorm
              0.5691587 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.125 = fieldNorm(doc=4779)
          0.1548364 = weight(abstract_txt:calculated in 4779) [ClassicSimilarity], result of:
            0.1548364 = score(doc=4779,freq=1.0), product of:
              0.1707542 = queryWeight, product of:
                1.6368462 = boost
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.014380429 = queryNorm
              0.90677947 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.125 = fieldNorm(doc=4779)
          0.6330862 = weight(abstract_txt:consistency in 4779) [ClassicSimilarity], result of:
            0.6330862 = score(doc=4779,freq=4.0), product of:
              0.39669037 = queryWeight, product of:
                4.3212423 = boost
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.014380429 = queryNorm
              1.5959203 = fieldWeight in 4779, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.125 = fieldNorm(doc=4779)
        0.16 = coord(4/25)
    
  3. Buccio, E. Di; Melucci, M.; Moro, F.: Detecting verbose queries and improving information retrieval (2014) 0.14
    0.13670515 = sum of:
      0.13670515 = product of:
        0.4272036 = sum of:
          0.06889395 = weight(abstract_txt:query in 2695) [ClassicSimilarity], result of:
            0.06889395 = score(doc=2695,freq=10.0), product of:
              0.07332691 = queryWeight, product of:
                1.0726397 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.014380429 = queryNorm
              0.9395452 = fieldWeight in 2695, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.058828104 = weight(abstract_txt:topic in 2695) [ClassicSimilarity], result of:
            0.058828104 = score(doc=2695,freq=5.0), product of:
              0.083152615 = queryWeight, product of:
                1.1422473 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.014380429 = queryNorm
              0.7074715 = fieldWeight in 2695, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.029499266 = weight(abstract_txt:long in 2695) [ClassicSimilarity], result of:
            0.029499266 = score(doc=2695,freq=1.0), product of:
              0.089746356 = queryWeight, product of:
                1.1866717 = boost
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.014380429 = queryNorm
              0.32869598 = fieldWeight in 2695, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.039507154 = weight(abstract_txt:short in 2695) [ClassicSimilarity], result of:
            0.039507154 = score(doc=2695,freq=1.0), product of:
              0.10904184 = queryWeight, product of:
                1.3080332 = boost
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.014380429 = queryNorm
              0.36231187 = fieldWeight in 2695, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.1133442 = weight(abstract_txt:length in 2695) [ClassicSimilarity], result of:
            0.1133442 = score(doc=2695,freq=4.0), product of:
              0.13869332 = queryWeight, product of:
                1.4751967 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.014380429 = queryNorm
              0.817229 = fieldWeight in 2695, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.02620008 = weight(abstract_txt:three in 2695) [ClassicSimilarity], result of:
            0.02620008 = score(doc=2695,freq=1.0), product of:
              0.09492374 = queryWeight, product of:
                1.4947041 = boost
                4.41619 = idf(docFreq=1451, maxDocs=44218)
                0.014380429 = queryNorm
              0.27601188 = fieldWeight in 2695, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.41619 = idf(docFreq=1451, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.028716357 = weight(abstract_txt:concepts in 2695) [ClassicSimilarity], result of:
            0.028716357 = score(doc=2695,freq=1.0), product of:
              0.10090809 = queryWeight, product of:
                1.5411 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.014380429 = queryNorm
              0.28457934 = fieldWeight in 2695, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
          0.06221448 = weight(abstract_txt:words in 2695) [ClassicSimilarity], result of:
            0.06221448 = score(doc=2695,freq=1.0), product of:
              0.18595748 = queryWeight, product of:
                2.4157057 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014380429 = queryNorm
              0.33456293 = fieldWeight in 2695, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=2695)
        0.32 = coord(8/25)
    
  4. Iivonen, M.: Interindexer consistency and the indexing environment (1990) 0.12
    0.122125626 = sum of:
      0.122125626 = product of:
        0.76328516 = sum of:
          0.022556484 = weight(abstract_txt:were in 3593) [ClassicSimilarity], result of:
            0.022556484 = score(doc=3593,freq=1.0), product of:
              0.06555813 = queryWeight, product of:
                1.2421701 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.014380429 = queryNorm
              0.34406847 = fieldWeight in 3593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.09375 = fieldNorm(doc=3593)
          0.043074537 = weight(abstract_txt:concepts in 3593) [ClassicSimilarity], result of:
            0.043074537 = score(doc=3593,freq=1.0), product of:
              0.10090809 = queryWeight, product of:
                1.5411 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.014380429 = queryNorm
              0.426869 = fieldWeight in 3593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.09375 = fieldNorm(doc=3593)
          0.1161273 = weight(abstract_txt:calculated in 3593) [ClassicSimilarity], result of:
            0.1161273 = score(doc=3593,freq=1.0), product of:
              0.1707542 = queryWeight, product of:
                1.6368462 = boost
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.014380429 = queryNorm
              0.6800846 = fieldWeight in 3593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.09375 = fieldNorm(doc=3593)
          0.5815268 = weight(abstract_txt:consistency in 3593) [ClassicSimilarity], result of:
            0.5815268 = score(doc=3593,freq=6.0), product of:
              0.39669037 = queryWeight, product of:
                4.3212423 = boost
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.014380429 = queryNorm
              1.4659464 = fieldWeight in 3593, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.09375 = fieldNorm(doc=3593)
        0.16 = coord(4/25)
    
  5. Iivonen, M.: Factors lowering the consistency in online searching (1995) 0.12
    0.116702355 = sum of:
      0.116702355 = product of:
        0.58351177 = sum of:
          0.056157887 = weight(abstract_txt:same in 3869) [ClassicSimilarity], result of:
            0.056157887 = score(doc=3869,freq=3.0), product of:
              0.07294271 = queryWeight, product of:
                1.069826 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.014380429 = queryNorm
              0.7698903 = fieldWeight in 3869, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.09375 = fieldNorm(doc=3869)
          0.032679267 = weight(abstract_txt:query in 3869) [ClassicSimilarity], result of:
            0.032679267 = score(doc=3869,freq=1.0), product of:
              0.07332691 = queryWeight, product of:
                1.0726397 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.014380429 = queryNorm
              0.44566542 = fieldWeight in 3869, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.09375 = fieldNorm(doc=3869)
          0.022556484 = weight(abstract_txt:were in 3869) [ClassicSimilarity], result of:
            0.022556484 = score(doc=3869,freq=1.0), product of:
              0.06555813 = queryWeight, product of:
                1.2421701 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.014380429 = queryNorm
              0.34406847 = fieldWeight in 3869, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.09375 = fieldNorm(doc=3869)
          0.060916595 = weight(abstract_txt:concepts in 3869) [ClassicSimilarity], result of:
            0.060916595 = score(doc=3869,freq=2.0), product of:
              0.10090809 = queryWeight, product of:
                1.5411 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.014380429 = queryNorm
              0.60368395 = fieldWeight in 3869, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.09375 = fieldNorm(doc=3869)
          0.41120154 = weight(abstract_txt:consistency in 3869) [ClassicSimilarity], result of:
            0.41120154 = score(doc=3869,freq=3.0), product of:
              0.39669037 = queryWeight, product of:
                4.3212423 = boost
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.014380429 = queryNorm
              1.0365806 = fieldWeight in 3869, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3836813 = idf(docFreq=202, maxDocs=44218)
                0.09375 = fieldNorm(doc=3869)
        0.2 = coord(5/25)
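
The content scores combine several term weights and a coordination factor. As a rough sketch, assuming the same ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), per-term score = queryWeight * fieldWeight, final score = coord * sum), the following recomputes entry 2 (doc=4779) from the boost, docFreq, freq, fieldNorm and queryNorm values shown in its tree. The helper names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.014380429  # copied from the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Inverse document frequency as used by ClassicSimilarity (assumed).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm          # queryWeight line
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight line
    return query_weight * field_weight

# (boost, docFreq, freq) for the four matching terms of doc 4779.
terms_4779 = {
    "were":        (1.2421701, 3061, 1.0),
    "concepts":    (1.5411,    1265, 1.0),
    "calculated":  (1.6368462,   84, 1.0),
    "consistency": (4.3212423,  202, 4.0),
}
FIELD_NORM_4779 = 0.125

partial = {t: term_score(b, df, f, FIELD_NORM_4779)
           for t, (b, df, f) in terms_4779.items()}
coord = len(terms_4779) / 25  # 4 of the 25 query terms matched
score_4779 = coord * sum(partial.values())

print(round(score_4779, 4))
```

The coord factor explains why a document matching few query terms strongly (entry 2, coord 4/25) can still rank below one matching more terms weakly (entry 1, coord 7/25).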