Document (#29487)

Author
Lehtokangas, R.
Järvelin, K.
Title
Consistency of textual expression in newspaper articles : an argument for semantically based query expansion
Source
Journal of documentation. 57(2001) no.4, pp.535-548
Year
2001
Abstract
This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared in regard to their central concepts and words representing the concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was sixty). Inconsistency in words and concepts was found between news articles from different newspapers. The mean value of consistency calculated on the basis of words was 65 per cent; this however depended on the article length. For short news wires consistency was 83 per cent while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found in regard to length but not in regard to the categories of topic. We argue that the expression inconsistency is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
Footnote
See also: http://www.emeraldinsight.com/10.1108/EUM0000000007104.
Theme
Semantic environment in indexing and retrieval

Similar documents (author)

  1. Järvelin, K.: An analysis of two approaches in information retrieval : from frameworks to study designs (2007) 5.00
    5.0043335 = sum of:
      5.0043335 = weight(author_txt:järvelin in 2327) [ClassicSimilarity], result of:
        5.0043335 = fieldWeight in 2327, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.006933 = idf(docFreq=37, maxDocs=41962)
          0.625 = fieldNorm(doc=2327)
    
  2. Järvelin, K.: Evaluation (2011) 5.00
    5.0043335 = sum of:
      5.0043335 = weight(author_txt:järvelin in 2549) [ClassicSimilarity], result of:
        5.0043335 = fieldWeight in 2549, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.006933 = idf(docFreq=37, maxDocs=41962)
          0.625 = fieldNorm(doc=2549)
    
  3. Järvelin, K.; Vakkari, P.: The evolution of library and information science 1965-1985 : a content analysis of journal titles (1993) 4.00
    4.0034666 = sum of:
      4.0034666 = weight(author_txt:järvelin in 4649) [ClassicSimilarity], result of:
        4.0034666 = fieldWeight in 4649, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.006933 = idf(docFreq=37, maxDocs=41962)
          0.5 = fieldNorm(doc=4649)
    
  4. Kristensen, J.; Järvelin, K.: The effectiveness of a searching thesaurus in free-text searching in a full-text database (1990) 4.00
    4.0034666 = sum of:
      4.0034666 = weight(author_txt:järvelin in 2112) [ClassicSimilarity], result of:
        4.0034666 = fieldWeight in 2112, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.006933 = idf(docFreq=37, maxDocs=41962)
          0.5 = fieldNorm(doc=2112)
    
  5. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.00
    4.0034666 = sum of:
      4.0034666 = weight(author_txt:järvelin in 403) [ClassicSimilarity], result of:
        4.0034666 = fieldWeight in 403, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.006933 = idf(docFreq=37, maxDocs=41962)
          0.5 = fieldNorm(doc=403)
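
The author-match explanations above each reduce to a single fieldWeight, the product of tf, idf and fieldNorm. A minimal sketch of that arithmetic follows. It assumes tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)), which reproduces the values listed here. The function names are hypothetical, and fieldNorm is taken as given from the listing rather than derived.

```python
import math


def classic_tf(freq):
    # Assumed tf: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed idf; it reproduces idf(docFreq=37, maxDocs=41962) from the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:järvelin explanations.
score_norm_0625 = field_weight(freq=1.0, doc_freq=37, max_docs=41962, field_norm=0.625)
score_norm_05 = field_weight(freq=1.0, doc_freq=37, max_docs=41962, field_norm=0.5)
print(score_norm_0625, score_norm_05)
```

Under these assumptions the only thing separating the 5.00 entries from the 4.00 entries is fieldNorm (0.625 versus 0.5), since tf and idf are identical for every author match.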
    

Similar documents (content)

  1. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.16
    0.16097565 = sum of:
      0.16097565 = product of:
        0.574913 = sum of:
          0.018527456 = weight(abstract_txt:found in 207) [ClassicSimilarity], result of:
            0.018527456 = score(doc=207,freq=1.0), product of:
              0.06555994 = queryWeight, product of:
                1.0263299 = boost
                4.521653 = idf(docFreq=1239, maxDocs=41962)
                0.014127142 = queryNorm
              0.28260332 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.521653 = idf(docFreq=1239, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.030711077 = weight(abstract_txt:were in 207) [ClassicSimilarity], result of:
            0.030711077 = score(doc=207,freq=4.0), product of:
              0.06621684 = queryWeight, product of:
                1.2632741 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.014127142 = queryNorm
              0.46379557 = fieldWeight in 207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.046835657 = weight(abstract_txt:three in 207) [ClassicSimilarity], result of:
            0.046835657 = score(doc=207,freq=3.0), product of:
              0.09656115 = queryWeight, product of:
                1.5255082 = boost
                4.480573 = idf(docFreq=1291, maxDocs=41962)
                0.014127142 = queryNorm
              0.48503625 = fieldWeight in 207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.480573 = idf(docFreq=1291, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.07757384 = weight(abstract_txt:calculated in 207) [ClassicSimilarity], result of:
            0.07757384 = score(doc=207,freq=1.0), product of:
              0.17030923 = queryWeight, product of:
                1.6541954 = boost
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.014127142 = queryNorm
              0.45548818 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.18637884 = weight(abstract_txt:words in 207) [ClassicSimilarity], result of:
            0.18637884 = score(doc=207,freq=9.0), product of:
              0.18504964 = queryWeight, product of:
                2.4385228 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.014127142 = queryNorm
              1.007183 = fieldWeight in 207, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.07898258 = weight(abstract_txt:articles in 207) [ClassicSimilarity], result of:
            0.07898258 = score(doc=207,freq=1.0), product of:
              0.2617007 = queryWeight, product of:
                3.8362257 = boost
                4.82888 = idf(docFreq=911, maxDocs=41962)
                0.014127142 = queryNorm
              0.301805 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.82888 = idf(docFreq=911, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.1359036 = weight(abstract_txt:news in 207) [ClassicSimilarity], result of:
            0.1359036 = score(doc=207,freq=1.0), product of:
              0.35696232 = queryWeight, product of:
                4.1480074 = boost
                6.0915604 = idf(docFreq=257, maxDocs=41962)
                0.014127142 = queryNorm
              0.38072252 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0915604 = idf(docFreq=257, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
        0.28 = coord(7/25)
    
  2. Iivonen, M.: The impact of the indexing environment on interindexer consistency (1990) 0.14
    0.13912323 = sum of:
      0.13912323 = product of:
        0.86952025 = sum of:
          0.030711077 = weight(abstract_txt:were in 4779) [ClassicSimilarity], result of:
            0.030711077 = score(doc=4779,freq=1.0), product of:
              0.06621684 = queryWeight, product of:
                1.2632741 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.014127142 = queryNorm
              0.46379557 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.125 = fieldNorm(doc=4779)
          0.058573864 = weight(abstract_txt:concepts in 4779) [ClassicSimilarity], result of:
            0.058573864 = score(doc=4779,freq=1.0), product of:
              0.101837486 = queryWeight, product of:
                1.5666326 = boost
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.014127142 = queryNorm
              0.57517 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.125 = fieldNorm(doc=4779)
          0.15514769 = weight(abstract_txt:calculated in 4779) [ClassicSimilarity], result of:
            0.15514769 = score(doc=4779,freq=1.0), product of:
              0.17030923 = queryWeight, product of:
                1.6541954 = boost
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.014127142 = queryNorm
              0.91097635 = fieldWeight in 4779, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.125 = fieldNorm(doc=4779)
          0.6250876 = weight(abstract_txt:consistency in 4779) [ClassicSimilarity], result of:
            0.6250876 = score(doc=4779,freq=4.0), product of:
              0.3917921 = queryWeight, product of:
                4.345664 = boost
                6.3818297 = idf(docFreq=192, maxDocs=41962)
                0.014127142 = queryNorm
              1.5954574 = fieldWeight in 4779, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3818297 = idf(docFreq=192, maxDocs=41962)
                0.125 = fieldNorm(doc=4779)
        0.16 = coord(4/25)
    
  3. Buccio, E. Di; Melucci, M.; Moro, F.: Detecting verbose queries and improving information retrieval (2014) 0.14
    0.1378383 = sum of:
      0.1378383 = product of:
        0.4307447 = sum of:
          0.06748319 = weight(abstract_txt:query in 4696) [ClassicSimilarity], result of:
            0.06748319 = score(doc=4696,freq=10.0), product of:
              0.07203746 = queryWeight, product of:
                1.075838 = boost
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.014127142 = queryNorm
              0.9367791 = fieldWeight in 4696, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.06028518 = weight(abstract_txt:topic in 4696) [ClassicSimilarity], result of:
            0.06028518 = score(doc=4696,freq=5.0), product of:
              0.084187 = queryWeight, product of:
                1.1630281 = boost
                5.1238985 = idf(docFreq=678, maxDocs=41962)
                0.014127142 = queryNorm
              0.71608657 = fieldWeight in 4696, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.1238985 = idf(docFreq=678, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.030301139 = weight(abstract_txt:long in 4696) [ClassicSimilarity], result of:
            0.030301139 = score(doc=4696,freq=1.0), product of:
              0.09100543 = queryWeight, product of:
                1.209209 = boost
                5.327355 = idf(docFreq=553, maxDocs=41962)
                0.014127142 = queryNorm
              0.33295968 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.327355 = idf(docFreq=553, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.039599147 = weight(abstract_txt:short in 4696) [ClassicSimilarity], result of:
            0.039599147 = score(doc=4696,freq=1.0), product of:
              0.108780704 = queryWeight, product of:
                1.3220371 = boost
                5.8244367 = idf(docFreq=336, maxDocs=41962)
                0.014127142 = queryNorm
              0.3640273 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8244367 = idf(docFreq=336, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.11462227 = weight(abstract_txt:length in 4696) [ClassicSimilarity], result of:
            0.11462227 = score(doc=4696,freq=4.0), product of:
              0.13918337 = queryWeight, product of:
                1.4954138 = boost
                6.588274 = idf(docFreq=156, maxDocs=41962)
                0.014127142 = queryNorm
              0.82353425 = fieldWeight in 4696, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.588274 = idf(docFreq=156, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.02704058 = weight(abstract_txt:three in 4696) [ClassicSimilarity], result of:
            0.02704058 = score(doc=4696,freq=1.0), product of:
              0.09656115 = queryWeight, product of:
                1.5255082 = boost
                4.480573 = idf(docFreq=1291, maxDocs=41962)
                0.014127142 = queryNorm
              0.28003582 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.480573 = idf(docFreq=1291, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.029286932 = weight(abstract_txt:concepts in 4696) [ClassicSimilarity], result of:
            0.029286932 = score(doc=4696,freq=1.0), product of:
              0.101837486 = queryWeight, product of:
                1.5666326 = boost
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.014127142 = queryNorm
              0.287585 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
          0.062126283 = weight(abstract_txt:words in 4696) [ClassicSimilarity], result of:
            0.062126283 = score(doc=4696,freq=1.0), product of:
              0.18504964 = queryWeight, product of:
                2.4385228 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.014127142 = queryNorm
              0.33572766 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0625 = fieldNorm(doc=4696)
        0.32 = coord(8/25)
    
  4. Iivonen, M.: Interindexer consistency and the indexing environment (1990) 0.12
    0.12120065 = sum of:
      0.12120065 = product of:
        0.7575041 = sum of:
          0.02303331 = weight(abstract_txt:were in 3662) [ClassicSimilarity], result of:
            0.02303331 = score(doc=3662,freq=1.0), product of:
              0.06621684 = queryWeight, product of:
                1.2632741 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.014127142 = queryNorm
              0.3478467 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.09375 = fieldNorm(doc=3662)
          0.043930396 = weight(abstract_txt:concepts in 3662) [ClassicSimilarity], result of:
            0.043930396 = score(doc=3662,freq=1.0), product of:
              0.101837486 = queryWeight, product of:
                1.5666326 = boost
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.014127142 = queryNorm
              0.43137747 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.60136 = idf(docFreq=1144, maxDocs=41962)
                0.09375 = fieldNorm(doc=3662)
          0.11636076 = weight(abstract_txt:calculated in 3662) [ClassicSimilarity], result of:
            0.11636076 = score(doc=3662,freq=1.0), product of:
              0.17030923 = queryWeight, product of:
                1.6541954 = boost
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.014127142 = queryNorm
              0.68323225 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.287811 = idf(docFreq=77, maxDocs=41962)
                0.09375 = fieldNorm(doc=3662)
          0.57417965 = weight(abstract_txt:consistency in 3662) [ClassicSimilarity], result of:
            0.57417965 = score(doc=3662,freq=6.0), product of:
              0.3917921 = queryWeight, product of:
                4.345664 = boost
                6.3818297 = idf(docFreq=192, maxDocs=41962)
                0.014127142 = queryNorm
              1.4655213 = fieldWeight in 3662, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.3818297 = idf(docFreq=192, maxDocs=41962)
                0.09375 = fieldNorm(doc=3662)
        0.16 = coord(4/25)
    
  5. Brandow, R.; Mitze, K.; Rau, L.F.: Automatic condensation of electronic publications by sentence selection (1995) 0.12
    0.119516395 = sum of:
      0.119516395 = product of:
        0.597582 = sum of:
          0.032739587 = weight(abstract_txt:same in 2998) [ClassicSimilarity], result of:
            0.032739587 = score(doc=2998,freq=1.0), product of:
              0.07312782 = queryWeight, product of:
                1.0839494 = boost
                4.775505 = idf(docFreq=961, maxDocs=41962)
                0.014127142 = queryNorm
              0.4477036 = fieldWeight in 2998, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.775505 = idf(docFreq=961, maxDocs=41962)
                0.09375 = fieldNorm(doc=2998)
          0.02303331 = weight(abstract_txt:were in 2998) [ClassicSimilarity], result of:
            0.02303331 = score(doc=2998,freq=1.0), product of:
              0.06621684 = queryWeight, product of:
                1.2632741 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.014127142 = queryNorm
              0.3478467 = fieldWeight in 2998, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.09375 = fieldNorm(doc=2998)
          0.08596671 = weight(abstract_txt:length in 2998) [ClassicSimilarity], result of:
            0.08596671 = score(doc=2998,freq=1.0), product of:
              0.13918337 = queryWeight, product of:
                1.4954138 = boost
                6.588274 = idf(docFreq=156, maxDocs=41962)
                0.014127142 = queryNorm
              0.6176507 = fieldWeight in 2998, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.588274 = idf(docFreq=156, maxDocs=41962)
                0.09375 = fieldNorm(doc=2998)
          0.16754735 = weight(abstract_txt:articles in 2998) [ClassicSimilarity], result of:
            0.16754735 = score(doc=2998,freq=2.0), product of:
              0.2617007 = queryWeight, product of:
                3.8362257 = boost
                4.82888 = idf(docFreq=911, maxDocs=41962)
                0.014127142 = queryNorm
              0.64022505 = fieldWeight in 2998, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.82888 = idf(docFreq=911, maxDocs=41962)
                0.09375 = fieldNorm(doc=2998)
          0.28829506 = weight(abstract_txt:news in 2998) [ClassicSimilarity], result of:
            0.28829506 = score(doc=2998,freq=2.0), product of:
              0.35696232 = queryWeight, product of:
                4.1480074 = boost
                6.0915604 = idf(docFreq=257, maxDocs=41962)
                0.014127142 = queryNorm
              0.8076344 = fieldWeight in 2998, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0915604 = idf(docFreq=257, maxDocs=41962)
                0.09375 = fieldNorm(doc=2998)
        0.2 = coord(5/25)
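
The content-similarity explanations combine per-term scores, each a queryWeight times a fieldWeight, then scale the sum by the coord factor. The sketch below recomputes entry 5 (Brandow, Mitze and Rau) from the numbers in its explain tree. It is an illustration under the assumption that tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The helper names are hypothetical, and boost, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 41962
QUERY_NORM = 0.014127142  # copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf formula; it matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    # score = queryWeight * fieldWeight
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total_clauses):
    # Sum of matching term scores, scaled by coord(matched/total_clauses).
    return sum(term_scores) * (matched / total_clauses)


# (freq, docFreq, boost) for the five matching terms of doc 2998.
terms = {
    "same": (1.0, 961, 1.0839494),
    "were": (1.0, 2790, 1.2632741),
    "length": (1.0, 156, 1.4954138),
    "articles": (2.0, 911, 3.8362257),
    "news": (2.0, 257, 4.1480074),
}
scores = {
    term: term_score(freq, doc_freq, boost, field_norm=0.09375)
    for term, (freq, doc_freq, boost) in terms.items()
}
total = document_score(scores.values(), matched=5, total_clauses=25)
print(scores, total)
```

The coord factor explains why a document matching few of the 25 query terms, such as entry 2 with coord(4/25), ranks below entry 1 despite a larger raw sum.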