Document (#19832)

Author
Tseng, Y.-H.
Title
Keyword extraction techniques and relevance feedback
Source
Bulletin of the Library Association of China. 1997, no.59, Dec., pp. 59-64
Year
1997
Abstract
Automatic keyword extraction is an important and fundamental technology in advanced information retrieval systems. Briefly compares several major keyword extraction methods, lists their advantages and disadvantages, and reports recent research progress in Taiwan. Also describes the application of a keyword extraction algorithm in an information retrieval system for relevance feedback. Preliminary analysis shows that the error rate of extracting relevant keywords is 18%, and that the precision rate is over 50%. The main disadvantage of this approach is that the extraction results depend on the retrieval results, which in turn depend on the data held by the database. Apart from collecting more data, this problem can be alleviated by the application of a thesaurus constructed by the same keyword extraction algorithm.
Footnote
[In Chinese]
Theme
Indexing studies

Similar documents (author)

  1. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 4.54
    4.5438676 = sum of:
      4.5438676 = weight(author_txt:tseng in 5421) [ClassicSimilarity], result of:
        4.5438676 = fieldWeight in 5421, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.087735 = idf(docFreq=12, maxDocs=42306)
          0.5 = fieldNorm(doc=5421)
    
  2. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 4.54
    4.5438676 = sum of:
      4.5438676 = weight(author_txt:tseng in 160) [ClassicSimilarity], result of:
        4.5438676 = fieldWeight in 160, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.087735 = idf(docFreq=12, maxDocs=42306)
          0.5 = fieldNorm(doc=160)
    
  3. Tseng, Y.H.; Lin, Y.I.: Evaluation of fuzzy search, term suggestion, and term relevance feedback in an OPAC system (1998) 4.54
    4.5438676 = sum of:
      4.5438676 = weight(author_txt:tseng in 431) [ClassicSimilarity], result of:
        4.5438676 = fieldWeight in 431, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.087735 = idf(docFreq=12, maxDocs=42306)
          0.5 = fieldNorm(doc=431)
    
  4. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 4.54
    4.5438676 = sum of:
      4.5438676 = weight(author_txt:tseng in 227) [ClassicSimilarity], result of:
        4.5438676 = fieldWeight in 227, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.087735 = idf(docFreq=12, maxDocs=42306)
          0.5 = fieldNorm(doc=227)
    
  5. Drenth, H.; Morris, A.; Tseng, G.: Expert systems as information intermediaries (1991) 3.41
    3.4079008 = sum of:
      3.4079008 = weight(author_txt:tseng in 3695) [ClassicSimilarity], result of:
        3.4079008 = fieldWeight in 3695, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.087735 = idf(docFreq=12, maxDocs=42306)
          0.375 = fieldNorm(doc=3695)
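
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the explain output follows the usual Lucene ClassicSimilarity definitions (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), fieldWeight = tf * idf * fieldNorm). The function names are illustrative only and are not part of any Lucene API; the input numbers are copied from the listing.

```python
import math

# Assumed ClassicSimilarity formulas; helper names are hypothetical.
def classic_idf(doc_freq, max_docs):
    # Inverse document frequency: rarer terms get a larger weight.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Term frequency is dampened with a square root.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# "tseng" occurs in 12 of 42306 documents.
idf_tseng = classic_idf(12, 42306)

# Entries 1 to 4 have fieldNorm 0.5; entry 5 has fieldNorm 0.375.
w_single_author = field_weight(1.0, 12, 42306, 0.5)
w_three_authors = field_weight(1.0, 12, 42306, 0.375)

print(round(idf_tseng, 4), round(w_single_author, 4), round(w_three_authors, 4))
```

Under these assumptions, the only difference between the first four hits and the fifth is the field norm: the term frequency and idf are identical, so the lower norm of the three-author record alone accounts for its lower score.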
    

Similar documents (content)

  1. Goh, A.; Hui, S.C.; Chan, S.K.: A text extraction system for news reports (1996) 0.24
    0.24050185 = sum of:
      0.24050185 = product of:
        0.8589352 = sum of:
          0.034701407 = weight(abstract_txt:keywords in 6670) [ClassicSimilarity], result of:
            0.034701407 = score(doc=6670,freq=1.0), product of:
              0.09165154 = queryWeight, product of:
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.015129077 = queryNorm
              0.3786233 = fieldWeight in 6670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.052448444 = weight(abstract_txt:extracting in 6670) [ClassicSimilarity], result of:
            0.052448444 = score(doc=6670,freq=1.0), product of:
              0.12070634 = queryWeight, product of:
                1.1476122 = boost
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.015129077 = queryNorm
              0.43451273 = fieldWeight in 6670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.027080592 = weight(abstract_txt:results in 6670) [ClassicSimilarity], result of:
            0.027080592 = score(doc=6670,freq=4.0), product of:
              0.06166004 = queryWeight, product of:
                1.1599706 = boost
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.015129077 = queryNorm
              0.43919194 = fieldWeight in 6670, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.030063013 = weight(abstract_txt:application in 6670) [ClassicSimilarity], result of:
            0.030063013 = score(doc=6670,freq=1.0), product of:
              0.10493975 = queryWeight, product of:
                1.5132655 = boost
                4.5836606 = idf(docFreq=1174, maxDocs=42306)
                0.015129077 = queryNorm
              0.2864788 = fieldWeight in 6670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5836606 = idf(docFreq=1174, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.03765519 = weight(abstract_txt:relevance in 6670) [ClassicSimilarity], result of:
            0.03765519 = score(doc=6670,freq=1.0), product of:
              0.121936835 = queryWeight, product of:
                1.6312201 = boost
                4.9409437 = idf(docFreq=821, maxDocs=42306)
                0.015129077 = queryNorm
              0.30880898 = fieldWeight in 6670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9409437 = idf(docFreq=821, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.17224187 = weight(abstract_txt:keyword in 6670) [ClassicSimilarity], result of:
            0.17224187 = score(doc=6670,freq=1.0), product of:
              0.4560273 = queryWeight, product of:
                4.9878173 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.015129077 = queryNorm
              0.37770078 = fieldWeight in 6670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
          0.50474465 = weight(abstract_txt:extraction in 6670) [ClassicSimilarity], result of:
            0.50474465 = score(doc=6670,freq=5.0), product of:
              0.580341 = queryWeight, product of:
                6.163783 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.015129077 = queryNorm
              0.86973804 = fieldWeight in 6670, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.0625 = fieldNorm(doc=6670)
        0.28 = coord(7/25)
    
  2. Ercan, G.; Cicekli, I.: Using lexical chains for keyword extraction (2007) 0.21
    0.21191578 = sum of:
      0.21191578 = product of:
        1.0595789 = sum of:
          0.0736128 = weight(abstract_txt:keywords in 2952) [ClassicSimilarity], result of:
            0.0736128 = score(doc=2952,freq=2.0), product of:
              0.09165154 = queryWeight, product of:
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.015129077 = queryNorm
              0.8031813 = fieldWeight in 2952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.09375 = fieldNorm(doc=2952)
          0.020310445 = weight(abstract_txt:results in 2952) [ClassicSimilarity], result of:
            0.020310445 = score(doc=2952,freq=1.0), product of:
              0.06166004 = queryWeight, product of:
                1.1599706 = boost
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.015129077 = queryNorm
              0.32939395 = fieldWeight in 2952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.09375 = fieldNorm(doc=2952)
          0.013815276 = weight(abstract_txt:that in 2952) [ClassicSimilarity], result of:
            0.013815276 = score(doc=2952,freq=2.0), product of:
              0.043329563 = queryWeight, product of:
                1.1909208 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015129077 = queryNorm
              0.31884181 = fieldWeight in 2952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.09375 = fieldNorm(doc=2952)
          0.36538014 = weight(abstract_txt:keyword in 2952) [ClassicSimilarity], result of:
            0.36538014 = score(doc=2952,freq=2.0), product of:
              0.4560273 = queryWeight, product of:
                4.9878173 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.015129077 = queryNorm
              0.8012243 = fieldWeight in 2952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.09375 = fieldNorm(doc=2952)
          0.5864603 = weight(abstract_txt:extraction in 2952) [ClassicSimilarity], result of:
            0.5864603 = score(doc=2952,freq=3.0), product of:
              0.580341 = queryWeight, product of:
                6.163783 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.015129077 = queryNorm
              1.0105443 = fieldWeight in 2952, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.09375 = fieldNorm(doc=2952)
        0.2 = coord(5/25)
    
  3. Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.17
    0.16857348 = sum of:
      0.16857348 = product of:
        0.60204816 = sum of:
          0.052052114 = weight(abstract_txt:keywords in 1214) [ClassicSimilarity], result of:
            0.052052114 = score(doc=1214,freq=1.0), product of:
              0.09165154 = queryWeight, product of:
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.015129077 = queryNorm
              0.567935 = fieldWeight in 1214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.057973 = idf(docFreq=268, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.018124485 = weight(abstract_txt:data in 1214) [ClassicSimilarity], result of:
            0.018124485 = score(doc=1214,freq=1.0), product of:
              0.057152424 = queryWeight, product of:
                1.1167667 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.015129077 = queryNorm
              0.3171254 = fieldWeight in 1214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.020310445 = weight(abstract_txt:results in 1214) [ClassicSimilarity], result of:
            0.020310445 = score(doc=1214,freq=1.0), product of:
              0.06166004 = queryWeight, product of:
                1.1599706 = boost
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.015129077 = queryNorm
              0.32939395 = fieldWeight in 1214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.016920188 = weight(abstract_txt:that in 1214) [ClassicSimilarity], result of:
            0.016920188 = score(doc=1214,freq=3.0), product of:
              0.043329563 = queryWeight, product of:
                1.1909208 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015129077 = queryNorm
              0.39049986 = fieldWeight in 1214, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.041238446 = weight(abstract_txt:retrieval in 1214) [ClassicSimilarity], result of:
            0.041238446 = score(doc=1214,freq=2.0), product of:
              0.08982822 = queryWeight, product of:
                1.7147355 = boost
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.015129077 = queryNorm
              0.45908117 = fieldWeight in 1214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.08802233 = weight(abstract_txt:algorithm in 1214) [ClassicSimilarity], result of:
            0.08802233 = score(doc=1214,freq=1.0), product of:
              0.16390269 = queryWeight, product of:
                1.891203 = boost
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.015129077 = queryNorm
              0.5370402 = fieldWeight in 1214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
          0.36538014 = weight(abstract_txt:keyword in 1214) [ClassicSimilarity], result of:
            0.36538014 = score(doc=1214,freq=2.0), product of:
              0.4560273 = queryWeight, product of:
                4.9878173 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.015129077 = queryNorm
              0.8012243 = fieldWeight in 1214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.09375 = fieldNorm(doc=1214)
        0.28 = coord(7/25)
    
  4. Semantic keyword-based search on structured data sources : COST Action IC1302. Second International KEYSTONE Conference, IKC 2016, Cluj-Napoca, Romania, September 8-9, 2016, Revised Selected Papers (2017) 0.16
    0.15781797 = sum of:
      0.15781797 = product of:
        0.98636234 = sum of:
          0.021145234 = weight(abstract_txt:data in 398) [ClassicSimilarity], result of:
            0.021145234 = score(doc=398,freq=1.0), product of:
              0.057152424 = queryWeight, product of:
                1.1167667 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.015129077 = queryNorm
              0.36997965 = fieldWeight in 398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.109375 = fieldNorm(doc=398)
          0.04811152 = weight(abstract_txt:retrieval in 398) [ClassicSimilarity], result of:
            0.04811152 = score(doc=398,freq=2.0), product of:
              0.08982822 = queryWeight, product of:
                1.7147355 = boost
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.015129077 = queryNorm
              0.5355947 = fieldWeight in 398, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.109375 = fieldNorm(doc=398)
          0.5220804 = weight(abstract_txt:keyword in 398) [ClassicSimilarity], result of:
            0.5220804 = score(doc=398,freq=3.0), product of:
              0.4560273 = queryWeight, product of:
                4.9878173 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.015129077 = queryNorm
              1.1448447 = fieldWeight in 398, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.109375 = fieldNorm(doc=398)
          0.39502513 = weight(abstract_txt:extraction in 398) [ClassicSimilarity], result of:
            0.39502513 = score(doc=398,freq=1.0), product of:
              0.580341 = queryWeight, product of:
                6.163783 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.015129077 = queryNorm
              0.68067765 = fieldWeight in 398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.109375 = fieldNorm(doc=398)
        0.16 = coord(4/25)
    
  5. Colace, F.; Santo, M. de; Greco, L.; Napoletano, P.: Improving relevance feedback-based query expansion by the use of a weighted word pairs approach (2015) 0.15
    0.14916503 = sum of:
      0.14916503 = product of:
        0.621521 = sum of:
          0.01692537 = weight(abstract_txt:results in 4264) [ClassicSimilarity], result of:
            0.01692537 = score(doc=4264,freq=1.0), product of:
              0.06166004 = queryWeight, product of:
                1.1599706 = boost
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.015129077 = queryNorm
              0.27449495 = fieldWeight in 4264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5135355 = idf(docFreq=3425, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
          0.00814073 = weight(abstract_txt:that in 4264) [ClassicSimilarity], result of:
            0.00814073 = score(doc=4264,freq=1.0), product of:
              0.043329563 = queryWeight, product of:
                1.1909208 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015129077 = queryNorm
              0.18787934 = fieldWeight in 4264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
          0.04706899 = weight(abstract_txt:relevance in 4264) [ClassicSimilarity], result of:
            0.04706899 = score(doc=4264,freq=1.0), product of:
              0.121936835 = queryWeight, product of:
                1.6312201 = boost
                4.9409437 = idf(docFreq=821, maxDocs=42306)
                0.015129077 = queryNorm
              0.38601124 = fieldWeight in 4264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9409437 = idf(docFreq=821, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
          0.03436537 = weight(abstract_txt:retrieval in 4264) [ClassicSimilarity], result of:
            0.03436537 = score(doc=4264,freq=2.0), product of:
              0.08982822 = queryWeight, product of:
                1.7147355 = boost
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.015129077 = queryNorm
              0.38256764 = fieldWeight in 4264, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4626071 = idf(docFreq=3604, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
          0.115984894 = weight(abstract_txt:feedback in 4264) [ClassicSimilarity], result of:
            0.115984894 = score(doc=4264,freq=2.0), product of:
              0.1765642 = queryWeight, product of:
                1.9628922 = boost
                5.945574 = idf(docFreq=300, maxDocs=42306)
                0.015129077 = queryNorm
              0.6568993 = fieldWeight in 4264, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.945574 = idf(docFreq=300, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
          0.39903563 = weight(abstract_txt:extraction in 4264) [ClassicSimilarity], result of:
            0.39903563 = score(doc=4264,freq=2.0), product of:
              0.580341 = queryWeight, product of:
                6.163783 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.015129077 = queryNorm
              0.6875882 = fieldWeight in 4264, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.078125 = fieldNorm(doc=4264)
        0.24 = coord(6/25)
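
The content scores combine several matching terms. As a hedged sketch, the following recomputes the score of entry 4 (doc=398) from the numbers in its explain tree, assuming the usual ClassicSimilarity structure: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by the coord factor (matched terms / query terms). The helper names are illustrative, not Lucene API calls.

```python
import math

MAX_DOCS = 42306
QUERY_NORM = 0.015129077   # copied from the explain output
FIELD_NORM = 0.109375      # fieldNorm(doc=398)

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm=FIELD_NORM):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM          # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight

# (term, boost, docFreq, freq) for the four terms matched in doc 398.
matched_terms = [
    ("data",       1.1167667, 3904, 1.0),
    ("retrieval",  1.7147355, 3604, 2.0),
    ("keyword",    4.9878173,  272, 3.0),
    ("extraction", 6.163783,   227, 1.0),
]

term_sum = sum(term_score(b, df, f) for _, b, df, f in matched_terms)
coord = 4 / 25  # 4 of the 25 query terms matched
score = term_sum * coord

print(round(term_sum, 4), round(score, 4))
```

Read this way, the ranking is driven mostly by the heavily boosted terms "keyword" and "extraction", while the coord factor penalises documents that match only a few of the 25 query terms.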