Document (#15671)

Author
Goh, A.
Hui, S.C.
Chan, S.K.
Title
A text extraction system for news reports
Source
Asian libraries. 5(1996) no.1, pp. 34-42
Year
1996
Abstract
Describes the design and implementation of a text extraction tool, NEWS_EXT, which automatically produces summaries from news reports by extracting sentences to form indicative abstracts. Selection of sentences is based on sentence importance, measured by means of sentence scoring or simple linguistic analysis of sentence structure. Tests were conducted on 4 approaches for the functioning of the NEWS_EXT system: extraction by keyword frequency; extraction by title keywords; extraction by location; and extraction by indicative phrase. Reports results of a study comparing the output of NEWS_EXT with manually produced extracts, using relevance as the criterion for effectiveness. 48 newspaper articles were assessed (The Straits Times, International Herald Tribune, Asian Wall Street Journal, and Financial Times). The evaluation was conducted in 2 stages: stage 1 involved abstracts produced manually by 2 human experts; stage 2 involved the generation of abstracts using NEWS_EXT. Results of each of the 4 approaches were compared with the human-produced abstracts; the title and location approaches were found to give the best results for both local and foreign news. Reports plans to refine and enhance NEWS_EXT and incorporate it as a module within a larger newspaper clipping system.
Theme
Automatic abstracting
Object
NEWS_EXT
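
The four sentence-selection approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the NEWS_EXT implementation: the record gives no scoring details, so the stopword list, the cue phrases, the scoring formulas, and every function name below are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical word lists; the record does not give the lists NEWS_EXT used.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "on", "by", "with"}
CUE_PHRASES = ("according to", "in conclusion", "the report said")


def tokenize(text):
    """Lowercase the text and return its alphabetic words minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def score_keyword_frequency(sentences):
    """Score each sentence by the summed article-wide frequency of its words."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return [sum(counts[w] for w in tokenize(s)) for s in sentences]


def score_title_keywords(sentences, title):
    """Score each sentence by how many of its words also occur in the title."""
    title_words = set(tokenize(title))
    return [sum(1 for w in tokenize(s) if w in title_words) for s in sentences]


def score_location(sentences):
    """Score by position: earlier sentences get higher scores (an assumption)."""
    n = len(sentences)
    return [n - i for i in range(n)]


def score_indicative_phrase(sentences):
    """Score each sentence by the number of cue phrases it contains."""
    return [sum(1 for p in CUE_PHRASES if p in s.lower()) for s in sentences]


def extract(sentences, scores, k=2):
    """Return the k best-scoring sentences in their original order."""
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(best)]


# Invented example article, not taken from the study's 48 newspaper articles.
title = "Bank raises interest rates"
sentences = [
    "The central bank raised interest rates on Monday.",
    "Analysts had expected the move.",
    "According to the bank, rates may rise again.",
    "Markets were calm.",
]

by_title = extract(sentences, score_title_keywords(sentences, title))
by_location = extract(sentences, score_location(sentences))
```

Each scorer returns one number per sentence, so the approaches can be swapped or compared against a manually produced extract without changing `extract`.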

Similar documents (author)

  1. Chan, L.M.: Year's work in cataloging and classification : 1975 (1976) 4.55
    4.54692 = sum of:
      4.54692 = weight(author_txt:chan in 307) [ClassicSimilarity], result of:
        4.54692 = fieldWeight in 307, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.275072 = idf(docFreq=78, maxDocs=41962)
          0.625 = fieldNorm(doc=307)
    
  2. Chan, L.M.: 'American poetry' but 'Satire, American' : the direct and inverted forms of subject headings containing national adjectives (1973) 4.55
    4.54692 = sum of:
      4.54692 = weight(author_txt:chan in 382) [ClassicSimilarity], result of:
        4.54692 = fieldWeight in 382, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.275072 = idf(docFreq=78, maxDocs=41962)
          0.625 = fieldNorm(doc=382)
    
  3. Chan, L.M.: Library of Congress Classification as an online retrieval tool : potentials and limitations (1986) 4.55
    4.54692 = sum of:
      4.54692 = weight(author_txt:chan in 1145) [ClassicSimilarity], result of:
        4.54692 = fieldWeight in 1145, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.275072 = idf(docFreq=78, maxDocs=41962)
          0.625 = fieldNorm(doc=1145)
    
  4. Chan, L.M.: Library of Congress class numbers in online catalog searching (1989) 4.55
    4.54692 = sum of:
      4.54692 = weight(author_txt:chan in 1146) [ClassicSimilarity], result of:
        4.54692 = fieldWeight in 1146, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.275072 = idf(docFreq=78, maxDocs=41962)
          0.625 = fieldNorm(doc=1146)
    
  5. Chan, L.M.: Dewey 18: another step in an evolutionary step (1972) 4.55
    4.54692 = sum of:
      4.54692 = weight(author_txt:chan in 1780) [ClassicSimilarity], result of:
        4.54692 = fieldWeight in 1780, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.275072 = idf(docFreq=78, maxDocs=41962)
          0.625 = fieldNorm(doc=1780)
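
The author-match scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas that the [ClassicSimilarity] label suggests (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); these formulas reproduce the numbers shown but are stated here as an assumption, and the function names are illustrative rather than part of any library API.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output for author_txt:chan.
idf_chan = classic_idf(doc_freq=78, max_docs=41962)
weight_chan = field_weight(freq=1.0, doc_freq=78, max_docs=41962, field_norm=0.625)
print(round(idf_chan, 4), round(weight_chan, 4))
```

All five author matches share the same term frequency, document frequency, and field norm, which is why they receive the identical score of 4.54692.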
    

Similar documents (content)

  1. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.44
    0.4385765 = sum of:
      0.4385765 = product of:
        1.3705516 = sum of:
          0.025734978 = weight(abstract_txt:system in 6668) [ClassicSimilarity], result of:
            0.025734978 = score(doc=6668,freq=1.0), product of:
              0.069963664 = queryWeight, product of:
                1.2288054 = boost
                3.3630488 = idf(docFreq=3949, maxDocs=41962)
                0.016929973 = queryNorm
              0.36783347 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3630488 = idf(docFreq=3949, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.13497604 = weight(abstract_txt:manually in 6668) [ClassicSimilarity], result of:
            0.13497604 = score(doc=6668,freq=1.0), product of:
              0.18450044 = queryWeight, product of:
                1.6292957 = boost
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.016929973 = queryNorm
              0.7315757 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.15667066 = weight(abstract_txt:sentences in 6668) [ClassicSimilarity], result of:
            0.15667066 = score(doc=6668,freq=1.0), product of:
              0.20377523 = queryWeight, product of:
                1.7122884 = boost
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.016929973 = queryNorm
              0.76884055 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.039457116 = weight(abstract_txt:results in 6668) [ClassicSimilarity], result of:
            0.039457116 = score(doc=6668,freq=1.0), product of:
              0.10238897 = queryWeight, product of:
                1.716496 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.016929973 = queryNorm
              0.38536492 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.243999 = weight(abstract_txt:indicative in 6668) [ClassicSimilarity], result of:
            0.243999 = score(doc=6668,freq=1.0), product of:
              0.27379045 = queryWeight, product of:
                1.9847708 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.016929973 = queryNorm
              0.89118886 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.11657972 = weight(abstract_txt:produced in 6668) [ClassicSimilarity], result of:
            0.11657972 = score(doc=6668,freq=1.0), product of:
              0.19154553 = queryWeight, product of:
                2.033213 = boost
                5.5645866 = idf(docFreq=436, maxDocs=41962)
                0.016929973 = queryNorm
              0.60862666 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5645866 = idf(docFreq=436, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.19075014 = weight(abstract_txt:abstracts in 6668) [ClassicSimilarity], result of:
            0.19075014 = score(doc=6668,freq=1.0), product of:
              0.29273826 = queryWeight, product of:
                2.9023912 = boost
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.016929973 = queryNorm
              0.65160644 = fieldWeight in 6668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
          0.46238396 = weight(abstract_txt:extraction in 6668) [ClassicSimilarity], result of:
            0.46238396 = score(doc=6668,freq=2.0), product of:
              0.47994542 = queryWeight, product of:
                4.5515337 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.016929973 = queryNorm
              0.9634095 = fieldWeight in 6668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.109375 = fieldNorm(doc=6668)
        0.32 = coord(8/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.20
    0.19839877 = sum of:
      0.19839877 = product of:
        0.708567 = sum of:
          0.03913671 = weight(abstract_txt:human in 3720) [ClassicSimilarity], result of:
            0.03913671 = score(doc=3720,freq=2.0), product of:
              0.09316037 = queryWeight, product of:
                1.1577554 = boost
                4.752894 = idf(docFreq=983, maxDocs=41962)
                0.016929973 = queryNorm
              0.42010042 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.752894 = idf(docFreq=983, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.032032624 = weight(abstract_txt:conducted in 3720) [ClassicSimilarity], result of:
            0.032032624 = score(doc=3720,freq=1.0), product of:
              0.10270226 = queryWeight, product of:
                1.2156014 = boost
                4.9903674 = idf(docFreq=775, maxDocs=41962)
                0.016929973 = queryNorm
              0.31189796 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9903674 = idf(docFreq=775, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.15506373 = weight(abstract_txt:sentences in 3720) [ClassicSimilarity], result of:
            0.15506373 = score(doc=3720,freq=3.0), product of:
              0.20377523 = queryWeight, product of:
                1.7122884 = boost
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.016929973 = queryNorm
              0.7609548 = fieldWeight in 3720, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.022546925 = weight(abstract_txt:results in 3720) [ClassicSimilarity], result of:
            0.022546925 = score(doc=3720,freq=1.0), product of:
              0.10238897 = queryWeight, product of:
                1.716496 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.016929973 = queryNorm
              0.22020853 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.06661698 = weight(abstract_txt:produced in 3720) [ClassicSimilarity], result of:
            0.06661698 = score(doc=3720,freq=1.0), product of:
              0.19154553 = queryWeight, product of:
                2.033213 = boost
                5.5645866 = idf(docFreq=436, maxDocs=41962)
                0.016929973 = queryNorm
              0.34778666 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5645866 = idf(docFreq=436, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.12895067 = weight(abstract_txt:sentence in 3720) [ClassicSimilarity], result of:
            0.12895067 = score(doc=3720,freq=1.0), product of:
              0.2975074 = queryWeight, product of:
                2.5339365 = boost
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.016929973 = queryNorm
              0.43343684 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
          0.2642194 = weight(abstract_txt:extraction in 3720) [ClassicSimilarity], result of:
            0.2642194 = score(doc=3720,freq=2.0), product of:
              0.47994542 = queryWeight, product of:
                4.5515337 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.016929973 = queryNorm
              0.5505197 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.0625 = fieldNorm(doc=3720)
        0.28 = coord(7/25)
    
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.15
    0.151606 = sum of:
      0.151606 = product of:
        0.63169163 = sum of:
          0.13568076 = weight(abstract_txt:sentences in 950) [ClassicSimilarity], result of:
            0.13568076 = score(doc=950,freq=3.0), product of:
              0.20377523 = queryWeight, product of:
                1.7122884 = boost
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.016929973 = queryNorm
              0.66583544 = fieldWeight in 950, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
          0.03500318 = weight(abstract_txt:approaches in 950) [ClassicSimilarity], result of:
            0.03500318 = score(doc=950,freq=1.0), product of:
              0.13633741 = queryWeight, product of:
                1.7153565 = boost
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.016929973 = queryNorm
              0.25673938 = fieldWeight in 950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
          0.019728558 = weight(abstract_txt:results in 950) [ClassicSimilarity], result of:
            0.019728558 = score(doc=950,freq=1.0), product of:
              0.10238897 = queryWeight, product of:
                1.716496 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.016929973 = queryNorm
              0.19268246 = fieldWeight in 950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
          0.0325835 = weight(abstract_txt:were in 950) [ClassicSimilarity], result of:
            0.0325835 = score(doc=950,freq=2.0), product of:
              0.11354764 = queryWeight, product of:
                1.8076122 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.016929973 = queryNorm
              0.28695887 = fieldWeight in 950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
          0.19543047 = weight(abstract_txt:sentence in 950) [ClassicSimilarity], result of:
            0.19543047 = score(doc=950,freq=3.0), product of:
              0.2975074 = queryWeight, product of:
                2.5339365 = boost
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.016929973 = queryNorm
              0.6568928 = fieldWeight in 950, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
          0.21326514 = weight(abstract_txt:abstracts in 950) [ClassicSimilarity], result of:
            0.21326514 = score(doc=950,freq=5.0), product of:
              0.29273826 = queryWeight, product of:
                2.9023912 = boost
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.016929973 = queryNorm
              0.7285181 = fieldWeight in 950, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.0546875 = fieldNorm(doc=950)
        0.24 = coord(6/25)
    
  4. Ou, S.; Khoo, S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.15
    0.15052687 = sum of:
      0.15052687 = product of:
        0.6271953 = sum of:
          0.014705701 = weight(abstract_txt:system in 2523) [ClassicSimilarity], result of:
            0.014705701 = score(doc=2523,freq=1.0), product of:
              0.069963664 = queryWeight, product of:
                1.2288054 = boost
                3.3630488 = idf(docFreq=3949, maxDocs=41962)
                0.016929973 = queryNorm
              0.21019055 = fieldWeight in 2523, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3630488 = idf(docFreq=3949, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
          0.12660901 = weight(abstract_txt:sentences in 2523) [ClassicSimilarity], result of:
            0.12660901 = score(doc=2523,freq=2.0), product of:
              0.20377523 = queryWeight, product of:
                1.7122884 = boost
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.016929973 = queryNorm
              0.62131697 = fieldWeight in 2523, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
          0.022546925 = weight(abstract_txt:results in 2523) [ClassicSimilarity], result of:
            0.022546925 = score(doc=2523,freq=1.0), product of:
              0.10238897 = queryWeight, product of:
                1.716496 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.016929973 = queryNorm
              0.22020853 = fieldWeight in 2523, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
          0.037238285 = weight(abstract_txt:were in 2523) [ClassicSimilarity], result of:
            0.037238285 = score(doc=2523,freq=2.0), product of:
              0.11354764 = queryWeight, product of:
                1.8076122 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.016929973 = queryNorm
              0.32795298 = fieldWeight in 2523, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
          0.1823638 = weight(abstract_txt:sentence in 2523) [ClassicSimilarity], result of:
            0.1823638 = score(doc=2523,freq=2.0), product of:
              0.2975074 = queryWeight, product of:
                2.5339365 = boost
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.016929973 = queryNorm
              0.61297226 = fieldWeight in 2523, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
          0.24373157 = weight(abstract_txt:abstracts in 2523) [ClassicSimilarity], result of:
            0.24373157 = score(doc=2523,freq=5.0), product of:
              0.29273826 = queryWeight, product of:
                2.9023912 = boost
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.016929973 = queryNorm
              0.8325921 = fieldWeight in 2523, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.9575443 = idf(docFreq=294, maxDocs=41962)
                0.0625 = fieldNorm(doc=2523)
        0.24 = coord(6/25)
    
  5. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.15
    0.14553936 = sum of:
      0.14553936 = product of:
        0.7276968 = sum of:
          0.10350292 = weight(abstract_txt:stage in 2947) [ClassicSimilarity], result of:
            0.10350292 = score(doc=2947,freq=3.0), product of:
              0.15563704 = queryWeight, product of:
                1.4964345 = boost
                6.1432614 = idf(docFreq=244, maxDocs=41962)
                0.016929973 = queryNorm
              0.66502756 = fieldWeight in 2947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1432614 = idf(docFreq=244, maxDocs=41962)
                0.0625 = fieldNorm(doc=2947)
          0.15506373 = weight(abstract_txt:sentences in 2947) [ClassicSimilarity], result of:
            0.15506373 = score(doc=2947,freq=3.0), product of:
              0.20377523 = queryWeight, product of:
                1.7122884 = boost
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.016929973 = queryNorm
              0.7609548 = fieldWeight in 2947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0293994 = idf(docFreq=100, maxDocs=41962)
                0.0625 = fieldNorm(doc=2947)
          0.022546925 = weight(abstract_txt:results in 2947) [ClassicSimilarity], result of:
            0.022546925 = score(doc=2947,freq=1.0), product of:
              0.10238897 = queryWeight, product of:
                1.716496 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.016929973 = queryNorm
              0.22020853 = fieldWeight in 2947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0625 = fieldNorm(doc=2947)
          0.1823638 = weight(abstract_txt:sentence in 2947) [ClassicSimilarity], result of:
            0.1823638 = score(doc=2947,freq=2.0), product of:
              0.2975074 = queryWeight, product of:
                2.5339365 = boost
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.016929973 = queryNorm
              0.61297226 = fieldWeight in 2947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9349895 = idf(docFreq=110, maxDocs=41962)
                0.0625 = fieldNorm(doc=2947)
          0.2642194 = weight(abstract_txt:extraction in 2947) [ClassicSimilarity], result of:
            0.2642194 = score(doc=2947,freq=2.0), product of:
              0.47994542 = queryWeight, product of:
                4.5515337 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.016929973 = queryNorm
              0.5505197 = fieldWeight in 2947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.0625 = fieldNorm(doc=2947)
        0.2 = coord(5/25)
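
The content-similarity scores above combine a per-term score with a coordination factor. As a worked check, the following sketch recomputes the "extraction" term score and the total for document 6668 from the figures in its explain tree. The formulas are assumed from the structure of the output (queryWeight = boost * idf * queryNorm; fieldWeight = tf * idf * fieldNorm; total = sum of term scores * coord), and the function names are illustrative only.

```python
import math

MAX_DOCS = 41962          # maxDocs from the explain output
QUERY_NORM = 0.016929973  # queryNorm from the explain output


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    """Sum of term scores scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# "extraction" in doc 6668: freq=2, docFreq=224, boost=4.5515337, fieldNorm=0.109375.
extraction_score = term_score(freq=2.0, doc_freq=224, boost=4.5515337, field_norm=0.109375)

# The eight per-term scores listed for doc 6668, copied from the explain tree.
doc_6668_terms = [
    0.025734978, 0.13497604, 0.15667066, 0.039457116,
    0.243999, 0.11657972, 0.19075014, 0.46238396,
]
total_6668 = document_score(doc_6668_terms, matched=8, total=25)
print(round(extraction_score, 5), round(total_6668, 5))
```

The coord factor rewards documents that match more of the 25 query terms, which is why document 6668 (8 of 25 terms) ranks ahead of documents whose raw sums are scaled by smaller fractions.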