Document (#31300)

Author
Souza, R.R.
Raghavan, K.S.
Title
¬A methodology for noun phrase-based automatic indexing
Source
Knowledge organization. 33(2006) no.1, pp. 35-44
Year
2006
Abstract
The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. An approach being discussed considers the intrinsic semantics of texts of documents. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.
Theme
Automatisches Indexieren

Similar documents (author)

  1. Raghavan, K.S.: ¬The general theory of classification as the basis for structuring of subject headings (1985(?)) 1.82
    1.8201754 = sum of:
      1.8201754 = product of:
        3.6403508 = sum of:
          3.6403508 = weight(author_txt:raghavan in 1830) [ClassicSimilarity], result of:
            3.6403508 = score(doc=1830,freq=1.0), product of:
              0.71219385 = queryWeight, product of:
                1.0072467 = boost
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.08645644 = queryNorm
              5.1114607 = fieldWeight in 1830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.625 = fieldNorm(doc=1830)
        0.5 = coord(1/2)
    
  2. Raghavan, K.S.: Education for knowledge organization : the Indian scene (2005) 1.82
    1.8201754 = sum of:
      1.8201754 = product of:
        3.6403508 = sum of:
          3.6403508 = weight(author_txt:raghavan in 807) [ClassicSimilarity], result of:
            3.6403508 = score(doc=807,freq=1.0), product of:
              0.71219385 = queryWeight, product of:
                1.0072467 = boost
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.08645644 = queryNorm
              5.1114607 = fieldWeight in 807, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.625 = fieldNorm(doc=807)
        0.5 = coord(1/2)
    
  3. Raghavan, K.S.: Education for information management as a transformation force (2013) 1.82
    1.8201754 = sum of:
      1.8201754 = product of:
        3.6403508 = sum of:
          3.6403508 = weight(author_txt:raghavan in 2402) [ClassicSimilarity], result of:
            3.6403508 = score(doc=2402,freq=1.0), product of:
              0.71219385 = queryWeight, product of:
                1.0072467 = boost
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.08645644 = queryNorm
              5.1114607 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.625 = fieldNorm(doc=2402)
        0.5 = coord(1/2)
    
  4. Raghavan, K.S.: ¬The Colon Classification : a few considerations on its future (2015) 1.82
    1.8201754 = sum of:
      1.8201754 = product of:
        3.6403508 = sum of:
          3.6403508 = weight(author_txt:raghavan in 4225) [ClassicSimilarity], result of:
            3.6403508 = score(doc=4225,freq=1.0), product of:
              0.71219385 = queryWeight, product of:
                1.0072467 = boost
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.08645644 = queryNorm
              5.1114607 = fieldWeight in 4225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.625 = fieldNorm(doc=4225)
        0.5 = coord(1/2)
    
  5. Souza, S.d.: Informacion : utopia y realidad de la bibliotelogia (1996) 1.78
    1.7811713 = sum of:
      1.7811713 = product of:
        3.5623426 = sum of:
          3.5623426 = weight(author_txt:souza in 1825) [ClassicSimilarity], result of:
            3.5623426 = score(doc=1825,freq=1.0), product of:
              0.70198286 = queryWeight, product of:
                8.119497 = idf(docFreq=34, maxDocs=43254)
                0.08645644 = queryNorm
              5.074686 = fieldWeight in 1825, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.119497 = idf(docFreq=34, maxDocs=43254)
                0.625 = fieldNorm(doc=1825)
        0.5 = coord(1/2)
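
The author-match scores above can be re-derived from the leaf values in each tree. The following is a minimal sketch, not the catalog's actual code: the function names (`idf`, `tf`, `term_score`) are hypothetical, and the formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq) are assumptions that happen to reproduce the figures printed in the listing. The queryNorm, fieldNorm and boost values are copied from the listing rather than computed.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed formula; it reproduces 8.178337 for docFreq=32, maxDocs=43254.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Assumed formula; the listing shows tf = 1.0 for freq = 1.0.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the "product of" lines.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 43254
QUERY_NORM = 0.08645644   # copied from the listing
FIELD_NORM = 0.625        # copied from the listing
COORD = 1 / 2             # one of the two author terms matched

# author_txt:raghavan (docFreq=32, boost=1.0072467)
raghavan = COORD * term_score(1.0, 32, MAX_DOCS, QUERY_NORM, FIELD_NORM,
                              boost=1.0072467)
# author_txt:souza (docFreq=34, no boost line in the listing)
souza = COORD * term_score(1.0, 34, MAX_DOCS, QUERY_NORM, FIELD_NORM)

print(raghavan, souza)
```

Under these assumptions the two results agree with the listed 1.8201754 and 1.7811713 to about four decimal places; small differences in later digits are expected because the listing appears to use single-precision arithmetic.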
    

Similar documents (content)

  1. Styltsvig, H.B.: Ontology-based information retrieval (2006) 0.21
    0.2149593 = sum of:
      0.2149593 = product of:
        0.6717478 = sum of:
          0.037014157 = weight(abstract_txt:hypothesis in 2619) [ClassicSimilarity], result of:
            0.037014157 = score(doc=2619,freq=1.0), product of:
              0.11651685 = queryWeight, product of:
                1.0179405 = boost
                6.777005 = idf(docFreq=133, maxDocs=43254)
                0.016889956 = queryNorm
              0.31767213 = fieldWeight in 2619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.777005 = idf(docFreq=133, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.012601143 = weight(abstract_txt:approach in 2619) [ClassicSimilarity], result of:
            0.012601143 = score(doc=2619,freq=1.0), product of:
              0.0715748 = queryWeight, product of:
                1.1282961 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016889956 = queryNorm
              0.17605558 = fieldWeight in 2619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.060006987 = weight(abstract_txt:handled in 2619) [ClassicSimilarity], result of:
            0.060006987 = score(doc=2619,freq=1.0), product of:
              0.16079704 = queryWeight, product of:
                1.1958234 = boost
                7.9612727 = idf(docFreq=40, maxDocs=43254)
                0.016889956 = queryNorm
              0.37318465 = fieldWeight in 2619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9612727 = idf(docFreq=40, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.027592892 = weight(abstract_txt:indexing in 2619) [ClassicSimilarity], result of:
            0.027592892 = score(doc=2619,freq=2.0), product of:
              0.09579474 = queryWeight, product of:
                1.3053106 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.016889956 = queryNorm
              0.28804183 = fieldWeight in 2619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.013675527 = weight(abstract_txt:this in 2619) [ClassicSimilarity], result of:
            0.013675527 = score(doc=2619,freq=4.0), product of:
              0.059993777 = queryWeight, product of:
                1.4608685 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016889956 = queryNorm
              0.22794908 = fieldWeight in 2619, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.043095473 = weight(abstract_txt:texts in 2619) [ClassicSimilarity], result of:
            0.043095473 = score(doc=2619,freq=1.0), product of:
              0.16247053 = queryWeight, product of:
                1.6999272 = boost
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.016889956 = queryNorm
              0.265251 = fieldWeight in 2619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.16306779 = weight(abstract_txt:phrases in 2619) [ClassicSimilarity], result of:
            0.16306779 = score(doc=2619,freq=2.0), product of:
              0.35844243 = queryWeight, product of:
                3.0924191 = boost
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.016889956 = queryNorm
              0.4549344 = fieldWeight in 2619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
          0.31469387 = weight(abstract_txt:noun in 2619) [ClassicSimilarity], result of:
            0.31469387 = score(doc=2619,freq=2.0), product of:
              0.6115223 = queryWeight, product of:
                4.6640606 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016889956 = queryNorm
              0.5146073 = fieldWeight in 2619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.046875 = fieldNorm(doc=2619)
        0.32 = coord(8/25)
    
  2. Tudhope, D.; Binding, C.; Blocks, D.; Cunliffe, D.: Compound descriptors in context : a matching function for classifications and thesauri (2002) 0.18
    0.17725372 = sum of:
      0.17725372 = product of:
        0.633049 = sum of:
          0.09357706 = weight(abstract_txt:descriptors in 5180) [ClassicSimilarity], result of:
            0.09357706 = score(doc=5180,freq=4.0), product of:
              0.11244598 = queryWeight, product of:
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.016889956 = queryNorm
              0.83219564 = fieldWeight in 5180, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.016801525 = weight(abstract_txt:approach in 5180) [ClassicSimilarity], result of:
            0.016801525 = score(doc=5180,freq=1.0), product of:
              0.0715748 = queryWeight, product of:
                1.1282961 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016889956 = queryNorm
              0.23474078 = fieldWeight in 5180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.03679052 = weight(abstract_txt:indexing in 5180) [ClassicSimilarity], result of:
            0.03679052 = score(doc=5180,freq=2.0), product of:
              0.09579474 = queryWeight, product of:
                1.3053106 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.016889956 = queryNorm
              0.38405576 = fieldWeight in 5180, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.026324924 = weight(abstract_txt:digital in 5180) [ClassicSimilarity], result of:
            0.026324924 = score(doc=5180,freq=1.0), product of:
              0.09655448 = queryWeight, product of:
                1.3104765 = boost
                4.3622913 = idf(docFreq=1498, maxDocs=43254)
                0.016889956 = queryNorm
              0.2726432 = fieldWeight in 5180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3622913 = idf(docFreq=1498, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.0091170175 = weight(abstract_txt:this in 5180) [ClassicSimilarity], result of:
            0.0091170175 = score(doc=5180,freq=1.0), product of:
              0.059993777 = queryWeight, product of:
                1.4608685 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016889956 = queryNorm
              0.15196605 = fieldWeight in 5180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.15374179 = weight(abstract_txt:phrases in 5180) [ClassicSimilarity], result of:
            0.15374179 = score(doc=5180,freq=1.0), product of:
              0.35844243 = queryWeight, product of:
                3.0924191 = boost
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.016889956 = queryNorm
              0.42891628 = fieldWeight in 5180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
          0.2966962 = weight(abstract_txt:noun in 5180) [ClassicSimilarity], result of:
            0.2966962 = score(doc=5180,freq=1.0), product of:
              0.6115223 = queryWeight, product of:
                4.6640606 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016889956 = queryNorm
              0.48517638 = fieldWeight in 5180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.0625 = fieldNorm(doc=5180)
        0.28 = coord(7/25)
    
  3. Justeson, J.S.; Katz, S.M.: Technical terminology : some linguistic properties and an algorithm for identification in text (1995) 0.17
    0.17377399 = sum of:
      0.17377399 = product of:
        1.0860875 = sum of:
          0.099464335 = weight(abstract_txt:phrase in 2220) [ClassicSimilarity], result of:
            0.099464335 = score(doc=2220,freq=2.0), product of:
              0.12715866 = queryWeight, product of:
                1.0634106 = boost
                7.0797253 = idf(docFreq=98, maxDocs=43254)
                0.016889956 = queryNorm
              0.78220654 = fieldWeight in 2220, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0797253 = idf(docFreq=98, maxDocs=43254)
                0.078125 = fieldNorm(doc=2220)
          0.011396271 = weight(abstract_txt:this in 2220) [ClassicSimilarity], result of:
            0.011396271 = score(doc=2220,freq=1.0), product of:
              0.059993777 = queryWeight, product of:
                1.4608685 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016889956 = queryNorm
              0.18995756 = fieldWeight in 2220, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.078125 = fieldNorm(doc=2220)
          0.33286074 = weight(abstract_txt:phrases in 2220) [ClassicSimilarity], result of:
            0.33286074 = score(doc=2220,freq=3.0), product of:
              0.35844243 = queryWeight, product of:
                3.0924191 = boost
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.016889956 = queryNorm
              0.92863095 = fieldWeight in 2220, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.078125 = fieldNorm(doc=2220)
          0.6423661 = weight(abstract_txt:noun in 2220) [ClassicSimilarity], result of:
            0.6423661 = score(doc=2220,freq=3.0), product of:
              0.6115223 = queryWeight, product of:
                4.6640606 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016889956 = queryNorm
              1.0504377 = fieldWeight in 2220, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.078125 = fieldNorm(doc=2220)
        0.16 = coord(4/25)
    
  4. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.17
    0.17184725 = sum of:
      0.17184725 = product of:
        0.85923624 = sum of:
          0.027592892 = weight(abstract_txt:indexing in 2907) [ClassicSimilarity], result of:
            0.027592892 = score(doc=2907,freq=2.0), product of:
              0.09579474 = queryWeight, product of:
                1.3053106 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.016889956 = queryNorm
              0.28804183 = fieldWeight in 2907, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.046875 = fieldNorm(doc=2907)
          0.015289703 = weight(abstract_txt:this in 2907) [ClassicSimilarity], result of:
            0.015289703 = score(doc=2907,freq=5.0), product of:
              0.059993777 = queryWeight, product of:
                1.4608685 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016889956 = queryNorm
              0.25485483 = fieldWeight in 2907, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.046875 = fieldNorm(doc=2907)
          0.060946196 = weight(abstract_txt:texts in 2907) [ClassicSimilarity], result of:
            0.060946196 = score(doc=2907,freq=2.0), product of:
              0.16247053 = queryWeight, product of:
                1.6999272 = boost
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.016889956 = queryNorm
              0.37512153 = fieldWeight in 2907, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.046875 = fieldNorm(doc=2907)
          0.25783283 = weight(abstract_txt:phrases in 2907) [ClassicSimilarity], result of:
            0.25783283 = score(doc=2907,freq=5.0), product of:
              0.35844243 = queryWeight, product of:
                3.0924191 = boost
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.016889956 = queryNorm
              0.71931446 = fieldWeight in 2907, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.046875 = fieldNorm(doc=2907)
          0.49757463 = weight(abstract_txt:noun in 2907) [ClassicSimilarity], result of:
            0.49757463 = score(doc=2907,freq=5.0), product of:
              0.6115223 = queryWeight, product of:
                4.6640606 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016889956 = queryNorm
              0.8136655 = fieldWeight in 2907, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.046875 = fieldNorm(doc=2907)
        0.2 = coord(5/25)
    
  5. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.17
    0.16664505 = sum of:
      0.16664505 = product of:
        0.6943544 = sum of:
          0.04318318 = weight(abstract_txt:hypothesis in 2906) [ClassicSimilarity], result of:
            0.04318318 = score(doc=2906,freq=1.0), product of:
              0.11651685 = queryWeight, product of:
                1.0179405 = boost
                6.777005 = idf(docFreq=133, maxDocs=43254)
                0.016889956 = queryNorm
              0.37061748 = fieldWeight in 2906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.777005 = idf(docFreq=133, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
          0.014701334 = weight(abstract_txt:approach in 2906) [ClassicSimilarity], result of:
            0.014701334 = score(doc=2906,freq=1.0), product of:
              0.0715748 = queryWeight, product of:
                1.1282961 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016889956 = queryNorm
              0.20539819 = fieldWeight in 2906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
          0.00797739 = weight(abstract_txt:this in 2906) [ClassicSimilarity], result of:
            0.00797739 = score(doc=2906,freq=1.0), product of:
              0.059993777 = queryWeight, product of:
                1.4608685 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016889956 = queryNorm
              0.13297029 = fieldWeight in 2906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
          0.0711039 = weight(abstract_txt:texts in 2906) [ClassicSimilarity], result of:
            0.0711039 = score(doc=2906,freq=2.0), product of:
              0.16247053 = queryWeight, product of:
                1.6999272 = boost
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.016889956 = queryNorm
              0.4376418 = fieldWeight in 2906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.658688 = idf(docFreq=409, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
          0.19024575 = weight(abstract_txt:phrases in 2906) [ClassicSimilarity], result of:
            0.19024575 = score(doc=2906,freq=2.0), product of:
              0.35844243 = queryWeight, product of:
                3.0924191 = boost
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.016889956 = queryNorm
              0.5307568 = fieldWeight in 2906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8626604 = idf(docFreq=122, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
          0.36714283 = weight(abstract_txt:noun in 2906) [ClassicSimilarity], result of:
            0.36714283 = score(doc=2906,freq=2.0), product of:
              0.6115223 = queryWeight, product of:
                4.6640606 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016889956 = queryNorm
              0.6003752 = fieldWeight in 2906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2906)
        0.24 = coord(6/25)
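
The content-match scores combine several abstract terms: the per-term scores are summed and then scaled by coord(matched/total), which penalises documents matching few of the 25 query terms. The sketch below recomputes result 3 (doc=2220) under the same assumptions as the listing suggests (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The function and variable names are hypothetical; the numeric inputs are copied from the listing.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.016889956  # copied from the listing
FIELD_NORM = 0.078125     # fieldNorm(doc=2220), copied from the listing
QUERY_TERMS = 25          # denominator of coord(4/25)

def term_score(freq, doc_freq, boost):
    # Assumed idf and tf formulas; they reproduce the listed leaf values.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

# term -> (termFreq in doc 2220, docFreq, boost), as shown in the listing
matched_terms = {
    "phrase":  (2.0, 98,    1.0634106),
    "this":    (1.0, 10335, 1.4608685),
    "phrases": (3.0, 122,   3.0924191),
    "noun":    (3.0, 49,    4.6640606),
}

term_sum = sum(term_score(*args) for args in matched_terms.values())
coord = len(matched_terms) / QUERY_TERMS
doc_score = term_sum * coord

print(term_sum, coord, doc_score)
```

This also shows why result 3 ranks below results 1 and 2 despite having the largest term sum (about 1.086): it matches only 4 of 25 query terms, so its coord factor is 0.16 against 0.32 and 0.28.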