Document (#40624)

Author
Gil-Leiva, I.
Title
SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules
Source
Knowledge organization. 44(2017) no.3, p.139-162
Year
2017
Abstract
Indexing is put into context, and some of the most widely used automatic indexing systems are briefly described. We describe SISA, a system that uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to produce automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to automatically index 200 scientific articles on fruit growing written in Portuguese. It applies, on the one hand, location heuristics rules based on the indexing value of certain parts of an article, such as titles, abstracts, keywords, headings, first paragraph, conclusions and references, and, on the other, TF-IDF rules. The indexing is then evaluated for retrieval performance using recall, precision and F-measure. Automatic indexing of the articles with location heuristics rules gave the best results on these evaluation measures.
Content
Contribution to a special issue "New Trends for Knowledge Organization", guest editor: Renato Rocha Souza.
Theme
Automatic indexing

Similar documents (author)

  1. Leiva, I.G. => Gil-Leiva, I.: 5.63
    5.629064 = sum of:
      5.629064 = weight(author_txt:leiva in 98) [ClassicSimilarity], result of:
        5.629064 = fieldWeight in 98, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.4375 = fieldNorm(doc=98)
    
  2. Mederos, A. Leiva- => Leiva-Mederos, A.: 4.82
    4.824912 = sum of:
      4.824912 = weight(author_txt:leiva in 5167) [ClassicSimilarity], result of:
        4.824912 = fieldWeight in 5167, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=5167)
    
  3. Leiva, I. Gil- => Gil-Leiva, I.: 4.82
    4.824912 = sum of:
      4.824912 = weight(author_txt:leiva in 736) [ClassicSimilarity], result of:
        4.824912 = fieldWeight in 736, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=736)
    
  4. Leiva, I. Gil- => Gil-Leiva, I.: 4.82
    4.824912 = sum of:
      4.824912 = weight(author_txt:leiva in 913) [ClassicSimilarity], result of:
        4.824912 = fieldWeight in 913, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=913)
    
  5. Gil-Leiva, I.; Munoz, V.R.: ¬Los origines del almacenamiento y recuperacion de informacion (1996) 3.98
    3.9803493 = sum of:
      3.9803493 = weight(author_txt:leiva in 5586) [ClassicSimilarity], result of:
        3.9803493 = fieldWeight in 5586, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.4375 = fieldNorm(doc=5586)
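
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not part of Lucene. The fieldNorm value is taken directly from the explain output rather than recomputed, since it is stored in the index in a lossy encoded form.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as reported in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm copied from the explain output)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 of the author list: freq=2.0, docFreq=12, maxDocs=42740, fieldNorm=0.4375
entry_1 = field_weight(freq=2.0, doc_freq=12, max_docs=42740, field_norm=0.4375)

# Entry 5 of the author list: same term, but freq=1.0
entry_5 = field_weight(freq=1.0, doc_freq=12, max_docs=42740, field_norm=0.4375)

print(f"entry 1: {entry_1:.4f}")
print(f"entry 5: {entry_5:.4f}")
```

Under these assumptions the results agree with the listed values 5.629064 and 3.9803493 up to floating-point rounding (Lucene computes in 32-bit floats, Python in 64-bit).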
    

Similar documents (content)

  1. Kim, P.K.: ¬An automatic indexing of compound words based on mutual information for Korean text retrieval (1995) 0.15
    0.14508486 = sum of:
      0.14508486 = product of:
        0.7254243 = sum of:
          0.0122901155 = weight(abstract_txt:system in 1621) [ClassicSimilarity], result of:
            0.0122901155 = score(doc=1621,freq=1.0), product of:
              0.03897731 = queryWeight, product of:
                1.100248 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.010532912 = queryNorm
              0.31531462 = fieldWeight in 1621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.09375 = fieldNorm(doc=1621)
          0.012416647 = weight(abstract_txt:used in 1621) [ClassicSimilarity], result of:
            0.012416647 = score(doc=1621,freq=1.0), product of:
              0.039244376 = queryWeight, product of:
                1.1040109 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.010532912 = queryNorm
              0.31639302 = fieldWeight in 1621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.09375 = fieldNorm(doc=1621)
          0.13624918 = weight(abstract_txt:automatic in 1621) [ClassicSimilarity], result of:
            0.13624918 = score(doc=1621,freq=1.0), product of:
              0.27949297 = queryWeight, product of:
                5.1030607 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.010532912 = queryNorm
              0.48748696 = fieldWeight in 1621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.09375 = fieldNorm(doc=1621)
          0.23771867 = weight(abstract_txt:indexing in 1621) [ClassicSimilarity], result of:
            0.23771867 = score(doc=1621,freq=4.0), product of:
              0.2921018 = queryWeight, product of:
                6.38937 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.010532912 = queryNorm
              0.8138213 = fieldWeight in 1621, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.09375 = fieldNorm(doc=1621)
          0.32674968 = weight(abstract_txt:rules in 1621) [ClassicSimilarity], result of:
            0.32674968 = score(doc=1621,freq=2.0), product of:
              0.47122827 = queryWeight, product of:
                8.5543165 = boost
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.010532912 = queryNorm
              0.6934 = fieldWeight in 1621, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.09375 = fieldNorm(doc=1621)
        0.2 = coord(5/25)
    
  2. Mundgod, M.B.; Prasad, A.R.D.: Automatic identification of bibliographic data elements from the title pages of documents : a heuristic approach (1996) 0.14
    0.14133789 = sum of:
      0.14133789 = product of:
        0.8833618 = sum of:
          0.021287104 = weight(abstract_txt:system in 1398) [ClassicSimilarity], result of:
            0.021287104 = score(doc=1398,freq=3.0), product of:
              0.03897731 = queryWeight, product of:
                1.100248 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.010532912 = queryNorm
              0.5461409 = fieldWeight in 1398, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.09375 = fieldNorm(doc=1398)
          0.007732153 = weight(abstract_txt:with in 1398) [ClassicSimilarity], result of:
            0.007732153 = score(doc=1398,freq=1.0), product of:
              0.0327596 = queryWeight, product of:
                1.2353772 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.010532912 = queryNorm
              0.23602709 = fieldWeight in 1398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.09375 = fieldNorm(doc=1398)
          0.19268544 = weight(abstract_txt:automatic in 1398) [ClassicSimilarity], result of:
            0.19268544 = score(doc=1398,freq=2.0), product of:
              0.27949297 = queryWeight, product of:
                5.1030607 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.010532912 = queryNorm
              0.6894107 = fieldWeight in 1398, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.09375 = fieldNorm(doc=1398)
          0.66165715 = weight(abstract_txt:heuristics in 1398) [ClassicSimilarity], result of:
            0.66165715 = score(doc=1398,freq=3.0), product of:
              0.5229612 = queryWeight, product of:
                6.3722 = boost
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.010532912 = queryNorm
              1.2652127 = fieldWeight in 1398, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.09375 = fieldNorm(doc=1398)
        0.16 = coord(4/25)
    
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.14
    0.14107561 = sum of:
      0.14107561 = product of:
        0.58781505 = sum of:
          0.024405507 = weight(abstract_txt:conclusions in 950) [ClassicSimilarity], result of:
            0.024405507 = score(doc=950,freq=1.0), product of:
              0.07000761 = queryWeight, product of:
                1.0426589 = boost
                6.3746233 = idf(docFreq=197, maxDocs=42740)
                0.010532912 = queryNorm
              0.34861222 = fieldWeight in 950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3746233 = idf(docFreq=197, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
          0.0072430437 = weight(abstract_txt:used in 950) [ClassicSimilarity], result of:
            0.0072430437 = score(doc=950,freq=1.0), product of:
              0.039244376 = queryWeight, product of:
                1.1040109 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.010532912 = queryNorm
              0.1845626 = fieldWeight in 950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
          0.0063787005 = weight(abstract_txt:with in 950) [ClassicSimilarity], result of:
            0.0063787005 = score(doc=950,freq=2.0), product of:
              0.0327596 = queryWeight, product of:
                1.2353772 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.010532912 = queryNorm
              0.19471242 = fieldWeight in 950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
          0.027380984 = weight(abstract_txt:scientific in 950) [ClassicSimilarity], result of:
            0.027380984 = score(doc=950,freq=2.0), product of:
              0.07558798 = queryWeight, product of:
                1.5321844 = boost
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.010532912 = queryNorm
              0.36223993 = fieldWeight in 950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
          0.02412602 = weight(abstract_txt:best in 950) [ClassicSimilarity], result of:
            0.02412602 = score(doc=950,freq=1.0), product of:
              0.08752937 = queryWeight, product of:
                1.6487756 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.010532912 = queryNorm
              0.27563342 = fieldWeight in 950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
          0.49828082 = weight(abstract_txt:heuristics in 950) [ClassicSimilarity], result of:
            0.49828082 = score(doc=950,freq=5.0), product of:
              0.5229612 = queryWeight, product of:
                6.3722 = boost
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.010532912 = queryNorm
              0.9528065 = fieldWeight in 950, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.0546875 = fieldNorm(doc=950)
        0.24 = coord(6/25)
    
  4. Panyr, J.: Information retrieval techniques in rule-based expert systems (1991) 0.13
    0.1262455 = sum of:
      0.1262455 = product of:
        0.6312275 = sum of:
          0.014191403 = weight(abstract_txt:system in 3036) [ClassicSimilarity], result of:
            0.014191403 = score(doc=3036,freq=3.0), product of:
              0.03897731 = queryWeight, product of:
                1.100248 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.010532912 = queryNorm
              0.36409396 = fieldWeight in 3036, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.0625 = fieldNorm(doc=3036)
          0.011706526 = weight(abstract_txt:used in 3036) [ClassicSimilarity], result of:
            0.011706526 = score(doc=3036,freq=2.0), product of:
              0.039244376 = queryWeight, product of:
                1.1040109 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.010532912 = queryNorm
              0.29829818 = fieldWeight in 3036, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.0625 = fieldNorm(doc=3036)
          0.18166558 = weight(abstract_txt:automatic in 3036) [ClassicSimilarity], result of:
            0.18166558 = score(doc=3036,freq=4.0), product of:
              0.27949297 = queryWeight, product of:
                5.1030607 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.010532912 = queryNorm
              0.64998263 = fieldWeight in 3036, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.0625 = fieldNorm(doc=3036)
          0.079239555 = weight(abstract_txt:indexing in 3036) [ClassicSimilarity], result of:
            0.079239555 = score(doc=3036,freq=1.0), product of:
              0.2921018 = queryWeight, product of:
                6.38937 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.010532912 = queryNorm
              0.27127376 = fieldWeight in 3036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.0625 = fieldNorm(doc=3036)
          0.34442443 = weight(abstract_txt:rules in 3036) [ClassicSimilarity], result of:
            0.34442443 = score(doc=3036,freq=5.0), product of:
              0.47122827 = queryWeight, product of:
                8.5543165 = boost
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.010532912 = queryNorm
              0.7309078 = fieldWeight in 3036, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.0625 = fieldNorm(doc=3036)
        0.2 = coord(5/25)
    
  5. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991) 0.12
    0.11556532 = sum of:
      0.11556532 = product of:
        0.5778266 = sum of:
          0.010241764 = weight(abstract_txt:system in 6681) [ClassicSimilarity], result of:
            0.010241764 = score(doc=6681,freq=1.0), product of:
              0.03897731 = queryWeight, product of:
                1.100248 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.010532912 = queryNorm
              0.2627622 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.078125 = fieldNorm(doc=6681)
          0.010347205 = weight(abstract_txt:used in 6681) [ClassicSimilarity], result of:
            0.010347205 = score(doc=6681,freq=1.0), product of:
              0.039244376 = queryWeight, product of:
                1.1040109 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.010532912 = queryNorm
              0.26366085 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.078125 = fieldNorm(doc=6681)
          0.03208238 = weight(abstract_txt:provided in 6681) [ClassicSimilarity], result of:
            0.03208238 = score(doc=6681,freq=1.0), product of:
              0.083446175 = queryWeight, product of:
                1.6098591 = boost
                4.92119 = idf(docFreq=846, maxDocs=42740)
                0.010532912 = queryNorm
              0.38446796 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.92119 = idf(docFreq=846, maxDocs=42740)
                0.078125 = fieldNorm(doc=6681)
          0.14007707 = weight(abstract_txt:indexing in 6681) [ClassicSimilarity], result of:
            0.14007707 = score(doc=6681,freq=2.0), product of:
              0.2921018 = queryWeight, product of:
                6.38937 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.010532912 = queryNorm
              0.4795488 = fieldWeight in 6681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.078125 = fieldNorm(doc=6681)
          0.3850782 = weight(abstract_txt:rules in 6681) [ClassicSimilarity], result of:
            0.3850782 = score(doc=6681,freq=4.0), product of:
              0.47122827 = queryWeight, product of:
                8.5543165 = boost
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.010532912 = queryNorm
              0.81717974 = fieldWeight in 6681, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.078125 = fieldNorm(doc=6681)
        0.2 = coord(5/25)
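
The content scores above combine a query-side and a document-side weight for each matching term and then scale the sum by a coordination factor. The following is a minimal sketch of that arithmetic for the first entry (doc 1621), assuming the standard ClassicSimilarity formulas; the names TermMatch, term_score and document_score are illustrative and not Lucene API. The boost, queryNorm and fieldNorm values are copied from the explain output rather than derived.

```python
import math
from typing import NamedTuple, Sequence


class TermMatch(NamedTuple):
    term: str
    boost: float   # per-term query boost, copied from the explain output
    doc_freq: int  # number of documents containing the term
    freq: float    # term frequency in the matched document


def idf(doc_freq: int, max_docs: int) -> float:
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(m: TermMatch, max_docs: int, query_norm: float, field_norm: float) -> float:
    term_idf = idf(m.doc_freq, max_docs)
    query_weight = m.boost * term_idf * query_norm            # queryWeight
    field_weight = math.sqrt(m.freq) * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


def document_score(matches: Sequence[TermMatch], query_terms: int, max_docs: int,
                   query_norm: float, field_norm: float) -> float:
    coord = len(matches) / query_terms  # e.g. coord(5/25) = 0.2
    total = sum(term_score(m, max_docs, query_norm, field_norm) for m in matches)
    return coord * total


# Values for doc 1621, taken from the explain output of entry 1.
doc_1621 = [
    TermMatch("system", 1.100248, 4021, 1.0),
    TermMatch("used", 1.1040109, 3975, 1.0),
    TermMatch("automatic", 5.1030607, 640, 1.0),
    TermMatch("indexing", 6.38937, 1513, 4.0),
    TermMatch("rules", 8.5543165, 621, 2.0),
]

score_1621 = document_score(doc_1621, query_terms=25, max_docs=42740,
                            query_norm=0.010532912, field_norm=0.09375)
print(f"doc 1621: {score_1621:.5f}")
```

Under these assumptions the result agrees with the listed 0.14508486 up to floating-point rounding. Note how the coordination factor penalizes documents that match only a few of the 25 query terms: entry 2 has a higher raw sum (0.8833618) than entry 1 (0.7254243) but ranks lower because it matches 4 terms instead of 5.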