Document (#40623)

Author
Gil-Leiva, I.
Title
SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules
Source
Knowledge organization. 44(2017) no.3, pp.139-162
Year
2017
Abstract
Indexing is contextualized and some of the most widely used automatic indexing systems are briefly described. We describe SISA, a system that uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to automatically index 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules based on the value for indexing of certain parts of the articles, such as titles, abstracts, keywords, headings, first paragraph, conclusions and references and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and F-measure. Automatic indexing of the articles with location heuristics rules gave the best results on these evaluation measures.
Content
Article in a special issue "New Trends for Knowledge Organization", guest editor: Renato Rocha Souza.
Theme
Automatic indexing

Similar documents (author)

  1. Leiva, I.G. => Gil-Leiva, I.: 5.56
    5.561559 = sum of:
      5.561559 = weight(author_txt:leiva in 98) [ClassicSimilarity], result of:
        5.561559 = fieldWeight in 98, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.4375 = fieldNorm(doc=98)
    
  2. Mederos, A. Leiva- => Leiva-Mederos, A.: 4.77
    4.7670507 = sum of:
      4.7670507 = weight(author_txt:leiva in 3166) [ClassicSimilarity], result of:
        4.7670507 = fieldWeight in 3166, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=3166)
    
  3. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7670507 = sum of:
      4.7670507 = weight(author_txt:leiva in 4735) [ClassicSimilarity], result of:
        4.7670507 = fieldWeight in 4735, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=4735)
    
  4. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7670507 = sum of:
      4.7670507 = weight(author_txt:leiva in 4912) [ClassicSimilarity], result of:
        4.7670507 = fieldWeight in 4912, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=4912)
    
  5. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7670507 = sum of:
      4.7670507 = weight(author_txt:leiva in 738) [ClassicSimilarity], result of:
        4.7670507 = fieldWeight in 738, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=738)
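
The author scores above all follow the same field-weight arithmetic shown in each explanation: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and their product is scaled by the stored fieldNorm. The following is a minimal Python sketch that reproduces entries 1 and 2 from the listed inputs. The function names are illustrative and not part of Lucene; the formulas are the ones commonly documented for ClassicSimilarity, and small differences in the last digits are to be expected because Lucene computes in single precision.

```python
import math


def classic_tf(freq: float) -> float:
    """tf(freq) as shown in the explanations: square root of the term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf(docFreq, maxDocs) = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm is taken as given from the index)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs copied from the explanations for docs 98 and 3166.
score_98 = field_weight(freq=2.0, doc_freq=14, max_docs=44218, field_norm=0.4375)
score_3166 = field_weight(freq=2.0, doc_freq=14, max_docs=44218, field_norm=0.375)

print(round(score_98, 2), round(score_3166, 2))  # prints: 5.56 4.77
```

The only input that differs between entry 1 and entries 2 to 5 is fieldNorm (0.4375 versus 0.375), which accounts for the whole gap between 5.56 and 4.77.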
    

Similar documents (content)

  1. Kim, P.K.: An automatic indexing of compound words based on mutual information for Korean text retrieval (1995) 0.15
    0.14582035 = sum of:
      0.14582035 = product of:
        0.7291017 = sum of:
          0.01226671 = weight(abstract_txt:used in 620) [ClassicSimilarity], result of:
            0.01226671 = score(doc=620,freq=1.0), product of:
              0.03895006 = queryWeight, product of:
                1.1043499 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.010499117 = queryNorm
              0.3149343 = fieldWeight in 620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.09375 = fieldNorm(doc=620)
          0.012409774 = weight(abstract_txt:system in 620) [ClassicSimilarity], result of:
            0.012409774 = score(doc=620,freq=1.0), product of:
              0.03925232 = queryWeight, product of:
                1.1086265 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.010499117 = queryNorm
              0.3161539 = fieldWeight in 620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.09375 = fieldNorm(doc=620)
          0.13614754 = weight(abstract_txt:automatic in 620) [ClassicSimilarity], result of:
            0.13614754 = score(doc=620,freq=1.0), product of:
              0.27951366 = queryWeight, product of:
                5.1240664 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.010499117 = queryNorm
              0.48708728 = fieldWeight in 620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.09375 = fieldNorm(doc=620)
          0.2396452 = weight(abstract_txt:indexing in 620) [ClassicSimilarity], result of:
            0.2396452 = score(doc=620,freq=4.0), product of:
              0.29384574 = queryWeight, product of:
                6.4345555 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.010499117 = queryNorm
              0.81554765 = fieldWeight in 620, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=620)
          0.32863247 = weight(abstract_txt:rules in 620) [ClassicSimilarity], result of:
            0.32863247 = score(doc=620,freq=2.0), product of:
              0.47330713 = queryWeight, product of:
                8.608136 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.010499117 = queryNorm
              0.69433236 = fieldWeight in 620, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.09375 = fieldNorm(doc=620)
        0.2 = coord(5/25)
    
  2. Mundgod, M.B.; Prasad, A.R.D.: Automatic identification of bibliographic data elements from the title pages of documents : a heuristic approach (1996) 0.14
    0.14122717 = sum of:
      0.14122717 = product of:
        0.88266987 = sum of:
          0.02149436 = weight(abstract_txt:system in 397) [ClassicSimilarity], result of:
            0.02149436 = score(doc=397,freq=3.0), product of:
              0.03925232 = queryWeight, product of:
                1.1086265 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.010499117 = queryNorm
              0.54759467 = fieldWeight in 397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.09375 = fieldNorm(doc=397)
          0.0075814873 = weight(abstract_txt:with in 397) [ClassicSimilarity], result of:
            0.0075814873 = score(doc=397,freq=1.0), product of:
              0.03235113 = queryWeight, product of:
                1.232659 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.010499117 = queryNorm
              0.23435001 = fieldWeight in 397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.09375 = fieldNorm(doc=397)
          0.19254169 = weight(abstract_txt:automatic in 397) [ClassicSimilarity], result of:
            0.19254169 = score(doc=397,freq=2.0), product of:
              0.27951366 = queryWeight, product of:
                5.1240664 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.010499117 = queryNorm
              0.6888454 = fieldWeight in 397, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.09375 = fieldNorm(doc=397)
          0.66105235 = weight(abstract_txt:heuristics in 397) [ClassicSimilarity], result of:
            0.66105235 = score(doc=397,freq=3.0), product of:
              0.52294123 = queryWeight, product of:
                6.398071 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.010499117 = queryNorm
              1.2641045 = fieldWeight in 397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.09375 = fieldNorm(doc=397)
        0.16 = coord(4/25)
    
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.14
    0.1406991 = sum of:
      0.1406991 = product of:
        0.58624625 = sum of:
          0.02449509 = weight(abstract_txt:conclusions in 3949) [ClassicSimilarity], result of:
            0.02449509 = score(doc=3949,freq=1.0), product of:
              0.070218936 = queryWeight, product of:
                1.048491 = boost
                6.378767 = idf(docFreq=203, maxDocs=44218)
                0.010499117 = queryNorm
              0.3488388 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.378767 = idf(docFreq=203, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.007155581 = weight(abstract_txt:used in 3949) [ClassicSimilarity], result of:
            0.007155581 = score(doc=3949,freq=1.0), product of:
              0.03895006 = queryWeight, product of:
                1.1043499 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.010499117 = queryNorm
              0.18371168 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.006254408 = weight(abstract_txt:with in 3949) [ClassicSimilarity], result of:
            0.006254408 = score(doc=3949,freq=2.0), product of:
              0.03235113 = queryWeight, product of:
                1.232659 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.010499117 = queryNorm
              0.1933289 = fieldWeight in 3949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.02669376 = weight(abstract_txt:scientific in 3949) [ClassicSimilarity], result of:
            0.02669376 = score(doc=3949,freq=2.0), product of:
              0.07436034 = queryWeight, product of:
                1.5258902 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.010499117 = queryNorm
              0.35897845 = fieldWeight in 3949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.023822092 = weight(abstract_txt:best in 3949) [ClassicSimilarity], result of:
            0.023822092 = score(doc=3949,freq=1.0), product of:
              0.08684233 = queryWeight, product of:
                1.6489912 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.010499117 = queryNorm
              0.27431428 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.49782535 = weight(abstract_txt:heuristics in 3949) [ClassicSimilarity], result of:
            0.49782535 = score(doc=3949,freq=5.0), product of:
              0.52294123 = queryWeight, product of:
                6.398071 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.010499117 = queryNorm
              0.9519719 = fieldWeight in 3949, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
        0.24 = coord(6/25)
    
  4. Panyr, J.: Information retrieval techniques in rule-based expert systems (1991) 0.13
    0.12674312 = sum of:
      0.12674312 = product of:
        0.63371557 = sum of:
          0.011565165 = weight(abstract_txt:used in 3036) [ClassicSimilarity], result of:
            0.011565165 = score(doc=3036,freq=2.0), product of:
              0.03895006 = queryWeight, product of:
                1.1043499 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.010499117 = queryNorm
              0.2969229 = fieldWeight in 3036, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=3036)
          0.014329573 = weight(abstract_txt:system in 3036) [ClassicSimilarity], result of:
            0.014329573 = score(doc=3036,freq=3.0), product of:
              0.03925232 = queryWeight, product of:
                1.1086265 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.010499117 = queryNorm
              0.3650631 = fieldWeight in 3036, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.0625 = fieldNorm(doc=3036)
          0.18153006 = weight(abstract_txt:automatic in 3036) [ClassicSimilarity], result of:
            0.18153006 = score(doc=3036,freq=4.0), product of:
              0.27951366 = queryWeight, product of:
                5.1240664 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.010499117 = queryNorm
              0.6494497 = fieldWeight in 3036, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=3036)
          0.079881735 = weight(abstract_txt:indexing in 3036) [ClassicSimilarity], result of:
            0.079881735 = score(doc=3036,freq=1.0), product of:
              0.29384574 = queryWeight, product of:
                6.4345555 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.010499117 = queryNorm
              0.27184922 = fieldWeight in 3036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=3036)
          0.34640905 = weight(abstract_txt:rules in 3036) [ClassicSimilarity], result of:
            0.34640905 = score(doc=3036,freq=5.0), product of:
              0.47330713 = queryWeight, product of:
                8.608136 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.010499117 = queryNorm
              0.7318906 = fieldWeight in 3036, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.0625 = fieldNorm(doc=3036)
        0.2 = coord(5/25)
    
  5. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: The operation and performance of an artificially intelligent keywording system (1991) 0.12
    0.11620343 = sum of:
      0.11620343 = product of:
        0.58101714 = sum of:
          0.010222258 = weight(abstract_txt:used in 6681) [ClassicSimilarity], result of:
            0.010222258 = score(doc=6681,freq=1.0), product of:
              0.03895006 = queryWeight, product of:
                1.1043499 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.010499117 = queryNorm
              0.26244524 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.078125 = fieldNorm(doc=6681)
          0.0103414785 = weight(abstract_txt:system in 6681) [ClassicSimilarity], result of:
            0.0103414785 = score(doc=6681,freq=1.0), product of:
              0.03925232 = queryWeight, product of:
                1.1086265 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.010499117 = queryNorm
              0.2634616 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.078125 = fieldNorm(doc=6681)
          0.031944063 = weight(abstract_txt:provided in 6681) [ClassicSimilarity], result of:
            0.031944063 = score(doc=6681,freq=1.0), product of:
              0.08325372 = queryWeight, product of:
                1.6145608 = boost
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.010499117 = queryNorm
              0.3836953 = fieldWeight in 6681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.078125 = fieldNorm(doc=6681)
          0.14121228 = weight(abstract_txt:indexing in 6681) [ClassicSimilarity], result of:
            0.14121228 = score(doc=6681,freq=2.0), product of:
              0.29384574 = queryWeight, product of:
                6.4345555 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.010499117 = queryNorm
              0.48056605 = fieldWeight in 6681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=6681)
          0.38729706 = weight(abstract_txt:rules in 6681) [ClassicSimilarity], result of:
            0.38729706 = score(doc=6681,freq=4.0), product of:
              0.47330713 = queryWeight, product of:
                8.608136 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.010499117 = queryNorm
              0.81827855 = fieldWeight in 6681, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.078125 = fieldNorm(doc=6681)
        0.2 = coord(5/25)
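
The content scores above combine three steps: each matched term contributes queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), the contributions are summed, and the sum is multiplied by coord, the fraction of the 25 query terms that matched. The sketch below recomputes entry 1 (doc 620) from the values listed in its explanation. The helper names are hypothetical, the boost and queryNorm values are simply copied from the output rather than derived, and results may differ from Lucene's in the last digits because of single-precision arithmetic.

```python
import math

QUERY_NORM = 0.010499117  # queryNorm, copied from the explanations
MAX_DOCS = 44218          # maxDocs, copied from the explanations
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matched term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, docFreq, termFreq) for the five terms matched in doc 620.
matched_620 = [
    (1.1043499, 4177, 1.0),  # used
    (1.1086265, 4123, 1.0),  # system
    (5.1240664, 665, 1.0),   # automatic
    (6.4345555, 1551, 4.0),  # indexing
    (8.608136, 638, 2.0),    # rules
]
FIELD_NORM_620 = 0.09375

raw_sum = sum(term_score(b, df, f, FIELD_NORM_620) for b, df, f in matched_620)
coord = len(matched_620) / QUERY_TERMS
score_620 = raw_sum * coord

print(round(raw_sum, 4), coord, round(score_620, 2))
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "heuristics" (docFreq=49) dominate the sums in entries 2 and 3, while coord penalizes documents that match only a few of the 25 query terms.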