Document (#21771)

Author
Humphrey, S.M.
Title
Automatic indexing of documents from journal descriptors : a preliminary investigation
Source
Journal of the American Society for Information Science. 50(1999) no.8, pp.661-674
Year
1999
Abstract
A new, fully automated approach for indexing documents is presented based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
Theme
Automatic indexing
Field
Medicine
Object
Medline

Similar documents (author)

  1. Humphrey, S.M.: Use and management of classification systems for knowledge-based indexing (1992) 5.80
    5.798094 = sum of:
      5.798094 = weight(author_txt:humphrey in 2094) [ClassicSimilarity], result of:
        5.798094 = fieldWeight in 2094, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.27695 = idf(docFreq=10, maxDocs=43254)
          0.625 = fieldNorm(doc=2094)
    
  2. Humphrey, S.M.: Indexing biomedical documents : from thesaural to knowledge-based retrieval systems (1992) 5.80
    5.798094 = sum of:
      5.798094 = weight(author_txt:humphrey in 641) [ClassicSimilarity], result of:
        5.798094 = fieldWeight in 641, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.27695 = idf(docFreq=10, maxDocs=43254)
          0.625 = fieldNorm(doc=641)
    
  3. Humphrey, S.M.: The MedIndEx prototype for computer assisted MEDLINE database indexing (1993) 5.80
    5.798094 = sum of:
      5.798094 = weight(author_txt:humphrey in 819) [ClassicSimilarity], result of:
        5.798094 = fieldWeight in 819, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.27695 = idf(docFreq=10, maxDocs=43254)
          0.625 = fieldNorm(doc=819)
    
  4. Humphrey, S.M.: Knowledge-based systems for indexing (1994) 5.80
    5.798094 = sum of:
      5.798094 = weight(author_txt:humphrey in 4056) [ClassicSimilarity], result of:
        5.798094 = fieldWeight in 4056, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.27695 = idf(docFreq=10, maxDocs=43254)
          0.625 = fieldNorm(doc=4056)
    
  5. Humphrey, J.: Manuscripts and metadata : Descriptive metadata in three manuscript catalogs: DigCIM, MALVINE, & Digital Scriptorium (2007) 5.80
    5.798094 = sum of:
      5.798094 = weight(author_txt:humphrey in 2784) [ClassicSimilarity], result of:
        5.798094 = fieldWeight in 2784, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.27695 = idf(docFreq=10, maxDocs=43254)
          0.625 = fieldNorm(doc=2784)
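
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures in the listing. The function names are illustrative and not part of any library; the input values are copied from the explain output.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author_txt:humphrey, freq=1,
# docFreq=10, maxDocs=43254, fieldNorm=0.625.
idf = classic_idf(doc_freq=10, max_docs=43254)
score = field_weight(freq=1.0, doc_freq=10, max_docs=43254, field_norm=0.625)

print(f"idf   = {idf:.5f}")
print(f"score = {score:.6f}")
```

All five author matches share the same term frequency, document frequency, and field norm, which is why they all score 5.798094. Small differences in the last digit can occur because Lucene computes in single precision.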
    

Similar documents (content)

  1. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurrence based approach to multilingual retrieval (1997) 0.18
    0.18187952 = sum of:
      0.18187952 = product of:
        0.75783134 = sum of:
          0.012212175 = weight(abstract_txt:this in 6145) [ClassicSimilarity], result of:
            0.012212175 = score(doc=6145,freq=3.0), product of:
              0.046396565 = queryWeight, product of:
                1.0967388 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.01739867 = queryNorm
              0.26321292 = fieldWeight in 6145, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
          0.017324751 = weight(abstract_txt:approach in 6145) [ClassicSimilarity], result of:
            0.017324751 = score(doc=6145,freq=1.0), product of:
              0.07380376 = queryWeight, product of:
                1.1294159 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.01739867 = queryNorm
              0.23474078 = fieldWeight in 6145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
          0.2363542 = weight(abstract_txt:descriptors in 6145) [ClassicSimilarity], result of:
            0.2363542 = score(doc=6145,freq=6.0), product of:
              0.23189546 = queryWeight, product of:
                2.0019848 = boost
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.01739867 = queryNorm
              1.0192274 = fieldWeight in 6145, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
          0.12493254 = weight(abstract_txt:training in 6145) [ClassicSimilarity], result of:
            0.12493254 = score(doc=6145,freq=2.0), product of:
              0.27547628 = queryWeight, product of:
                3.0858283 = boost
                5.1309333 = idf(docFreq=694, maxDocs=43254)
                0.01739867 = queryNorm
              0.4535147 = fieldWeight in 6145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1309333 = idf(docFreq=694, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
          0.09875796 = weight(abstract_txt:documents in 6145) [ClassicSimilarity], result of:
            0.09875796 = score(doc=6145,freq=3.0), product of:
              0.22162639 = queryWeight, product of:
                3.094535 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.01739867 = queryNorm
              0.4456056 = fieldWeight in 6145, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
          0.26824972 = weight(abstract_txt:indexing in 6145) [ClassicSimilarity], result of:
            0.26824972 = score(doc=6145,freq=4.0), product of:
              0.49388972 = queryWeight, product of:
                6.53303 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.01739867 = queryNorm
              0.5431369 = fieldWeight in 6145, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=6145)
        0.24 = coord(6/25)
    
  2. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.18
    0.17951354 = sum of:
      0.17951354 = product of:
        0.74797314 = sum of:
          0.0099712 = weight(abstract_txt:this in 301) [ClassicSimilarity], result of:
            0.0099712 = score(doc=301,freq=2.0), product of:
              0.046396565 = queryWeight, product of:
                1.0967388 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.01739867 = queryNorm
              0.21491244 = fieldWeight in 301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
          0.096491195 = weight(abstract_txt:descriptors in 301) [ClassicSimilarity], result of:
            0.096491195 = score(doc=301,freq=1.0), product of:
              0.23189546 = queryWeight, product of:
                2.0019848 = boost
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.01739867 = queryNorm
              0.41609782 = fieldWeight in 301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
          0.06745117 = weight(abstract_txt:journals in 301) [ClassicSimilarity], result of:
            0.06745117 = score(doc=301,freq=1.0), product of:
              0.20908548 = queryWeight, product of:
                2.3282104 = boost
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.01739867 = queryNorm
              0.32260093 = fieldWeight in 301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
          0.11403587 = weight(abstract_txt:documents in 301) [ClassicSimilarity], result of:
            0.11403587 = score(doc=301,freq=4.0), product of:
              0.22162639 = queryWeight, product of:
                3.094535 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.01739867 = queryNorm
              0.51454103 = fieldWeight in 301, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
          0.19177398 = weight(abstract_txt:journal in 301) [ClassicSimilarity], result of:
            0.19177398 = score(doc=301,freq=2.0), product of:
              0.41962114 = queryWeight, product of:
                4.664488 = boost
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.01739867 = queryNorm
              0.45701697 = fieldWeight in 301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
          0.26824972 = weight(abstract_txt:indexing in 301) [ClassicSimilarity], result of:
            0.26824972 = score(doc=301,freq=4.0), product of:
              0.49388972 = queryWeight, product of:
                6.53303 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.01739867 = queryNorm
              0.5431369 = fieldWeight in 301, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=301)
        0.24 = coord(6/25)
    
  3. Harter, S.P.; Nisonger, T.E.; Weng, A.: Semantic relationships between cited and citing articles in library and information science journals (1993) 0.17
    0.16666159 = sum of:
      0.16666159 = product of:
        0.6944233 = sum of:
          0.007050703 = weight(abstract_txt:this in 5644) [ClassicSimilarity], result of:
            0.007050703 = score(doc=5644,freq=1.0), product of:
              0.046396565 = queryWeight, product of:
                1.0967388 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.01739867 = queryNorm
              0.15196605 = fieldWeight in 5644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
          0.096491195 = weight(abstract_txt:descriptors in 5644) [ClassicSimilarity], result of:
            0.096491195 = score(doc=5644,freq=1.0), product of:
              0.23189546 = queryWeight, product of:
                2.0019848 = boost
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.01739867 = queryNorm
              0.41609782 = fieldWeight in 5644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
          0.095390365 = weight(abstract_txt:journals in 5644) [ClassicSimilarity], result of:
            0.095390365 = score(doc=5644,freq=2.0), product of:
              0.20908548 = queryWeight, product of:
                2.3282104 = boost
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.01739867 = queryNorm
              0.45622662 = fieldWeight in 5644, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
          0.11403587 = weight(abstract_txt:documents in 5644) [ClassicSimilarity], result of:
            0.11403587 = score(doc=5644,freq=4.0), product of:
              0.22162639 = queryWeight, product of:
                3.094535 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.01739867 = queryNorm
              0.51454103 = fieldWeight in 5644, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
          0.19177398 = weight(abstract_txt:journal in 5644) [ClassicSimilarity], result of:
            0.19177398 = score(doc=5644,freq=2.0), product of:
              0.41962114 = queryWeight, product of:
                4.664488 = boost
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.01739867 = queryNorm
              0.45701697 = fieldWeight in 5644, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
          0.18968119 = weight(abstract_txt:indexing in 5644) [ClassicSimilarity], result of:
            0.18968119 = score(doc=5644,freq=2.0), product of:
              0.49388972 = queryWeight, product of:
                6.53303 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.01739867 = queryNorm
              0.38405576 = fieldWeight in 5644, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=5644)
        0.24 = coord(6/25)
    
  4. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.15
    0.14629598 = sum of:
      0.14629598 = product of:
        0.7314799 = sum of:
          0.062151536 = weight(abstract_txt:feasible in 6) [ClassicSimilarity], result of:
            0.062151536 = score(doc=6,freq=1.0), product of:
              0.13727508 = queryWeight, product of:
                1.0891696 = boost
                7.244028 = idf(docFreq=83, maxDocs=43254)
                0.01739867 = queryNorm
              0.45275176 = fieldWeight in 6, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.244028 = idf(docFreq=83, maxDocs=43254)
                0.0625 = fieldNorm(doc=6)
          0.0099712 = weight(abstract_txt:this in 6) [ClassicSimilarity], result of:
            0.0099712 = score(doc=6,freq=2.0), product of:
              0.046396565 = queryWeight, product of:
                1.0967388 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.01739867 = queryNorm
              0.21491244 = fieldWeight in 6, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=6)
          0.13645916 = weight(abstract_txt:descriptors in 6) [ClassicSimilarity], result of:
            0.13645916 = score(doc=6,freq=2.0), product of:
              0.23189546 = queryWeight, product of:
                2.0019848 = boost
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.01739867 = queryNorm
              0.58845115 = fieldWeight in 6, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.657565 = idf(docFreq=150, maxDocs=43254)
                0.0625 = fieldNorm(doc=6)
          0.09875796 = weight(abstract_txt:documents in 6) [ClassicSimilarity], result of:
            0.09875796 = score(doc=6,freq=3.0), product of:
              0.22162639 = queryWeight, product of:
                3.094535 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.01739867 = queryNorm
              0.4456056 = fieldWeight in 6, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=6)
          0.42414007 = weight(abstract_txt:indexing in 6) [ClassicSimilarity], result of:
            0.42414007 = score(doc=6,freq=10.0), product of:
              0.49388972 = queryWeight, product of:
                6.53303 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.01739867 = queryNorm
              0.85877484 = fieldWeight in 6, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=6)
        0.2 = coord(5/25)
    
  5. Nebelong-Bonnevie, E.; Frandsen, T.F.: Journal citation identity and journal citation image : a portrait of the Journal of Documentation (2006) 0.13
    0.12989055 = sum of:
      0.12989055 = product of:
        0.6494527 = sum of:
          0.0099712 = weight(abstract_txt:this in 587) [ClassicSimilarity], result of:
            0.0099712 = score(doc=587,freq=2.0), product of:
              0.046396565 = queryWeight, product of:
                1.0967388 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.01739867 = queryNorm
              0.21491244 = fieldWeight in 587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=587)
          0.017324751 = weight(abstract_txt:approach in 587) [ClassicSimilarity], result of:
            0.017324751 = score(doc=587,freq=1.0), product of:
              0.07380376 = queryWeight, product of:
                1.1294159 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.01739867 = queryNorm
              0.23474078 = fieldWeight in 587, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0625 = fieldNorm(doc=587)
          0.095390365 = weight(abstract_txt:journals in 587) [ClassicSimilarity], result of:
            0.095390365 = score(doc=587,freq=2.0), product of:
              0.20908548 = queryWeight, product of:
                2.3282104 = boost
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.01739867 = queryNorm
              0.45622662 = fieldWeight in 587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.161615 = idf(docFreq=673, maxDocs=43254)
                0.0625 = fieldNorm(doc=587)
          0.057017934 = weight(abstract_txt:documents in 587) [ClassicSimilarity], result of:
            0.057017934 = score(doc=587,freq=1.0), product of:
              0.22162639 = queryWeight, product of:
                3.094535 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.01739867 = queryNorm
              0.25727051 = fieldWeight in 587, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=587)
          0.46974844 = weight(abstract_txt:journal in 587) [ClassicSimilarity], result of:
            0.46974844 = score(doc=587,freq=12.0), product of:
              0.41962114 = queryWeight, product of:
                4.664488 = boost
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.01739867 = queryNorm
              1.1194584 = fieldWeight in 587, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                5.170557 = idf(docFreq=667, maxDocs=43254)
                0.0625 = fieldNorm(doc=587)
        0.2 = coord(5/25)
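
The content-based scores combine one contribution per matching term and then scale the sum by the coordination factor. The sketch below recomputes the first result (doc 6145) under the same assumed ClassicSimilarity formulas: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The boost and queryNorm values are copied from the listing rather than derived, and the helper names are illustrative only.

```python
import math

MAX_DOCS = 43254          # maxDocs from the listing
QUERY_NORM = 0.01739867   # queryNorm from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=6145) from the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq in doc 6145, docFreq, boost), all copied from the listing.
terms = [
    ("this",        3.0, 10335, 1.0967388),
    ("approach",    1.0,  2748, 1.1294159),
    ("descriptors", 6.0,   150, 2.0019848),
    ("training",    2.0,   694, 3.0858283),
    ("documents",   3.0,  1916, 3.094535),
    ("indexing",    4.0,  1524, 6.53303),
]

term_scores = {name: term_score(freq, df, boost) for name, freq, df, boost in terms}
raw_sum = sum(term_scores.values())

# coord(6/25): 6 of the 25 query terms matched this document.
coord = len(terms) / 25
total = raw_sum * coord

for name, value in term_scores.items():
    print(f"{name:12s} {value:.6f}")
print(f"sum = {raw_sum:.6f}, coord = {coord:.2f}, total = {total:.6f}")
```

The coordination factor explains why results 4 and 5 rank lower even though their raw sums are comparable: they match 5 of 25 query terms, so their sums are multiplied by 0.2 instead of 0.24.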