Document (#21768)

Author
Humphrey, S.M.
Title
Automatic indexing of documents from journal descriptors : a preliminary investigation
Source
Journal of the American Society for Information Science. 50(1999) no.8, p.661-674
Year
1999
Abstract
A new, fully automated approach for indexing documents is presented based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
Theme
Automatic indexing
Field
Medicine
Object
Medline

Similar documents (author)

  1. Humphrey, S.M.: Use and management of classification systems for knowledge-based indexing (1992) 5.80
    5.8024426 = sum of:
      5.8024426 = weight(author_txt:humphrey in 2094) [ClassicSimilarity], result of:
        5.8024426 = fieldWeight in 2094, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.283908 = idf(docFreq=10, maxDocs=43556)
          0.625 = fieldNorm(doc=2094)
    
  2. Humphrey, S.M.: Indexing biomedical documents : from thesaural to knowledge-based retrieval systems (1992) 5.80
    5.8024426 = sum of:
      5.8024426 = weight(author_txt:humphrey in 7638) [ClassicSimilarity], result of:
        5.8024426 = fieldWeight in 7638, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.283908 = idf(docFreq=10, maxDocs=43556)
          0.625 = fieldNorm(doc=7638)
    
  3. Humphrey, S.M.: The MedIndEx prototype for computer assisted MEDLINE database indexing (1993) 5.80
    5.8024426 = sum of:
      5.8024426 = weight(author_txt:humphrey in 7816) [ClassicSimilarity], result of:
        5.8024426 = fieldWeight in 7816, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.283908 = idf(docFreq=10, maxDocs=43556)
          0.625 = fieldNorm(doc=7816)
    
  4. Humphrey, S.M.: Knowledge-based systems for indexing (1994) 5.80
    5.8024426 = sum of:
      5.8024426 = weight(author_txt:humphrey in 3053) [ClassicSimilarity], result of:
        5.8024426 = fieldWeight in 3053, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.283908 = idf(docFreq=10, maxDocs=43556)
          0.625 = fieldNorm(doc=3053)
    
  5. Humphrey, J.: Manuscripts and metadata : Descriptive metadata in three manuscript catalogs: DigCIM, MALVINE, & Digital Scriptorium (2007) 5.80
    5.8024426 = sum of:
      5.8024426 = weight(author_txt:humphrey in 2781) [ClassicSimilarity], result of:
        5.8024426 = fieldWeight in 2781, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.283908 = idf(docFreq=10, maxDocs=43556)
          0.625 = fieldNorm(doc=2781)
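
The author-match scores above can be reproduced by hand. The following is a minimal sketch in Python, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative only and are not part of any Lucene API; fieldNorm is taken as given from the listing rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing for author_txt:humphrey.
author_idf = classic_idf(doc_freq=10, max_docs=43556)
author_score = field_weight(freq=1.0, doc_freq=10, max_docs=43556, field_norm=0.625)
print(round(author_idf, 4), round(author_score, 4))
```

Because every author hit has the same term frequency, document frequency, and field norm, all five entries receive the same score, which is why the ranking within this list carries no information.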
    

Similar documents (content)

  1. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurrence based approach to multilingual retrieval (1997) 0.18
    0.18208523 = sum of:
      0.18208523 = product of:
        0.75868845 = sum of:
          0.012150025 = weight(abstract_txt:this in 5142) [ClassicSimilarity], result of:
            0.012150025 = score(doc=5142,freq=3.0), product of:
              0.046236724 = queryWeight, product of:
                1.0937852 = boost
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.017414281 = queryNorm
              0.26277867 = fieldWeight in 5142, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
          0.017333146 = weight(abstract_txt:approach in 5142) [ClassicSimilarity], result of:
            0.017333146 = score(doc=5142,freq=1.0), product of:
              0.07382394 = queryWeight, product of:
                1.1284738 = boost
                3.7566452 = idf(docFreq=2765, maxDocs=43556)
                0.017414281 = queryNorm
              0.23479033 = fieldWeight in 5142, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7566452 = idf(docFreq=2765, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
          0.23635711 = weight(abstract_txt:descriptors in 5142) [ClassicSimilarity], result of:
            0.23635711 = score(doc=5142,freq=6.0), product of:
              0.23188587 = queryWeight, product of:
                2.0 = boost
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.017414281 = queryNorm
              1.0192821 = fieldWeight in 5142, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
          0.12489855 = weight(abstract_txt:training in 5142) [ClassicSimilarity], result of:
            0.12489855 = score(doc=5142,freq=2.0), product of:
              0.27541265 = queryWeight, product of:
                3.0824766 = boost
                5.1307225 = idf(docFreq=699, maxDocs=43556)
                0.017414281 = queryNorm
              0.45349607 = fieldWeight in 5142, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1307225 = idf(docFreq=699, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
          0.09905687 = weight(abstract_txt:documents in 5142) [ClassicSimilarity], result of:
            0.09905687 = score(doc=5142,freq=3.0), product of:
              0.22206235 = queryWeight, product of:
                3.0945702 = boost
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.017414281 = queryNorm
              0.44607684 = fieldWeight in 5142, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
          0.26889274 = weight(abstract_txt:indexing in 5142) [ClassicSimilarity], result of:
            0.26889274 = score(doc=5142,freq=4.0), product of:
              0.49465412 = queryWeight, product of:
                6.531737 = boost
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.017414281 = queryNorm
              0.54359746 = fieldWeight in 5142, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.0625 = fieldNorm(doc=5142)
        0.24 = coord(6/25)
    
  2. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.18
    0.17949894 = sum of:
      0.17949894 = product of:
        0.7479123 = sum of:
          0.009920454 = weight(abstract_txt:this in 298) [ClassicSimilarity], result of:
            0.009920454 = score(doc=298,freq=2.0), product of:
              0.046236724 = queryWeight, product of:
                1.0937852 = boost
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.017414281 = queryNorm
              0.21455789 = fieldWeight in 298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
          0.09649238 = weight(abstract_txt:descriptors in 298) [ClassicSimilarity], result of:
            0.09649238 = score(doc=298,freq=1.0), product of:
              0.23188587 = queryWeight, product of:
                2.0 = boost
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.017414281 = queryNorm
              0.41612014 = fieldWeight in 298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
          0.067194216 = weight(abstract_txt:journals in 298) [ClassicSimilarity], result of:
            0.067194216 = score(doc=298,freq=1.0), product of:
              0.20854379 = queryWeight, product of:
                2.3229353 = boost
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.017414281 = queryNorm
              0.32220674 = fieldWeight in 298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
          0.11438102 = weight(abstract_txt:documents in 298) [ClassicSimilarity], result of:
            0.11438102 = score(doc=298,freq=4.0), product of:
              0.22206235 = queryWeight, product of:
                3.0945702 = boost
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.017414281 = queryNorm
              0.51508516 = fieldWeight in 298, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
          0.19103152 = weight(abstract_txt:journal in 298) [ClassicSimilarity], result of:
            0.19103152 = score(doc=298,freq=2.0), product of:
              0.4185166 = queryWeight, product of:
                4.653823 = boost
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.017414281 = queryNorm
              0.45644906 = fieldWeight in 298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
          0.26889274 = weight(abstract_txt:indexing in 298) [ClassicSimilarity], result of:
            0.26889274 = score(doc=298,freq=4.0), product of:
              0.49465412 = queryWeight, product of:
                6.531737 = boost
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.017414281 = queryNorm
              0.54359746 = fieldWeight in 298, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.0625 = fieldNorm(doc=298)
        0.24 = coord(6/25)
    
  3. Harter, S.P.; Nisonger, T.E.; Weng, A.: Semantic relationships between cited and citing articles in library and information science journals (1993) 0.17
    0.1665798 = sum of:
      0.1665798 = product of:
        0.6940825 = sum of:
          0.0070148204 = weight(abstract_txt:this in 5641) [ClassicSimilarity], result of:
            0.0070148204 = score(doc=5641,freq=1.0), product of:
              0.046236724 = queryWeight, product of:
                1.0937852 = boost
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.017414281 = queryNorm
              0.15171534 = fieldWeight in 5641, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
          0.09649238 = weight(abstract_txt:descriptors in 5641) [ClassicSimilarity], result of:
            0.09649238 = score(doc=5641,freq=1.0), product of:
              0.23188587 = queryWeight, product of:
                2.0 = boost
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.017414281 = queryNorm
              0.41612014 = fieldWeight in 5641, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
          0.09502697 = weight(abstract_txt:journals in 5641) [ClassicSimilarity], result of:
            0.09502697 = score(doc=5641,freq=2.0), product of:
              0.20854379 = queryWeight, product of:
                2.3229353 = boost
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.017414281 = queryNorm
              0.45566913 = fieldWeight in 5641, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
          0.11438102 = weight(abstract_txt:documents in 5641) [ClassicSimilarity], result of:
            0.11438102 = score(doc=5641,freq=4.0), product of:
              0.22206235 = queryWeight, product of:
                3.0945702 = boost
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.017414281 = queryNorm
              0.51508516 = fieldWeight in 5641, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
          0.19103152 = weight(abstract_txt:journal in 5641) [ClassicSimilarity], result of:
            0.19103152 = score(doc=5641,freq=2.0), product of:
              0.4185166 = queryWeight, product of:
                4.653823 = boost
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.017414281 = queryNorm
              0.45644906 = fieldWeight in 5641, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
          0.19013587 = weight(abstract_txt:indexing in 5641) [ClassicSimilarity], result of:
            0.19013587 = score(doc=5641,freq=2.0), product of:
              0.49465412 = queryWeight, product of:
                6.531737 = boost
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.017414281 = queryNorm
              0.38438144 = fieldWeight in 5641, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.0625 = fieldNorm(doc=5641)
        0.24 = coord(6/25)
    
  4. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.15
    0.14658329 = sum of:
      0.14658329 = product of:
        0.7329164 = sum of:
          0.062321525 = weight(abstract_txt:feasible in 291) [ClassicSimilarity], result of:
            0.062321525 = score(doc=291,freq=1.0), product of:
              0.13751845 = queryWeight, product of:
                1.0890764 = boost
                7.250986 = idf(docFreq=83, maxDocs=43556)
                0.017414281 = queryNorm
              0.45318663 = fieldWeight in 291, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.250986 = idf(docFreq=83, maxDocs=43556)
                0.0625 = fieldNorm(doc=291)
          0.009920454 = weight(abstract_txt:this in 291) [ClassicSimilarity], result of:
            0.009920454 = score(doc=291,freq=2.0), product of:
              0.046236724 = queryWeight, product of:
                1.0937852 = boost
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.017414281 = queryNorm
              0.21455789 = fieldWeight in 291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=291)
          0.13646083 = weight(abstract_txt:descriptors in 291) [ClassicSimilarity], result of:
            0.13646083 = score(doc=291,freq=2.0), product of:
              0.23188587 = queryWeight, product of:
                2.0 = boost
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.017414281 = queryNorm
              0.58848274 = fieldWeight in 291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.0625 = fieldNorm(doc=291)
          0.09905687 = weight(abstract_txt:documents in 291) [ClassicSimilarity], result of:
            0.09905687 = score(doc=291,freq=3.0), product of:
              0.22206235 = queryWeight, product of:
                3.0945702 = boost
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.017414281 = queryNorm
              0.44607684 = fieldWeight in 291, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.0625 = fieldNorm(doc=291)
          0.42515674 = weight(abstract_txt:indexing in 291) [ClassicSimilarity], result of:
            0.42515674 = score(doc=291,freq=10.0), product of:
              0.49465412 = queryWeight, product of:
                6.531737 = boost
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.017414281 = queryNorm
              0.8595031 = fieldWeight in 291, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.3487797 = idf(docFreq=1529, maxDocs=43556)
                0.0625 = fieldNorm(doc=291)
        0.2 = coord(5/25)
    
  5. Nebelong-Bonnevie, E.; Frandsen, T.F.: Journal citation identity and journal citation image : a portrait of the Journal of Documentation (2006) 0.13
    0.12948017 = sum of:
      0.12948017 = product of:
        0.6474008 = sum of:
          0.009920454 = weight(abstract_txt:this in 584) [ClassicSimilarity], result of:
            0.009920454 = score(doc=584,freq=2.0), product of:
              0.046236724 = queryWeight, product of:
                1.0937852 = boost
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.017414281 = queryNorm
              0.21455789 = fieldWeight in 584, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=584)
          0.017333146 = weight(abstract_txt:approach in 584) [ClassicSimilarity], result of:
            0.017333146 = score(doc=584,freq=1.0), product of:
              0.07382394 = queryWeight, product of:
                1.1284738 = boost
                3.7566452 = idf(docFreq=2765, maxDocs=43556)
                0.017414281 = queryNorm
              0.23479033 = fieldWeight in 584, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7566452 = idf(docFreq=2765, maxDocs=43556)
                0.0625 = fieldNorm(doc=584)
          0.09502697 = weight(abstract_txt:journals in 584) [ClassicSimilarity], result of:
            0.09502697 = score(doc=584,freq=2.0), product of:
              0.20854379 = queryWeight, product of:
                2.3229353 = boost
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.017414281 = queryNorm
              0.45566913 = fieldWeight in 584, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.155308 = idf(docFreq=682, maxDocs=43556)
                0.0625 = fieldNorm(doc=584)
          0.05719051 = weight(abstract_txt:documents in 584) [ClassicSimilarity], result of:
            0.05719051 = score(doc=584,freq=1.0), product of:
              0.22206235 = queryWeight, product of:
                3.0945702 = boost
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.017414281 = queryNorm
              0.25754258 = fieldWeight in 584, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1206813 = idf(docFreq=1921, maxDocs=43556)
                0.0625 = fieldNorm(doc=584)
          0.46792972 = weight(abstract_txt:journal in 584) [ClassicSimilarity], result of:
            0.46792972 = score(doc=584,freq=12.0), product of:
              0.4185166 = queryWeight, product of:
                4.653823 = boost
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.017414281 = queryNorm
              1.1180673 = fieldWeight in 584, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                5.1641316 = idf(docFreq=676, maxDocs=43556)
                0.0625 = fieldNorm(doc=584)
        0.2 = coord(5/25)
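
The content-similarity scores above follow one pattern: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matched query terms / total query terms). The sketch below recomputes the score of the first hit (doc 5142) under that assumption, again assuming ClassicSimilarity's tf and idf formulas. The boosts, frequencies, document frequencies, queryNorm, and fieldNorm are copied from the listing; the function and variable names are hypothetical.

```python
import math

QUERY_NORM = 0.017414281  # queryNorm as reported in the listing
MAX_DOCS = 43556          # maxDocs as reported in the listing

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM           # query side
    field_weight = math.sqrt(freq) * term_idf * field_norm  # document side
    return query_weight * field_weight

# (term, boost, freq, docFreq) copied from the explain tree for doc 5142.
terms_5142 = [
    ("this", 1.0937852, 3.0, 10449),
    ("approach", 1.1284738, 1.0, 2765),
    ("descriptors", 2.0, 6.0, 151),
    ("training", 3.0824766, 2.0, 699),
    ("documents", 3.0945702, 3.0, 1921),
    ("indexing", 6.531737, 4.0, 1529),
]

def document_score(terms, matched: int, total: int, field_norm: float = 0.0625) -> float:
    # Sum the per-term scores, then apply the coordination factor.
    raw = sum(term_score(b, f, df, field_norm) for _, b, f, df in terms)
    return raw * (matched / total)

score_5142 = document_score(terms_5142, matched=6, total=25)
print(f"{score_5142:.4f}")
```

Small differences in the last digits relative to the listing are expected, since Lucene computes these values in single precision while Python uses double precision.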