Search (20 results, page 1 of 1)

  • theme_ss:"Indexierungsstudien"
  1. Haanen, E.: Specificiteit en consistentie : een kwantitatief onderzoek naar trefwoordtoekenning door UBA en UBN (1991) 0.05
    0.05353949 = product of:
      0.10707898 = sum of:
        0.08344315 = weight(_text_:open in 4778) [ClassicSimilarity], result of:
          0.08344315 = score(doc=4778,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.39803052 = fieldWeight in 4778, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0625 = fieldNorm(doc=4778)
        0.023635827 = product of:
          0.047271654 = sum of:
            0.047271654 = weight(_text_:access in 4778) [ClassicSimilarity], result of:
              0.047271654 = score(doc=4778,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.29958594 = fieldWeight in 4778, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4778)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Online public access catalogues enable users to undertake subject searching by classification schedules, natural language, or controlled language terminology. In practice the first method is little used. Controlled language systems require indexers to index specifically and consistently. A comparative survey was made of indexing practices at Amsterdam and Nijmegen university libraries. On average Amsterdam assigned each document 3.5 index terms against 1.8 at Nijmegen. This discrepancy in indexing policy is the result of long-standing practices in each institution. Nijmegen has failed to utilise the advantages offered by online catalogues.
    Source
    Open. 23(1991) no.2, S.45-49
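The explain tree above for the term "open" in document 4778 can be retraced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function name `classic_term_score` is hypothetical, and the values for queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(...) node of the explain tree (illustrative helper)."""
    tf = math.sqrt(freq)                              # tf(freq=2.0) is about 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=1330, maxDocs=44218)
    query_weight = idf * query_norm                   # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm              # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight                # score = queryWeight * fieldWeight

# Inputs taken from the explanation of "open" in doc 4778.
score = classic_term_score(freq=2.0, doc_freq=1330, max_docs=44218,
                           query_norm=0.046553567, field_norm=0.0625)
print(score)
```

The result should land very close to the listed 0.08344315; small differences in the last digits are expected because the search engine works in single precision while Python uses double precision.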
  2. Subrahmanyam, B.: Library of Congress Classification numbers : issues of consistency and their implications for union catalogs (2006) 0.02
    0.015270403 = product of:
      0.06108161 = sum of:
        0.06108161 = sum of:
          0.029544784 = weight(_text_:access in 5784) [ClassicSimilarity], result of:
            0.029544784 = score(doc=5784,freq=2.0), product of:
              0.15778996 = queryWeight, product of:
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.046553567 = queryNorm
              0.18724121 = fieldWeight in 5784, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5784)
          0.03153683 = weight(_text_:22 in 5784) [ClassicSimilarity], result of:
            0.03153683 = score(doc=5784,freq=2.0), product of:
              0.16302267 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046553567 = queryNorm
              0.19345059 = fieldWeight in 5784, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5784)
      0.25 = coord(1/4)
    
    Abstract
    This study examined Library of Congress Classification (LCC)-based class numbers assigned to a representative sample of 200 titles in 52 American library systems to determine the level of consistency within and across those systems. The results showed that under the condition that a library system has a title, the probability of that title having the same LCC-based class number across library systems is greater than 85 percent. An examination of 121 titles displaying variations in class numbers among library systems showed certain titles (for example, multi-foci titles, titles in series, bibliographies, and fiction) lend themselves to alternate class numbers. Others were assigned variant numbers either due to latitude in the schedules or for reasons that cannot be pinpointed. With increasing dependence on copy cataloging, the size of such variations may continue to decrease. As the preferred class number with its alternates represents a title more fully than just the preferred class number, this paper argues for continued use of alternates by library systems and for finding a method to link alternate class numbers to preferred class numbers for enriched subject access through local and union catalogs.
    Date
    10. 9.2000 17:38:22
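The "product of / sum of / coord" lines in the explanations above describe how per-term scores are combined: matching clause scores are summed and then scaled by the fraction of query clauses that matched. A minimal sketch of that arithmetic follows; the helper name `coord_scaled` is hypothetical, and the term scores are copied from the listing, not recomputed.

```python
def coord_scaled(clause_scores, matched, total):
    """Sum the matching clause scores and scale by coord(matched/total)."""
    return sum(clause_scores) * (matched / total)

# Doc 4778: the "access" clause is nested under its own coord(1/2),
# then combined with the "open" clause under coord(2/4).
inner_4778 = coord_scaled([0.047271654], matched=1, total=2)
doc_4778 = coord_scaled([0.08344315, inner_4778], matched=2, total=4)

# Doc 5784: "access" and "22" are summed inside one clause, then coord(1/4).
doc_5784 = coord_scaled([0.029544784, 0.03153683], matched=1, total=4)

print(doc_4778, doc_5784)
```

Up to rounding, the two values correspond to the top-level scores 0.05353949 and 0.015270403 shown for results 1 and 2.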
  3. McCarthy, C.: ¬The reliability factor in subject access (1986) 0.01
    0.010445658 = product of:
      0.041782632 = sum of:
        0.041782632 = product of:
          0.083565265 = sum of:
            0.083565265 = weight(_text_:access in 2271) [ClassicSimilarity], result of:
              0.083565265 = score(doc=2271,freq=4.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.5295981 = fieldWeight in 2271, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2271)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    For truly effective subject access, it is essential that books on any given topic be brought together consistently under the same subject heading. With the advent of online catalogs, this goal has assumed new importance but has also become easier to achieve.
  4. Brenner, S.H.; McKinin, E.J.: CINAHL and MEDLINE : a comparison of indexing practices (1989) 0.01
    0.010340675 = product of:
      0.0413627 = sum of:
        0.0413627 = product of:
          0.0827254 = sum of:
            0.0827254 = weight(_text_:access in 2843) [ClassicSimilarity], result of:
              0.0827254 = score(doc=2843,freq=8.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.5242754 = fieldWeight in 2843, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2843)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    A random sample of 50 nursing articles indexed in both MEDLINE and CINAHL during 1986 was used for comparing indexing practices. Indexing was analysed by counting the number of major descriptors, the number of major and minor descriptors, the number of indexing access points, the number of common indexing access points, and the number and type of unique indexing points. The study results indicate: there are few differences in the number of major descriptors used, MEDLINE uses almost twice as many descriptors, MEDLINE has almost twice as many indexing access points, and MEDLINE and CINAHL provide few common access points.
  5. Cleverdon, C.W.: ASLIB Cranfield Research Project : Report on the first stage of an investigation into the comparative efficiency of indexing systems (1960) 0.01
    0.009461049 = product of:
      0.037844196 = sum of:
        0.037844196 = product of:
          0.07568839 = sum of:
            0.07568839 = weight(_text_:22 in 6158) [ClassicSimilarity], result of:
              0.07568839 = score(doc=6158,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.46428138 = fieldWeight in 6158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6158)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Footnote
    Rez. in: College and research libraries 22(1961) no.3, S.228 (G. Jahoda)
  6. Veenema, F.: To index or not to index (1996) 0.01
    0.006307366 = product of:
      0.025229463 = sum of:
        0.025229463 = product of:
          0.050458927 = sum of:
            0.050458927 = weight(_text_:22 in 7247) [ClassicSimilarity], result of:
              0.050458927 = score(doc=7247,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.30952093 = fieldWeight in 7247, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7247)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Canadian journal of information and library science. 21(1996) no.2, S.1-22
  7. Hudon, M.: Conceptual compatibility in controlled language tools used to index and access the content of moving image collections (2004) 0.01
    0.006267395 = product of:
      0.02506958 = sum of:
        0.02506958 = product of:
          0.05013916 = sum of:
            0.05013916 = weight(_text_:access in 2655) [ClassicSimilarity], result of:
              0.05013916 = score(doc=2655,freq=4.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.31775886 = fieldWeight in 2655, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2655)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Five controlled vocabularies currently used for content representation in collections of non art moving images were examined to determine their level of conceptual compatibility. Methods borrowed from previous research in the area of indexing language compatibility were used. Quantitative data and qualitative observations allowed us to estimate more precisely and realistically the actual degree of conceptual redundancy in these indexing languages. It was found that the conceptual overlap is high enough to justify the pursuit of research and development work on a common basic indexing and access language that could be used to name objects, events, categories of persons, and relations most frequently depicted in non art moving image collections.
  8. Edwards, S.: Indexing practices at the National Agricultural Library (1993) 0.01
    0.0059089568 = product of:
      0.023635827 = sum of:
        0.023635827 = product of:
          0.047271654 = sum of:
            0.047271654 = weight(_text_:access in 555) [ClassicSimilarity], result of:
              0.047271654 = score(doc=555,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.29958594 = fieldWeight in 555, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0625 = fieldNorm(doc=555)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article discusses indexing practices at the National Agricultural Library. Indexers at NAL scan over 2,200 incoming journals for input into its bibliographic database, AGRICOLA. The National Agricultural Library's coverage extends worldwide, covering a broad range of agricultural subjects. Access to AGRICOLA occurs in several ways: onsite search, commercial vendors, Dialog Information Services, Inc. and BRS Information Technologies. The National Agricultural Library uses CAB THESAURUS to describe the subject content of articles in AGRICOLA.
  9. Booth, A.: How consistent is MEDLINE indexing? (1990) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 3510) [ClassicSimilarity], result of:
              0.04415156 = score(doc=3510,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 3510, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Health libraries review. 7(1990) no.1, S.22-26
  10. Neshat, N.; Horri, A.: ¬A study of subject indexing consistency between the National Library of Iran and Humanities Libraries in the area of Iranian studies (2006) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 230) [ClassicSimilarity], result of:
              0.04415156 = score(doc=230,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 230, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=230)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    4. 1.2007 10:22:26
  11. Rowley, J.: ¬The controlled versus natural indexing languages debate revisited : a perspective on information retrieval practice and research (1994) 0.01
    0.005222829 = product of:
      0.020891316 = sum of:
        0.020891316 = product of:
          0.041782632 = sum of:
            0.041782632 = weight(_text_:access in 7151) [ClassicSimilarity], result of:
              0.041782632 = score(doc=7151,freq=4.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.26479906 = fieldWeight in 7151, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7151)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article revisits the debate concerning controlled and natural indexing languages, as used in searching the databases of the online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM. The debate was first formulated in the early days of information retrieval more than a century ago but, despite significant advances in technology, remains unresolved. The article divides the history of the debate into four eras. Era one was characterised by the introduction of controlled vocabulary. Era two focused on comparisons between different indexing languages in order to assess which was best. Era three saw a number of case studies of limited generalisability and a general recognition that the best search performance can be achieved by the parallel use of the two types of indexing languages. The emphasis in Era four has been on the development of end-user-based systems, including online public access catalogues and databases on CD-ROM. Recent developments in the use of expert systems techniques to support the representation of meaning may lead to systems which offer significant support to the user in end-user searching. In the meantime, however, information retrieval in practice involves a mixture of natural and controlled indexing languages used to search a wide variety of different kinds of databases.
  12. Morris, L.R.: ¬The frequency of use of Library of Congress Classification numbers and Dewey Decimal Classification numbers in the MARC file in the field of library science (1991) 0.01
    0.0051703374 = product of:
      0.02068135 = sum of:
        0.02068135 = product of:
          0.0413627 = sum of:
            0.0413627 = weight(_text_:access in 2308) [ClassicSimilarity], result of:
              0.0413627 = score(doc=2308,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2621377 = fieldWeight in 2308, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2308)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The LCC and DDC systems were devised and updated by librarians who had and have no access to the eventual frequency of use of each number in those classification systems. 80% of the monographs in a MARC file of over 1,000,000 records are classified into 20% of the classification numbers in the field of library science and only 20% of the monographs are classified into 80% of the classification numbers in the field of library science. Classification of monographs could be made easier and performed more accurately if many of the little used and unused numbers were eliminated and many of the most crowded numbers were expanded. A number of examples are included.
  13. Taniguchi, S.: Recording evidence in bibliographic records and descriptive metadata (2005) 0.00
    0.0047305245 = product of:
      0.018922098 = sum of:
        0.018922098 = product of:
          0.037844196 = sum of:
            0.037844196 = weight(_text_:22 in 3565) [ClassicSimilarity], result of:
              0.037844196 = score(doc=3565,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.23214069 = fieldWeight in 3565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3565)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    18. 6.2005 13:16:22
  14. Leininger, K.: Interindexer consistency in PsycINFO (2000) 0.00
    0.0047305245 = product of:
      0.018922098 = sum of:
        0.018922098 = product of:
          0.037844196 = sum of:
            0.037844196 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
              0.037844196 = score(doc=2552,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.23214069 = fieldWeight in 2552, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    9. 2.1997 18:44:22
  15. David, C.; Giroux, L.; Bertrand-Gastaldy, S.; Lanteigne, D.: Indexing as problem solving : a cognitive approach to consistency (1995) 0.00
    0.0044317176 = product of:
      0.01772687 = sum of:
        0.01772687 = product of:
          0.03545374 = sum of:
            0.03545374 = weight(_text_:access in 3609) [ClassicSimilarity], result of:
              0.03545374 = score(doc=3609,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.22468945 = fieldWeight in 3609, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3609)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Indexers differ in their judgement as to which terms reflect adequately the content of a document. Studies of interindexers' consistency identified several factors associated with low consistency, but failed to provide a comprehensive model of this phenomenon. Our research applies theories and methods from cognitive psychology to the study of indexing behavior. From a theoretical standpoint, indexing is considered as a problem solving situation. To access the cognitive processes of indexers, 3 kinds of verbal reports are used. We will present results of an experiment in which 4 experienced indexers indexed the same documents. It will be shown that the 3 kinds of verbal reports provide complementary data on strategic behavior, and that it is of prime importance to consider the indexing task as an ill-defined problem, where the solution is partly defined by the indexer him(her)self.
  16. White, H.; Willis, C.; Greenberg, J.: HIVEing : the effect of a semantic web technology on inter-indexer consistency (2014) 0.00
    0.0039421036 = product of:
      0.015768414 = sum of:
        0.015768414 = product of:
          0.03153683 = sum of:
            0.03153683 = weight(_text_:22 in 1781) [ClassicSimilarity], result of:
              0.03153683 = score(doc=1781,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.19345059 = fieldWeight in 1781, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1781)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE. Design/methodology/approach - A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario based questionnaire. Analysis was performed on consistency results using Hooper's and Rolling's inter-indexer consistency measures. A series of t-tests was used to judge the significance between consistency measure results. Findings - Results suggest that HIVE improves inter-indexing consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges. Research limitations/implications - Research limitations for this study can be found in the small number of vocabularies used for the study. Future research will include implementing HIVE into the Dryad Repository and studying its application in a repository system. Originality/value - This paper showcases several features used in the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next generation machine-aided indexing (MAI) tools.
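The abstract above names Hooper's and Rolling's inter-indexer consistency measures without defining them. As a rough sketch, assuming the commonly cited definitions (Hooper: C / (A + B - C); Rolling: 2C / (A + B), where A and B are the numbers of terms assigned by each indexer and C is the number of terms they share), the two measures can be computed as follows. The function names and the sample term sets are illustrative only and are not taken from the study.

```python
def hooper(terms_a, terms_b):
    """Hooper's measure: shared terms divided by all distinct terms used."""
    common = len(terms_a & terms_b)
    denom = len(terms_a) + len(terms_b) - common
    return common / denom if denom else 0.0  # 0.0 for two empty sets, by convention here

def rolling(terms_a, terms_b):
    """Rolling's measure: twice the shared terms divided by the total terms assigned."""
    common = len(terms_a & terms_b)
    denom = len(terms_a) + len(terms_b)
    return 2 * common / denom if denom else 0.0

# Hypothetical keyword sets from two indexers for the same abstract.
indexer_a = {"indexing", "consistency", "thesauri"}
indexer_b = {"indexing", "consistency", "metadata", "vocabularies"}

print(hooper(indexer_a, indexer_b), rolling(indexer_a, indexer_b))
```

For the same pair of term sets, Rolling's value is never lower than Hooper's, which is one reason results from the two measures are usually reported side by side.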
  17. Olson, H.A.; Wolfram, D.: Syntagmatic relationships and indexing consistency on a larger scale (2008) 0.00
    0.003693098 = product of:
      0.014772392 = sum of:
        0.014772392 = product of:
          0.029544784 = sum of:
            0.029544784 = weight(_text_:access in 2214) [ClassicSimilarity], result of:
              0.029544784 = score(doc=2214,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.18724121 = fieldWeight in 2214, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2214)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this article is to examine interindexer consistency on a larger scale than other studies have done to determine if group consensus is reached by larger numbers of indexers and what, if any, relationships emerge between assigned terms. Design/methodology/approach - In total, 64 MLIS students were recruited to assign up to five terms to a document. The authors applied basic data modeling and the exploratory statistical techniques of multi-dimensional scaling (MDS) and hierarchical cluster analysis to determine whether relationships exist in indexing consistency and the co-occurrence of assigned terms. Findings - Consistency in the assignment of indexing terms to a document follows an inverse shape, although it is not strictly power law-based unlike many other social phenomena. The exploratory techniques revealed that groups of terms clustered together. The resulting term co-occurrence relationships were largely syntagmatic. Research limitations/implications - The results are based on the indexing of one article by non-expert indexers and are, thus, not generalizable. Based on the study findings, along with the growing popularity of folksonomies and the apparent authority of communally developed information resources, communally developed indexes based on group consensus may have merit. Originality/value - Consistency in the assignment of indexing terms has been studied primarily on a small scale. Few studies have examined indexing on a larger scale with more than a handful of indexers. Recognition of the differences in indexing assignment has implications for the development of public information systems, especially those that do not use a controlled vocabulary and those tagged by end-users. In such cases, multiple access points that accommodate the different ways that users interpret content are needed so that searchers may be guided to relevant content despite using different terminology.
  18. Lee, D.H.; Schleyer, T.: Social tagging is no substitute for controlled indexing : a comparison of Medical Subject Headings and CiteULike tags assigned to 231,388 papers (2012) 0.00
    0.003693098 = product of:
      0.014772392 = sum of:
        0.014772392 = product of:
          0.029544784 = sum of:
            0.029544784 = weight(_text_:access in 383) [ClassicSimilarity], result of:
              0.029544784 = score(doc=383,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.18724121 = fieldWeight in 383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=383)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Social tagging and controlled indexing both facilitate access to information resources. Given the increasing popularity of social tagging and the limitations of controlled indexing (primarily cost and scalability), it is reasonable to investigate to what degree social tagging could substitute for controlled indexing. In this study, we compared CiteULike tags to Medical Subject Headings (MeSH) terms for 231,388 citations indexed in MEDLINE. In addition to descriptive analyses of the data sets, we present a paper-by-paper analysis of tags and MeSH terms: the number of common annotations, Jaccard similarity, and coverage ratio. In the analysis, we apply three increasingly progressive levels of text processing, ranging from normalization to stemming, to reduce the impact of lexical differences. Annotations of our corpus consisted of over 76,968 distinct tags and 21,129 distinct MeSH terms. The top 20 tags/MeSH terms showed little direct overlap. On a paper-by-paper basis, the number of common annotations ranged from 0.29 to 0.5 and the Jaccard similarity from 2.12% to 3.3% using increased levels of text processing. At most, 77,834 citations (33.6%) shared at least one annotation. Our results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing.
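The paper-by-paper comparison described above rests on Jaccard similarity computed at different levels of text processing. The sketch below illustrates only the idea, under stated assumptions: `normalize` is a hypothetical stand-in for the lightest processing level (lowercasing and collapsing punctuation), not the authors' pipeline, and the tags and MeSH terms are invented examples.

```python
import re

def normalize(term):
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()

def jaccard(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Invented annotations for a single paper.
tags = {"Gene-Expression", "cancer"}
mesh = {"gene expression", "Neoplasms"}

raw = jaccard(tags, mesh)
normalized = jaccard({normalize(t) for t in tags}, {normalize(t) for t in mesh})
print(raw, normalized)
```

In this toy case the raw sets share nothing, while normalization lets "Gene-Expression" and "gene expression" match; "cancer" and "Neoplasms" still differ, which is the kind of lexical gap that normalization or stemming alone cannot close.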
  19. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.00
    0.003693098 = product of:
      0.014772392 = sum of:
        0.014772392 = product of:
          0.029544784 = sum of:
            0.029544784 = weight(_text_:access in 4292) [ClassicSimilarity], result of:
              0.029544784 = score(doc=4292,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.18724121 = fieldWeight in 4292, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
  20. Bade, D.: ¬The creation and persistence of misinformation in shared library catalogs : language and subject knowledge in a technological era (2002) 0.00
    0.0015768415 = product of:
      0.006307366 = sum of:
        0.006307366 = product of:
          0.012614732 = sum of:
            0.012614732 = weight(_text_:22 in 1858) [ClassicSimilarity], result of:
              0.012614732 = score(doc=1858,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.07738023 = fieldWeight in 1858, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1858)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.1997 19:16:05