Search (94 results, page 1 of 5)

  • × theme_ss:"Indexierungsstudien"
  1. White, H.; Willis, C.; Greenberg, J.: HIVEing : the effect of a semantic web technology on inter-indexer consistency (2014) 0.03
    0.025192767 = product of:
      0.050385535 = sum of:
        0.050385535 = sum of:
          0.019307716 = weight(_text_:j in 1781) [ClassicSimilarity], result of:
            0.019307716 = score(doc=1781,freq=2.0), product of:
              0.109994456 = queryWeight, product of:
                3.1774964 = idf(docFreq=5010, maxDocs=44218)
                0.034616705 = queryNorm
              0.17553353 = fieldWeight in 1781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1774964 = idf(docFreq=5010, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
          0.007627391 = weight(_text_:a in 1781) [ClassicSimilarity], result of:
            0.007627391 = score(doc=1781,freq=18.0), product of:
              0.039914686 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.034616705 = queryNorm
              0.19109234 = fieldWeight in 1781, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
          0.023450429 = weight(_text_:22 in 1781) [ClassicSimilarity], result of:
            0.023450429 = score(doc=1781,freq=2.0), product of:
              0.1212218 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.034616705 = queryNorm
              0.19345059 = fieldWeight in 1781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE. Design/methodology/approach - A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario based questionnaire. Analysis was performed on consistency results using Hooper's and Rolling's inter-indexer consistency measures. A series of t-tests was used to judge the significance between consistency measure results. Findings - Results suggest that HIVE improves inter-indexing consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges. Research limitations/implications - Research limitations for this study can be found in the small number of vocabularies used for the study. Future research will include implementing HIVE into the Dryad Repository and studying its application in a repository system. Originality/value - This paper showcases several features used in the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next generation machine-aided indexing (MAI) tools.
    Type
    a
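The score breakdown for the first hit can be reproduced by hand. The sketch below is a minimal illustration, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are hypothetical, and the input values are copied from the explain output for doc 1781. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values taken from the explain output for doc 1781.
MAX_DOCS = 44218
QUERY_NORM = 0.034616705
FIELD_NORM = 0.0390625

# term -> (termFreq, docFreq)
terms = {"j": (2.0, 5010), "a": (18.0, 37942), "22": (2.0, 3622)}

weights = {
    term: term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for term, (freq, doc_freq) in terms.items()
}

# The outer coord(1/2) factor halves the summed term weights.
total = 0.5 * sum(weights.values())

for term, weight in weights.items():
    print(f"{term}: {weight:.9f}")
print(f"total: {total:.9f}")
```

The later hits carry an additional inner coord factor such as coord(2/3), which multiplies the summed term weights before the outer coord(1/2) is applied.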
  2. Lancaster, F.W.; Mills, J.: Testing indexes and index language devices : the ASLIB Cranfield project (1964) 0.02
    0.02330686 = product of:
      0.04661372 = sum of:
        0.04661372 = product of:
          0.06992058 = sum of:
            0.061784692 = weight(_text_:j in 2261) [ClassicSimilarity], result of:
              0.061784692 = score(doc=2261,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.5617073 = fieldWeight in 2261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.125 = fieldNorm(doc=2261)
            0.008135883 = weight(_text_:a in 2261) [ClassicSimilarity], result of:
              0.008135883 = score(doc=2261,freq=2.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.20383182 = fieldWeight in 2261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.125 = fieldNorm(doc=2261)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Type
    a
  3. Veenema, F.: To index or not to index (1996) 0.01
    0.014424542 = product of:
      0.028849084 = sum of:
        0.028849084 = product of:
          0.043273624 = sum of:
            0.0057529383 = weight(_text_:a in 7247) [ClassicSimilarity], result of:
              0.0057529383 = score(doc=7247,freq=4.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.14413087 = fieldWeight in 7247, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7247)
            0.037520684 = weight(_text_:22 in 7247) [ClassicSimilarity], result of:
              0.037520684 = score(doc=7247,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.30952093 = fieldWeight in 7247, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7247)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Describes an experiment comparing the performance of automatic full-text indexing software for personal computers with the human intellectual assignment of indexing terms in each document in a collection. Considers the times required to index the document, to retrieve documents satisfying 5 typical foreseen information needs, and the recall and precision ratios of searching. The software used is the QuickFinder facility in WordPerfect 6.1 for Windows.
    Source
    Canadian journal of information and library science. 21(1996) no.2, pp.1-22
    Type
    a
  4. Booth, A.: How consistent is MEDLINE indexing? (1990) 0.01
    0.01359659 = product of:
      0.02719318 = sum of:
        0.02719318 = product of:
          0.040789768 = sum of:
            0.00795917 = weight(_text_:a in 3510) [ClassicSimilarity], result of:
              0.00795917 = score(doc=3510,freq=10.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.19940455 = fieldWeight in 3510, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
            0.0328306 = weight(_text_:22 in 3510) [ClassicSimilarity], result of:
              0.0328306 = score(doc=3510,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.2708308 = fieldWeight in 3510, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    A known-item search for abstracts to previously retrieved references revealed that 2 documents from the same annual volume had been indexed twice. Working from the premise that the whole volume may have been double-indexed, a search strategy was devised that limited the journal code to the year in question. 57 references were retrieved, comprising 28 pairs of duplicates plus a citation for the whole volume. Author, title, source and descriptors were requested off-line and the citations were paired with their duplicates. The 4 categories of descriptors (major descriptors, minor descriptors, subheadings and check-tags) were compared for depth and consistency of indexing, and lessons that might be learnt from the study are discussed.
    Source
    Health libraries review. 7(1990) no.1, pp.22-26
    Type
    a
  5. Neshat, N.; Horri, A.: A study of subject indexing consistency between the National Library of Iran and Humanities Libraries in the area of Iranian studies (2006) 0.01
    0.012998583 = product of:
      0.025997166 = sum of:
        0.025997166 = product of:
          0.038995747 = sum of:
            0.006165147 = weight(_text_:a in 230) [ClassicSimilarity], result of:
              0.006165147 = score(doc=230,freq=6.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.1544581 = fieldWeight in 230, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=230)
            0.0328306 = weight(_text_:22 in 230) [ClassicSimilarity], result of:
              0.0328306 = score(doc=230,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.2708308 = fieldWeight in 230, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=230)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Date
    4. 1.2007 10:22:26
    Type
    a
  6. Saarti, J.: Consistency of subject indexing of novels by public library professionals and patrons (2002) 0.01
    0.012215096 = product of:
      0.024430191 = sum of:
        0.024430191 = product of:
          0.036645286 = sum of:
            0.030892346 = weight(_text_:j in 4473) [ClassicSimilarity], result of:
              0.030892346 = score(doc=4473,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.28085366 = fieldWeight in 4473, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4473)
            0.0057529383 = weight(_text_:a in 4473) [ClassicSimilarity], result of:
              0.0057529383 = score(doc=4473,freq=4.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.14413087 = fieldWeight in 4473, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4473)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The paper discusses the consistency of fiction indexing of library professionals and patrons based on an empirical test. Indexing was carried out with a Finnish fictional thesaurus and all of the test persons indexed the same five novels. The consistency of indexing was determined to be low; several reasons are postulated. Also an algorithm for typified indexing of fiction is given as well as some suggestions for the development of fiction information retrieval systems and content representation.
    Type
    a
  7. Leininger, K.: Interindexer consistency in PsycINFO (2000) 0.01
    0.011141642 = product of:
      0.022283284 = sum of:
        0.022283284 = product of:
          0.033424925 = sum of:
            0.0052844114 = weight(_text_:a in 2552) [ClassicSimilarity], result of:
              0.0052844114 = score(doc=2552,freq=6.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.13239266 = fieldWeight in 2552, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
            0.028140513 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
              0.028140513 = score(doc=2552,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.23214069 = fieldWeight in 2552, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Reports results of a study to examine interindexer consistency (the degree to which indexers, when assigning terms to a chosen record, will choose the same terms to reflect that record) in the PsycINFO database using 60 records that were inadvertently processed twice between 1996 and 1998. Five aspects of interindexer consistency were analysed. Two methods were used to calculate interindexer consistency: one posited by Hooper (1965) and the other by Rolling (1981). Aspects analysed were: checktag consistency (66.24% using Hooper's calculation and 77.17% using Rolling's); major-to-all term consistency (49.31% and 62.59% respectively); overall indexing consistency (49.02% and 63.32%); classification code consistency (44.17% and 45.00%); and major-to-major term consistency (43.24% and 56.09%). The average consistency across all categories was 50.4% using Hooper's method and 60.83% using Rolling's. Although comparison with previous studies is difficult due to methodological variations in the overall study of indexing consistency and the specific characteristics of the database, results generally support previous findings when trends and similar studies are analysed.
    Date
    9. 2.1997 18:44:22
    Type
    a
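The two consistency measures named in this abstract can be illustrated with a short sketch. It assumes the commonly cited forms of the measures, Hooper's as agreed terms divided by all distinct terms used by either indexer and Rolling's as twice the agreed terms divided by the sum of both indexers' term counts; whether the study applied exactly these forms is an assumption. The function names and the sample term sets are hypothetical.

```python
def hooper(terms_a, terms_b):
    # Hooper: A / (A + M + N), i.e. agreed terms over all distinct terms.
    a, b = set(terms_a), set(terms_b)
    distinct = len(a | b)
    return len(a & b) / distinct if distinct else 1.0


def rolling(terms_a, terms_b):
    # Rolling: 2A / (B + C), i.e. twice the agreed terms over both term counts.
    a, b = set(terms_a), set(terms_b)
    combined = len(a) + len(b)
    return 2 * len(a & b) / combined if combined else 1.0


# Hypothetical term sets assigned to the same record by two indexers.
indexer_1 = {"indexing", "consistency", "thesauri", "fiction"}
indexer_2 = {"indexing", "consistency", "novels"}

print(f"Hooper:  {hooper(indexer_1, indexer_2):.4f}")
print(f"Rolling: {rolling(indexer_1, indexer_2):.4f}")
```

Under these definitions Rolling's value is never lower than Hooper's for the same pair of term sets, which is in line with the paired percentages reported in the abstract.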
  8. Taghva, K.; Borsack, J.; Nartker, T.; Condit, A.: The role of manually-assigned keywords in query expansion (2004) 0.01
    0.011065317 = product of:
      0.022130635 = sum of:
        0.022130635 = product of:
          0.03319595 = sum of:
            0.027030803 = weight(_text_:j in 2567) [ClassicSimilarity], result of:
              0.027030803 = score(doc=2567,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.24574696 = fieldWeight in 2567, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2567)
            0.006165147 = weight(_text_:a in 2567) [ClassicSimilarity], result of:
              0.006165147 = score(doc=2567,freq=6.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.1544581 = fieldWeight in 2567, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2567)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    We report on two types of experiments with respect to manually-assigned keywords to documents in a collection. The first type of experiment examines the usefulness of manually-assigned keywords to automatic feedback. The second type of experiment considers the potential benefits of these keywords to the user as an interactive tool. Several experiments were run and compared. The results of these experiments indicate that there is no gain in average precision when manually-assigned keywords are used for query expansion. Further, manually-assigned keywords did not aid the user as an interactive tool for document understanding.
    Type
    a
  9. Taniguchi, S.: Recording evidence in bibliographic records and descriptive metadata (2005) 0.01
    0.010397157 = product of:
      0.020794313 = sum of:
        0.020794313 = product of:
          0.03119147 = sum of:
            0.0030509564 = weight(_text_:a in 3565) [ClassicSimilarity], result of:
              0.0030509564 = score(doc=3565,freq=2.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.07643694 = fieldWeight in 3565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3565)
            0.028140513 = weight(_text_:22 in 3565) [ClassicSimilarity], result of:
              0.028140513 = score(doc=3565,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.23214069 = fieldWeight in 3565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3565)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Date
    18. 6.2005 13:16:22
    Type
    a
  10. Subrahmanyam, B.: Library of Congress Classification numbers : issues of consistency and their implications for union catalogs (2006) 0.01
    0.009892723 = product of:
      0.019785445 = sum of:
        0.019785445 = product of:
          0.029678168 = sum of:
            0.0062277387 = weight(_text_:a in 5784) [ClassicSimilarity], result of:
              0.0062277387 = score(doc=5784,freq=12.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.15602624 = fieldWeight in 5784, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5784)
            0.023450429 = weight(_text_:22 in 5784) [ClassicSimilarity], result of:
              0.023450429 = score(doc=5784,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.19345059 = fieldWeight in 5784, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5784)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    This study examined Library of Congress Classification (LCC)-based class numbers assigned to a representative sample of 200 titles in 52 American library systems to determine the level of consistency within and across those systems. The results showed that under the condition that a library system has a title, the probability of that title having the same LCC-based class number across library systems is greater than 85 percent. An examination of 121 titles displaying variations in class numbers among library systems showed certain titles (for example, multi-foci titles, titles in series, bibliographies, and fiction) lend themselves to alternate class numbers. Others were assigned variant numbers either due to latitude in the schedules or for reasons that cannot be pinpointed. With increasing dependence on copy cataloging, the size of such variations may continue to decrease. As the preferred class number with its alternates represents a title more fully than just the preferred class number, this paper argues for continued use of alternates by library systems and for finding a method to link alternate class numbers to preferred class numbers for enriched subject access through local and union catalogs.
    Date
    10. 9.2000 17:38:22
    Type
    a
  11. Ellis, D.; Furner, J.; Willett, P.: On the creation of hypertext links in full-text documents : measurement of retrieval effectiveness (1996) 0.01
    0.009491567 = product of:
      0.018983133 = sum of:
        0.018983133 = product of:
          0.0284747 = sum of:
            0.019307716 = weight(_text_:j in 4214) [ClassicSimilarity], result of:
              0.019307716 = score(doc=4214,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 4214, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4214)
            0.009166983 = weight(_text_:a in 4214) [ClassicSimilarity], result of:
              0.009166983 = score(doc=4214,freq=26.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.22966442 = fieldWeight in 4214, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4214)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    An important stage in the process of retrieval of objects from a hypertext database is the creation of a set of internodal links that are intended to represent the relationships existing between objects; this operation is often undertaken manually, just as index terms are often manually assigned to documents in a conventional retrieval system. In an earlier article (1994), the results were published of a study in which several different sets of links were inserted, each by a different person, between the paragraphs of each of a number of full-text documents. These results showed little similarity between the link-sets, a finding that was comparable with those of studies of inter-indexer consistency, which suggest that there is generally only a low level of agreement between the sets of index terms assigned to a document by different indexers. In this article, a description is provided of an investigation into the nature of the relationship existing between (i) the levels of inter-linker consistency obtaining among the group of hypertext databases used in our earlier experiments, and (ii) the levels of effectiveness of a number of searches carried out in those databases. An account is given of the implementation of the searches and of the methods used in the calculation of numerical values expressing their effectiveness. Analysis of the results of a comparison between recorded levels of consistency and those of effectiveness does not allow us to draw conclusions about the consistency-effectiveness relationship that are equivalent to those drawn in comparable studies of inter-indexer consistency.
    Type
    a
  12. Westerman, S.J.; Cribbin, T.; Collins, J.: Human assessments of document similarity (2010) 0.01
    0.009484557 = product of:
      0.018969115 = sum of:
        0.018969115 = product of:
          0.028453672 = sum of:
            0.02316926 = weight(_text_:j in 3915) [ClassicSimilarity], result of:
              0.02316926 = score(doc=3915,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.21064025 = fieldWeight in 3915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3915)
            0.0052844114 = weight(_text_:a in 3915) [ClassicSimilarity], result of:
              0.0052844114 = score(doc=3915,freq=6.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.13239266 = fieldWeight in 3915, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3915)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
    Type
    a
  13. Cleverdon, C.W.: ASLIB Cranfield Research Project : Report on the first stage of an investigation into the comparative efficiency of indexing systems (1960) 0.01
    0.009380171 = product of:
      0.018760342 = sum of:
        0.018760342 = product of:
          0.056281026 = sum of:
            0.056281026 = weight(_text_:22 in 6158) [ClassicSimilarity], result of:
              0.056281026 = score(doc=6158,freq=2.0), product of:
                0.1212218 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.034616705 = queryNorm
                0.46428138 = fieldWeight in 6158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6158)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Footnote
    Review in: College and research libraries 22(1961) no.3, p.228 (G. Jahoda)
  14. Wolfram, D.; Zhang, J.: An investigation of the influence of indexing exhaustivity and term distributions on a document space (2002) 0.01
    0.009246705 = product of:
      0.01849341 = sum of:
        0.01849341 = product of:
          0.027740113 = sum of:
            0.019307716 = weight(_text_:j in 5238) [ClassicSimilarity], result of:
              0.019307716 = score(doc=5238,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 5238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5238)
            0.008432399 = weight(_text_:a in 5238) [ClassicSimilarity], result of:
              0.008432399 = score(doc=5238,freq=22.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.21126054 = fieldWeight in 5238, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5238)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Wolfram and Zhang are interested in the effect of different indexing exhaustivity, by which they mean the number of terms chosen, and of different index term distributions and different term weighting methods on the resulting document cluster organization. The Distance Angle Retrieval Environment, DARE, which provides a two-dimensional display of retrieved documents, was used to represent the document clusters based upon a document's distance from the searcher's main interest, and on the angle formed by the document, a point representing a minor interest, and the point representing the main interest. If the centroid and the origin of the document space are assigned as major and minor points, the average distance between documents and the centroid can be measured, providing an indication of cluster organization in the form of a size-normalized similarity measure. Using 500 records from NTIS and nine models created by intersecting low, observed, and high exhaustivity levels (based upon a negative binomial distribution) with shallow, observed, and steep term distributions (based upon a Zipf distribution), simulation runs were performed using inverse document frequency, inter-document term frequency, and inverse document frequency based upon both inter- and intra-document frequencies. Low exhaustivity and shallow distributions result in a more dense document space and less effective retrieval. High exhaustivity and steeper distributions result in a more diffuse space.
    Type
    a
  15. Braam, R.R.; Bruil, J.: Quality of indexing information : authors' views on indexing of their articles in chemical abstracts online CA-file (1992) 0.01
    0.009161321 = product of:
      0.018322643 = sum of:
        0.018322643 = product of:
          0.027483964 = sum of:
            0.02316926 = weight(_text_:j in 2638) [ClassicSimilarity], result of:
              0.02316926 = score(doc=2638,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.21064025 = fieldWeight in 2638, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2638)
            0.004314704 = weight(_text_:a in 2638) [ClassicSimilarity], result of:
              0.004314704 = score(doc=2638,freq=4.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.10809815 = fieldWeight in 2638, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2638)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Studies the quality of subject indexing by Chemical Abstracts Indexing Service by confronting authors with the particular indexing terms attributed to their articles, for 270 articles published in 54 journals, 5 articles out of each journal. Responses (80%) indicate the superior quality of keywords, both as content descriptors and as retrieval tools. Author judgements on these 2 different aspects do not always converge, however. CAS's indexing policy to cover only 'new' aspects is reflected in authors' judgements that index lists are somewhat incomplete, in particular in the case of thesaurus terms (index headings). The large effort expended by CAS in maintaining and using a subject thesaurus, in order to select valid index headings, as compared to quick and cheap keyword postings, does not lead to clearly superior quality of thesaurus terms for document description or for retrieval. Some 20% of papers were not placed in the 'proper' CA main section, according to authors. As concerns the use of indexing data by third parties, in bibliometrics, users should be aware of the indexing policies behind the data, in order to prevent invalid interpretations.
    Type
    a
  16. Cleverdon, C.W.; Mills, J.; Keen, M.: Factors determining the performance of indexing systems : ASLIB Cranfield research project (1966) 0.01
    0.009010268 = product of:
      0.018020537 = sum of:
        0.018020537 = product of:
          0.054061607 = sum of:
            0.054061607 = weight(_text_:j in 5363) [ClassicSimilarity], result of:
              0.054061607 = score(doc=5363,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.4914939 = fieldWeight in 5363, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.109375 = fieldNorm(doc=5363)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  17. Rowley, J.: ¬The controlled versus natural indexing languages debate revisited : a perspective on information retrieval practice and research (1994) 0.01
    0.008678148 = product of:
      0.017356295 = sum of:
        0.017356295 = product of:
          0.026034443 = sum of:
            0.019307716 = weight(_text_:j in 7151) [ClassicSimilarity], result of:
              0.019307716 = score(doc=7151,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 7151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7151)
            0.0067267264 = weight(_text_:a in 7151) [ClassicSimilarity], result of:
              0.0067267264 = score(doc=7151,freq=14.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.1685276 = fieldWeight in 7151, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7151)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    This article revisits the debate concerning controlled and natural indexing languages, as used in searching the databases of the online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM. The debate was first formulated in the early days of information retrieval more than a century ago but, despite significant advances in technology, remains unresolved. The article divides the history of the debate into four eras. Era one was characterised by the introduction of controlled vocabulary. Era two focused on comparisons between different indexing languages in order to assess which was best. Era three saw a number of case studies of limited generalisability and a general recognition that the best search performance can be achieved by the parallel use of the two types of indexing languages. The emphasis in Era four has been on the development of end-user-based systems, including online public access catalogues and databases on CD-ROM. Recent developments in the use of expert systems techniques to support the representation of meaning may lead to systems which offer significant support to the user in end-user searching. In the meantime, however, information retrieval in practice involves a mixture of natural and controlled indexing languages used to search a wide variety of different kinds of databases.
    Type
    a
  18. Qin, J.: Semantic similarities between a keyword database and a controlled vocabulary database : an investigation in the antibiotic resistance literature (2000) 0.01
    0.008330947 = product of:
      0.016661894 = sum of:
        0.016661894 = product of:
          0.024992839 = sum of:
            0.019307716 = weight(_text_:j in 4386) [ClassicSimilarity], result of:
              0.019307716 = score(doc=4386,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 4386, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4386)
            0.0056851218 = weight(_text_:a in 4386) [ClassicSimilarity], result of:
              0.0056851218 = score(doc=4386,freq=10.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.14243183 = fieldWeight in 4386, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4386)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The 'KeyWords Plus' in the Science Citation Index database represents an approach to combining citation and semantic indexing in describing the document content. This paper explores the similarities or dissimilarities between citation-semantic and analytic indexing. The dataset consisted of over 400 matching records in the SCI and MEDLINE databases on antibiotic resistance in pneumonia. The degree of similarity in indexing terms was found to vary on a scale from completely different to completely identical with various levels in between. The within-document similarity in the 2 databases was measured by a variation on the Jaccard coefficient - the Inclusion Index. The average inclusion coefficient was 0.4134 for SCI and 0.3371 for MEDLINE. The 20 terms occurring most frequently in each database were identified. The 2 groups of terms shared the same terms that constitute the 'intellectual base' for the subject. Conceptual similarity was analyzed through scatterplots of matching and nonmatching terms vs. partially identical and broader/narrower terms. The study also found that the two databases differed in assigning terms in various semantic categories. Implications of this research and further studies are suggested.
    Type
    a
  19. Moreiro-González, J.-A.; Bolaños-Mejías, C.: Folksonomy indexing from the assignment of free tags to setup subject : a search analysis into the domain of legal history (2018) 0.01
    0.008330947 = product of:
      0.016661894 = sum of:
        0.016661894 = product of:
          0.024992839 = sum of:
            0.019307716 = weight(_text_:j in 4640) [ClassicSimilarity], result of:
              0.019307716 = score(doc=4640,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 4640, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4640)
            0.0056851218 = weight(_text_:a in 4640) [ClassicSimilarity], result of:
              0.0056851218 = score(doc=4640,freq=10.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.14243183 = fieldWeight in 4640, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4640)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The behaviour and lexical quality of folksonomies are examined by comparing two online social networks: LibraryThing (for books) and Flickr (for photos). We present a case study that combines quantitative and qualitative elements, singularized by the lexical and functional framework. Our query was made by "Legal History" and by the synonyms "Law History" and "History of Law." We then examined the relevance, consistency and precision of the tags attached to the retrieved documents, in addition to their lexical composition. We identified the difficulties caused by free tagging and some of the folksonomy solutions that have been found to solve them. The results are presented in comparative tables, giving special attention to related tags within each retrieved document. Although the number of ambiguous or inconsistent tags is not very large, these do nevertheless represent the most obvious problem for search and retrieval in folksonomies. Relevance is high when the terms are assigned by especially competent taggers. Even with less expert taggers, ambiguity is often successfully corrected by contextualizing the concepts within related tags. A propinquity to associative and taxonomic lexical semantic knowledge is reached via contextual relationships.
    Type
    a
  20. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.01
    0.007903798 = product of:
      0.015807595 = sum of:
        0.015807595 = product of:
          0.023711393 = sum of:
            0.019307716 = weight(_text_:j in 4005) [ClassicSimilarity], result of:
              0.019307716 = score(doc=4005,freq=2.0), product of:
                0.109994456 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.034616705 = queryNorm
                0.17553353 = fieldWeight in 4005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
            0.0044036764 = weight(_text_:a in 4005) [ClassicSimilarity], result of:
              0.0044036764 = score(doc=4005,freq=6.0), product of:
                0.039914686 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.034616705 = queryNorm
                0.11032722 = fieldWeight in 4005, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
    Type
    a
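
Note on the score breakdowns: the trees listed under each hit appear to follow the classic TF-IDF arithmetic reported by ClassicSimilarity, with tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The sketch below is only an illustration of that arithmetic, not the search engine's own code. It recomputes the score of hit 16 (doc=5363) from the numbers printed in its tree. The function names classic_idf and term_score are hypothetical, and the idf formula is an assumption inferred from the listed values.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces the listed 3.1774964 for docFreq=5010.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the tree of hit 16 (term "j" in doc 5363).
score_j = term_score(freq=2.0, doc_freq=5010, max_docs=44218,
                     query_norm=0.034616705, field_norm=0.109375)

# Apply the two coordination factors shown in the tree: coord(1/3), then coord(1/2).
total = score_j * (1 / 3) * (1 / 2)

print(f"term score:  {score_j:.9f}")
print(f"final score: {total:.9f}")
```

The same function can be applied to the other hits by summing the per-term scores before multiplying by the coord factors listed in each tree. Small differences in the last digits are expected, since the listing shows single-precision values.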

Types

  • a 90
  • r 2
  • ? 1
  • b 1
  • m 1