Search (69 results, page 1 of 4)

  • theme_ss:"Indexierungsstudien"
  1. Braam, R.R.; Bruil, J.: Quality of indexing information : authors' views on indexing of their articles in chemical abstracts online CA-file (1992) 0.08
    0.07741733 = product of:
      0.12902887 = sum of:
        0.092019245 = weight(_text_:section in 2638) [ClassicSimilarity], result of:
          0.092019245 = score(doc=2638,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.34981182 = fieldWeight in 2638, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.046875 = fieldNorm(doc=2638)
        0.022607451 = weight(_text_:on in 2638) [ClassicSimilarity], result of:
          0.022607451 = score(doc=2638,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.20619515 = fieldWeight in 2638, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2638)
        0.0144021725 = weight(_text_:information in 2638) [ClassicSimilarity], result of:
          0.0144021725 = score(doc=2638,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.16457605 = fieldWeight in 2638, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2638)
      0.6 = coord(3/5)
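    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) debug output. As a worked check, the following sketch recomputes the listed score for doc 2638 from the reported components, assuming Lucene's formula: score = coord * sum over matched terms of queryWeight * fieldWeight, with tf = sqrt(termFreq), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. Python is used here purely for illustration.

      import math

      query_norm = 0.049850095   # queryNorm reported in the tree
      field_norm = 0.046875      # fieldNorm(doc=2638)
      coord = 3 / 5              # coord(3/5): 3 of 5 query terms matched

      # (term, termFreq in the field, idf), as reported above
      terms = [
          ("section",     2.0, 5.276892),
          ("on",          4.0, 2.199415),
          ("information", 4.0, 1.7554779),
      ]

      total = 0.0
      for term, freq, idf in terms:
          tf = math.sqrt(freq)                  # tf(freq) = sqrt(termFreq)
          query_weight = idf * query_norm       # queryWeight
          field_weight = tf * idf * field_norm  # fieldWeight
          total += query_weight * field_weight  # per-term contribution

      print(round(coord * total, 7))  # ~0.0774173; matches the listed
                                      # 0.07741733 up to rounding of the
                                      # reported idf/norm values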
    
    Abstract
    Studies the quality of subject indexing by the Chemical Abstracts Indexing Service by confronting authors with the particular indexing terms attributed to their articles, for 270 articles published in 54 journals, 5 articles out of each journal. Responses (80%) indicate the superior quality of keywords, both as content descriptors and as retrieval tools. Author judgements on these 2 different aspects do not always converge, however. CAS's indexing policy of covering only 'new' aspects is reflected in authors' judgements that index lists are somewhat incomplete, in particular in the case of thesaurus terms (index headings). The large effort expended by CAS in maintaining and using a subject thesaurus, in order to select valid index headings, as compared to quick and cheap keyword postings, does not lead to clearly superior quality of thesaurus terms either for document description or for retrieval. Some 20% of papers were not placed in the 'proper' CA main section, according to authors. As concerns the use of indexing data by third parties, e.g. in bibliometrics, users should be aware of the indexing policies behind the data, in order to prevent invalid interpretations.
    Source
    Journal of information science. 18(1992) no.5, S.399-408
  2. White, H.; Willis, C.; Greenberg, J.: HIVEing : the effect of a semantic web technology on inter-indexer consistency (2014) 0.06
    0.0620684 = product of:
      0.103447326 = sum of:
        0.026643137 = weight(_text_:on in 1781) [ClassicSimilarity], result of:
          0.026643137 = score(doc=1781,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24300331 = fieldWeight in 1781, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1781)
        0.0084865615 = weight(_text_:information in 1781) [ClassicSimilarity], result of:
          0.0084865615 = score(doc=1781,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.09697737 = fieldWeight in 1781, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1781)
        0.06831763 = sum of:
          0.03454763 = weight(_text_:technology in 1781) [ClassicSimilarity], result of:
            0.03454763 = score(doc=1781,freq=4.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.23268649 = fieldWeight in 1781, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
          0.03377 = weight(_text_:22 in 1781) [ClassicSimilarity], result of:
            0.03377 = score(doc=1781,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.19345059 = fieldWeight in 1781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
      0.6 = coord(3/5)
    
    Abstract
    Purpose - The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined, first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE. Design/methodology/approach - A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario based questionnaire. Analysis was performed on consistency results using Hooper's and Rolling's inter-indexer consistency measures. A series of t-tests was used to judge the significance between consistency measure results. Findings - Results suggest that HIVE improves inter-indexing consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges. Research limitations/implications - The research limitations of this study lie in the small number of vocabularies used. Future research will include implementing HIVE into the Dryad Repository and studying its application in a repository system. Originality/value - This paper showcases several features of the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next-generation machine-aided indexing (MAI) tools.
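    Hooper's and Rolling's inter-indexer consistency measures used in this study are simple set-overlap statistics. A minimal sketch under their usual definitions (with A terms common to both indexers and M, N terms unique to each: Hooper = A/(A+M+N), Rolling = 2A/(2A+M+N)); the example term sets are invented for illustration.

      def hooper(a: set, b: set) -> float:
          # common terms over all distinct terms assigned by either indexer
          return len(a & b) / len(a | b)

      def rolling(a: set, b: set) -> float:
          # twice the common terms over the summed set sizes
          return 2 * len(a & b) / (len(a) + len(b))

      indexer1 = {"semantic web", "indexing", "vocabulary", "ontology"}
      indexer2 = {"semantic web", "indexing", "thesaurus"}

      print(f"Hooper:  {hooper(indexer1, indexer2):.2f}")   # 2/5 = 0.40
      print(f"Rolling: {rolling(indexer1, indexer2):.2f}")  # 4/7 = 0.57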
  3. Rowley, J.: ¬The controlled versus natural indexing languages debate revisited : a perspective on information retrieval practice and research (1994) 0.04
    0.03658734 = product of:
      0.060978897 = sum of:
        0.029787935 = weight(_text_:on in 7151) [ClassicSimilarity], result of:
          0.029787935 = score(doc=7151,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.271686 = fieldWeight in 7151, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=7151)
        0.018976528 = weight(_text_:information in 7151) [ClassicSimilarity], result of:
          0.018976528 = score(doc=7151,freq=10.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.21684799 = fieldWeight in 7151, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=7151)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 7151) [ClassicSimilarity], result of:
              0.024428863 = score(doc=7151,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 7151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7151)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    This article revisits the debate concerning controlled and natural indexing languages, as used in searching the databases of the online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM. The debate was first formulated in the early days of information retrieval more than a century ago but, despite significant advance in technology, remains unresolved. The article divides the history of the debate into four eras. Era one was characterised by the introduction of controlled vocabulary. Era two focused on comparisons between different indexing languages in order to assess which was best. Era three saw a number of case studies of limited generalisability and a general recognition that the best search performance can be achieved by the parallel use of the two types of indexing languages. The emphasis in Era four has been on the development of end-user-based systems, including online public access catalogues and databases on CD-ROM. Recent developments in the use of expert systems techniques to support the representation of meaning may lead to systems which offer significant support to the user in end-user searching. In the meantime, however, information retrieval in practice involves a mixture of natural and controlled indexing languages used to search a wide variety of different kinds of databases
    Source
    Journal of information science. 20(1994) no.2, S.108-119
  4. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.04
    0.03616686 = product of:
      0.060278103 = sum of:
        0.02637536 = weight(_text_:on in 1830) [ClassicSimilarity], result of:
          0.02637536 = score(doc=1830,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 1830, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1830)
        0.016802534 = weight(_text_:information in 1830) [ClassicSimilarity], result of:
          0.016802534 = score(doc=1830,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1920054 = fieldWeight in 1830, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1830)
        0.017100206 = product of:
          0.03420041 = sum of:
            0.03420041 = weight(_text_:technology in 1830) [ClassicSimilarity], result of:
              0.03420041 = score(doc=1830,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23034787 = fieldWeight in 1830, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1830)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Automatic keyword extraction is an important and fundamental technology in advanced information retrieval systems. Briefly compares several major keyword extraction methods, lists their advantages and disadvantages, and reports recent research progress in Taiwan. Also describes the application of a keyword extraction algorithm in an information retrieval system for relevance feedback. Preliminary analysis shows that the error rate of extracting relevant keywords is 18%, and that the precision rate is over 50%. The main disadvantage of this approach is that the extraction results depend on the retrieval results, which in turn depend on the data held by the database. Apart from collecting more data, this problem can be alleviated by the application of a thesaurus constructed by the same keyword extraction algorithm.
  5. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.03
    0.034392346 = product of:
      0.057320572 = sum of:
        0.023073634 = weight(_text_:on in 4292) [ClassicSimilarity], result of:
          0.023073634 = score(doc=4292,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.21044704 = fieldWeight in 4292, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.016973123 = weight(_text_:information in 4292) [ClassicSimilarity], result of:
          0.016973123 = score(doc=4292,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.19395474 = fieldWeight in 4292, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.017273815 = product of:
          0.03454763 = sum of:
            0.03454763 = weight(_text_:technology in 4292) [ClassicSimilarity], result of:
              0.03454763 = score(doc=4292,freq=4.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23268649 = fieldWeight in 4292, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
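    Of the weighting methods compared above, the direct IDF estimate is the simplest to illustrate. A minimal sketch, assuming log(N/df) weights over the descriptors assigned in a collection and per-document max-normalisation; the toy collection and the normalisation step are assumptions for illustration, not the authors' exact procedure.

      import math
      from collections import Counter

      # Toy collection: document -> manually assigned subject descriptors
      docs = {
          "d1": ["Humans", "Neoplasms", "Drug Therapy"],
          "d2": ["Humans", "Neoplasms"],
          "d3": ["Humans", "Genomics"],
      }

      n_docs = len(docs)
      # document frequency of each descriptor
      df = Counter(t for terms in docs.values() for t in set(terms))

      def idf_weights(terms):
          # rarer descriptors get higher weights; scale to [0, 1] per document
          raw = {t: math.log(n_docs / df[t]) for t in terms}
          top = max(raw.values()) or 1.0
          return {t: round(w / top, 2) for t, w in raw.items()}

      print(idf_weights(docs["d1"]))
      # {'Humans': 0.0, 'Neoplasms': 0.37, 'Drug Therapy': 1.0}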
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
  6. Bade, D.: ¬The creation and persistence of misinformation in shared library catalogs : language and subject knowledge in a technological era (2002) 0.03
    0.03295981 = product of:
      0.054933015 = sum of:
        0.04337829 = weight(_text_:section in 1858) [ClassicSimilarity], result of:
          0.04337829 = score(doc=1858,freq=4.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.16490288 = fieldWeight in 1858, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.015625 = fieldNorm(doc=1858)
        0.0048007243 = weight(_text_:information in 1858) [ClassicSimilarity], result of:
          0.0048007243 = score(doc=1858,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.054858685 = fieldWeight in 1858, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.015625 = fieldNorm(doc=1858)
        0.0067539997 = product of:
          0.0135079995 = sum of:
            0.0135079995 = weight(_text_:22 in 1858) [ClassicSimilarity], result of:
              0.0135079995 = score(doc=1858,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.07738023 = fieldWeight in 1858, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1858)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Date
    22. 9.1997 19:16:05
    Footnote
    Rez. in JASIST 54(2003) no.4, S.356-357 (S.J. Lincicum): "Reliance upon shared cataloging in academic libraries in the United States has been driven largely by the need to reduce the expense of cataloging operations without much regard for the impact that this approach might have on the quality of the records included in local catalogs. In recent years, ever-increasing pressures have prompted libraries to adopt practices such as "rapid" copy cataloging that purposely reduce the scrutiny applied to bibliographic records downloaded from shared databases, possibly increasing the number of errors that slip through unnoticed. Errors in bibliographic records can lead to serious problems for library catalog users. If the data contained in bibliographic records is inaccurate, users will have difficulty discovering and recognizing resources in a library's collection that are relevant to their needs. Thus, it has become increasingly important to understand the extent and nature of errors that occur in the records found in large shared bibliographic databases, such as OCLC WorldCat, to develop cataloging practices optimized for the shared cataloging environment. Although this monograph raises a few legitimate concerns about recent trends in cataloging practice, it fails to provide the "detailed look" at misinformation in library catalogs arising from linguistic errors and mistakes in subject analysis promised by the publisher. A basic premise advanced throughout the text is that a certain amount of linguistic and subject knowledge is required to catalog library materials effectively. The author emphasizes repeatedly that most catalogers today are asked to catalog an increasingly diverse array of materials, and that they are often required to work in languages or subject areas of which they have little or no knowledge. He argues that the records contributed to shared databases are increasingly being created by catalogers with inadequate linguistic or subject expertise. This adversely affects the quality of individual library catalogs because errors often go uncorrected as records are downloaded from shared databases to local catalogs by copy catalogers who possess even less knowledge. Calling misinformation an "evil phenomenon," Bade states that his main goal is to discuss "two fundamental types of misinformation found in bibliographic and authority records in library catalogs: that arising from linguistic errors, and that caused by errors in subject analysis, including missing or wrong subject headings" (p. 2). After a superficial discussion of "other" types of errors that can occur in bibliographic records, such as typographical errors and errors in the application of descriptive cataloging rules, Bade begins his discussion of linguistic errors. He asserts that sharing bibliographic records created by catalogers with inadequate linguistic or subject knowledge has "disastrous effects on the library community" (p. 6). To support this bold assertion, Bade provides as evidence little more than a laundry list of errors that he has personally observed in bibliographic records over the years. When he eventually cites several studies that have addressed the availability and quality of records available for materials in languages other than English, he fails to describe the findings of these studies in any detail, let alone relate the findings to his own observations in a meaningful way.
    Bade claims that a lack of linguistic expertise among catalogers is the "primary source for linguistic misinformation in our databases" (p. 10), but he neither cites substantive data from existing studies nor provides any new data regarding the overall level of linguistic knowledge among catalogers to support this claim. The section concludes with a brief list of eight sensible, if unoriginal, suggestions for coping with the challenge of cataloging materials in unfamiliar languages.
    Bade begins his discussion of errors in subject analysis by summarizing the contents of seven records containing what he considers to be egregious errors. The examples were drawn only from items that he has encountered in the course of his work. Five of the seven records were full-level ("I" level) records for Eastern European materials created between 1996 and 2000 in the OCLC WorldCat database. The final two examples were taken from records created by Bade himself over an unspecified period of time. Although he is to be commended for examining the actual items cataloged and for examining mostly items that he claims to have adequate linguistic and subject expertise to evaluate reliably, Bade's methodology has major flaws. First and foremost, the number of examples provided is completely inadequate to draw any conclusions about the extent of the problem. Although an in-depth qualitative analysis of a small number of records might have yielded some valuable insight into factors that contribute to errors in subject analysis, Bade provides no information about the circumstances under which the live OCLC records he critiques were created. Instead, he offers simplistic explanations for the errors based solely on his own assumptions. He supplements his analysis of examples with an extremely brief survey of other studies regarding errors in subject analysis, which consists primarily of criticism of work done by Sheila Intner. In the end, it is impossible to draw any reliable conclusions about the nature or extent of errors in subject analysis found in records in shared bibliographic databases based on Bade's analysis. In the final third of the essay, Bade finally reveals his true concern: the deintellectualization of cataloging. It would strengthen the essay tremendously to present this as the primary premise from the very beginning, as this section offers glimpses of a compelling argument. Bade laments, "Many librarians simply do not see cataloging as an intellectual activity requiring an educated mind" (p. 20). Commenting on recent trends in copy cataloging practice, he declares, "The disaster of our time is that this work is being done more and more by people who can neither evaluate nor correct imported errors and often are forbidden from even thinking about it" (p. 26). Bade argues that the most valuable content found in catalog records is the intellectual content contributed by knowledgeable catalogers, and he asserts that to perform intellectually demanding tasks such as subject analysis reliably and effectively, catalogers must have the linguistic and subject knowledge required to gain at least a rudimentary understanding of the materials that they describe. He contends that requiring catalogers to quickly dispense with materials in unfamiliar languages and subjects clearly undermines their ability to perform the intellectual work of cataloging and leads to an increasing number of errors in the bibliographic records contributed to shared databases.
    Imprint
    Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
  7. Taniguchi, S.: Recording evidence in bibliographic records and descriptive metadata (2005) 0.03
    0.032009 = product of:
      0.0800225 = sum of:
        0.0101838745 = weight(_text_:information in 3565) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=3565,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 3565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3565)
        0.06983863 = sum of:
          0.029314637 = weight(_text_:technology in 3565) [ClassicSimilarity], result of:
            0.029314637 = score(doc=3565,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.19744103 = fieldWeight in 3565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.046875 = fieldNorm(doc=3565)
          0.040523995 = weight(_text_:22 in 3565) [ClassicSimilarity], result of:
            0.040523995 = score(doc=3565,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.23214069 = fieldWeight in 3565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3565)
      0.4 = coord(2/5)
    
    Date
    18. 6.2005 13:16:22
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.8, S.872-882
  8. Bodoff, D.; Richter-Levin, Y.: Viewpoints in indexing term assignment (2020) 0.03
    0.031517737 = product of:
      0.05252956 = sum of:
        0.027688364 = weight(_text_:on in 5765) [ClassicSimilarity], result of:
          0.027688364 = score(doc=5765,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.25253648 = fieldWeight in 5765, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=5765)
        0.0101838745 = weight(_text_:information in 5765) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=5765,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 5765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5765)
        0.014657319 = product of:
          0.029314637 = sum of:
            0.029314637 = weight(_text_:technology in 5765) [ClassicSimilarity], result of:
              0.029314637 = score(doc=5765,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19744103 = fieldWeight in 5765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5765)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    The literature on assigned indexing considers three possible viewpoints-the author's viewpoint as evidenced in the title, the users' viewpoint, and the indexer's viewpoint-and asks whether and which of those views should be reflected in an indexer's choice of terms to assign to an item. We study this question empirically, as opposed to normatively. Based on the literature that discusses whose viewpoints should be reflected, we construct a research model that includes those same three viewpoints as factors that might be influencing term assignment in actual practice. In the unique study design that we employ, the records of term assignments made by identified indexers in academic libraries are cross-referenced with the results of a survey that those same indexers completed on political views. Our results indicate that in our setting, variance in term assignment was best explained by indexers' personal political views.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.4, S.450-461
  9. Cleverdon, C.W.: ASLIB Cranfield Research Project : Report on the first stage of an investigation into the comparative efficiency of indexing systems (1960) 0.03
    0.028998304 = product of:
      0.07249576 = sum of:
        0.031971764 = weight(_text_:on in 6158) [ClassicSimilarity], result of:
          0.031971764 = score(doc=6158,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29160398 = fieldWeight in 6158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.09375 = fieldNorm(doc=6158)
        0.040523995 = product of:
          0.08104799 = sum of:
            0.08104799 = weight(_text_:22 in 6158) [ClassicSimilarity], result of:
              0.08104799 = score(doc=6158,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.46428138 = fieldWeight in 6158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6158)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
    Review in: College and research libraries 22(1961) no.3, S.228 (G. Jahoda)
  10. Westerman, S.J.; Cribbin, T.; Collins, J.: Human assessments of document similarity (2010) 0.03
    0.028469186 = product of:
      0.047448643 = sum of:
        0.022607451 = weight(_text_:on in 3915) [ClassicSimilarity], result of:
          0.022607451 = score(doc=3915,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.20619515 = fieldWeight in 3915, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3915)
        0.0101838745 = weight(_text_:information in 3915) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=3915,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 3915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3915)
        0.014657319 = product of:
          0.029314637 = sum of:
            0.029314637 = weight(_text_:technology in 3915) [ClassicSimilarity], result of:
              0.029314637 = score(doc=3915,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19744103 = fieldWeight in 3915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3915)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
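    A minimal sketch of the kind of n-gram automatic text analysis the study correlates with human similarity ratings: character n-gram profiles of two texts compared by cosine similarity. The character-level n-grams and the cosine measure are assumptions chosen for illustration; the paper does not prescribe this exact variant.

      import math
      from collections import Counter

      def ngrams(text: str, n: int) -> Counter:
          # collapse whitespace, lowercase, and count overlapping n-grams
          text = " ".join(text.lower().split())
          return Counter(text[i:i + n] for i in range(len(text) - n + 1))

      def cosine(a: Counter, b: Counter) -> float:
          dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
          norm = math.sqrt(sum(v * v for v in a.values())) * \
                 math.sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      doc1 = "indexing consistency in shared catalogs"
      doc2 = "consistency of indexing in shared library catalogs"

      # string length influenced the strength of association in the study
      for n in (3, 4, 5):
          print(n, round(cosine(ngrams(doc1, n), ngrams(doc2, n)), 3))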
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.8, S.1535-1542
  11. Huffman, G.D.; Vital, D.A.; Bivins, R.G.: Generating indices with lexical association methods : term uniqueness (1990) 0.03
    0.026759954 = product of:
      0.04459992 = sum of:
        0.018839544 = weight(_text_:on in 4152) [ClassicSimilarity], result of:
          0.018839544 = score(doc=4152,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.1718293 = fieldWeight in 4152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
        0.0084865615 = weight(_text_:information in 4152) [ClassicSimilarity], result of:
          0.0084865615 = score(doc=4152,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.09697737 = fieldWeight in 4152, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
        0.017273815 = product of:
          0.03454763 = sum of:
            0.03454763 = weight(_text_:technology in 4152) [ClassicSimilarity], result of:
              0.03454763 = score(doc=4152,freq=4.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23268649 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining the relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary, and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary - the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 75% of the uniqueness words - was compared and uniqueness factors were developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination values were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid) methods. An end-user evaluation showed that the generated terms were largely distinct and had values of word precision that were consistent with values of the search precision.
    Source
    Information processing and management. 26(1990) no.4, S.549-558
  12. Wolfram, D.; Zhang, J.: ¬An investigation of the influence of indexing exhaustivity and term distributions on a document space (2002) 0.03
    0.026264777 = product of:
      0.043774627 = sum of:
        0.023073634 = weight(_text_:on in 5238) [ClassicSimilarity], result of:
          0.023073634 = score(doc=5238,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.21044704 = fieldWeight in 5238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5238)
        0.0084865615 = weight(_text_:information in 5238) [ClassicSimilarity], result of:
          0.0084865615 = score(doc=5238,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.09697737 = fieldWeight in 5238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5238)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 5238) [ClassicSimilarity], result of:
              0.024428863 = score(doc=5238,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 5238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5238)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Wolfram and Zhang are interested in the effect of different indexing exhaustivity, by which they mean the number of terms chosen, and of different index term distributions and different term weighting methods on the resulting document cluster organization. The Distance Angle Retrieval Environment, DARE, which provides a two-dimensional display of retrieved documents, was used to represent the document clusters based upon a document's distance from the searcher's main interest, and on the angle formed by the document, a point representing a minor interest, and the point representing the main interest. If the centroid and the origin of the document space are assigned as major and minor points, the average distance between documents and the centroid can be measured, providing an indication of cluster organization in the form of a size-normalized similarity measure. Using 500 records from NTIS and nine models created by intersecting low, observed, and high exhaustivity levels (based upon a negative binomial distribution) with shallow, observed, and steep term distributions (based upon a Zipf distribution), simulation runs were performed using inverse document frequency, inter-document term frequency, and inverse document frequency based upon both inter- and intra-document frequencies. Low exhaustivity and shallow distributions result in a more dense document space and less effective retrieval. High exhaustivity and steeper distributions result in a more diffuse space.
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.11, S.944-952
  13. Lin, Y.-l.; Trattner, C.; Brusilovsky, P.; He, D.: ¬The impact of image descriptions on user tagging behavior : a study of the nature and functionality of crowdsourced tags (2015) 0.03
    0.025922006 = product of:
      0.043203343 = sum of:
        0.023830349 = weight(_text_:on in 2159) [ClassicSimilarity], result of:
          0.023830349 = score(doc=2159,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.21734878 = fieldWeight in 2159, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.03125 = fieldNorm(doc=2159)
        0.009601449 = weight(_text_:information in 2159) [ClassicSimilarity], result of:
          0.009601449 = score(doc=2159,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.10971737 = fieldWeight in 2159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2159)
        0.009771545 = product of:
          0.01954309 = sum of:
            0.01954309 = weight(_text_:technology in 2159) [ClassicSimilarity], result of:
              0.01954309 = score(doc=2159,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.13162735 = fieldWeight in 2159, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2159)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Crowdsourcing has emerged as a way to harvest social wisdom from thousands of volunteers to perform a series of tasks online. However, little research has been devoted to exploring the impact of various factors such as the content of a resource or crowdsourcing interface design on user tagging behavior. Although images' titles and descriptions are frequently available in image digital libraries, it is not clear whether they should be displayed to crowdworkers engaged in tagging. This paper focuses on offering insight to the curators of digital image libraries who face this dilemma by examining (i) how descriptions influence the user in his/her tagging behavior and (ii) how this relates to the (a) nature of the tags, (b) the emergent folksonomy, and (c) the findability of the images in the tagging system. We compared two different methods for collecting image tags from crowdworkers on Amazon's Mechanical Turk - with and without image descriptions. Several properties of generated tags were examined from different perspectives: diversity, specificity, reusability, quality, similarity, descriptiveness, and so on. In addition, the study was carried out to examine the impact of image description on supporting users' information seeking with a tag cloud interface. The results showed that the properties of tags are affected by the crowdsourcing approach. Tags from the "with description" condition are more diverse and more specific than tags from the "without description" condition, while the latter has a higher tag reuse rate. A user study also revealed that different tag sets provided different support for search. Tags produced "with description" shortened the path to the target results, whereas tags produced without description increased user success in the search task.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1785-1798
  14. Krovetz, R.; Croft, W.B.: Lexical ambiguity and information retrieval (1992) 0.02
    0.024425104 = product of:
      0.06106276 = sum of:
        0.03730039 = weight(_text_:on in 4028) [ClassicSimilarity], result of:
          0.03730039 = score(doc=4028,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.34020463 = fieldWeight in 4028, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4028)
        0.023762373 = weight(_text_:information in 4028) [ClassicSimilarity], result of:
          0.023762373 = score(doc=4028,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.27153665 = fieldWeight in 4028, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4028)
      0.4 = coord(2/5)
    
    Abstract
    Reports on an analysis of lexical ambiguity in information retrieval text collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. Results show that there is considerable ambiguity even in a specialised database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors contribute to determining whether disambiguation will improve performance; for example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Discusses other uses of word sense disambiguation in an information retrieval context.
    Source
    ACM transactions on information systems. 10(1992) no.2, S.115-141
  15. Lee, D.H.; Schleyer, T.: Social tagging is no substitute for controlled indexing : a comparison of Medical Subject Headings and CiteULike tags assigned to 231,388 papers (2012) 0.02
    0.022522688 = product of:
      0.037537813 = sum of:
        0.013321568 = weight(_text_:on in 383) [ClassicSimilarity], result of:
          0.013321568 = score(doc=383,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.121501654 = fieldWeight in 383, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=383)
        0.012001811 = weight(_text_:information in 383) [ClassicSimilarity], result of:
          0.012001811 = score(doc=383,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13714671 = fieldWeight in 383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=383)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 383) [ClassicSimilarity], result of:
              0.024428863 = score(doc=383,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=383)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Social tagging and controlled indexing both facilitate access to information resources. Given the increasing popularity of social tagging and the limitations of controlled indexing (primarily cost and scalability), it is reasonable to investigate to what degree social tagging could substitute for controlled indexing. In this study, we compared CiteULike tags to Medical Subject Headings (MeSH) terms for 231,388 citations indexed in MEDLINE. In addition to descriptive analyses of the data sets, we present a paper-by-paper analysis of tags and MeSH terms: the number of common annotations, Jaccard similarity, and coverage ratio. In the analysis, we apply three increasingly progressive levels of text processing, ranging from normalization to stemming, to reduce the impact of lexical differences. Annotations of our corpus consisted of over 76,968 distinct tags and 21,129 distinct MeSH terms. The top 20 tags/MeSH terms showed little direct overlap. On a paper-by-paper basis, the number of common annotations ranged from 0.29 to 0.5 and the Jaccard similarity from 2.12% to 3.3% using increased levels of text processing. At most, 77,834 citations (33.6%) shared at least one annotation. Our results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing.
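    The paper-by-paper comparison described above reduces to set operations over a citation's two annotation sets. A minimal sketch, assuming lowercasing as the first level of text processing and defining the coverage ratio as the share of MeSH terms matched by at least one tag; both details are assumptions about points the abstract leaves open.

      def compare(tags: set, mesh: set):
          # normalise both annotation sets, then measure their overlap
          tags = {t.lower() for t in tags}
          mesh = {m.lower() for m in mesh}
          common = tags & mesh
          jaccard = len(common) / len(tags | mesh) if tags | mesh else 0.0
          coverage = len(common) / len(mesh) if mesh else 0.0
          return len(common), jaccard, coverage

      tags = {"hiv", "drug resistance", "genotyping"}
      mesh = {"HIV", "Drug Resistance", "Genotype"}

      print(compare(tags, mesh))  # 2 common terms, Jaccard 0.5, coverage 2/3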
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.9, S.1747-1757
  16. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.02
    0.022522688 = product of:
      0.037537813 = sum of:
        0.013321568 = weight(_text_:on in 4005) [ClassicSimilarity], result of:
          0.013321568 = score(doc=4005,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.121501654 = fieldWeight in 4005, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
        0.012001811 = weight(_text_:information in 4005) [ClassicSimilarity], result of:
          0.012001811 = score(doc=4005,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13714671 = fieldWeight in 4005, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 4005) [ClassicSimilarity], result of:
              0.024428863 = score(doc=4005,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 4005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1776-1784
  17. Iivonen, M.: Interindexer consistency and the indexing environment (1990) 0.02
    0.021433719 = product of:
      0.053584296 = sum of:
        0.04170311 = weight(_text_:on in 3593) [ClassicSimilarity], result of:
          0.04170311 = score(doc=3593,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.38036036 = fieldWeight in 3593, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3593)
        0.011881187 = weight(_text_:information in 3593) [ClassicSimilarity], result of:
          0.011881187 = score(doc=3593,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13576832 = fieldWeight in 3593, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3593)
      0.4 = coord(2/5)
    
    Abstract
    Considers interindexer consistency between indexers working in various organisations and reports the results of an empirical study. Interindexer consistency was low, but there were clear differences depending on whether consistency was calculated on the basis of terms, concepts, or aspects. The persistently low consistency figures can be explained, but low consistency caused by indexing errors seems difficult to control. Indexing consistency and its control have a clear impact on how feasible and useful centralised services and union catalogues are, and can be, from the point of view of subject description.
    Source
    International forum on information and documentation. 15(1990) no.2, S.8-15
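    The relevance figures attached to each entry follow Lucene's ClassicSimilarity explanation format: a term's contribution is queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = sqrt(tf) x idf x fieldNorm, and the per-term contributions are summed and scaled by the coordination factor. Below is a minimal recomputation of entry 17's tree from the constants it prints; the variable names are the only assumptions.

      import math

      # Constants printed in the explanation tree for doc 3593 (entry 17).
      idf_on, idf_info = 2.199415, 1.7554779
      query_norm = 0.049850095
      field_norm = 0.0546875

      def term_score(freq, idf):
          query_weight = idf * query_norm                # 0.109641045 for "on"
          field_weight = math.sqrt(freq) * idf * field_norm
          return query_weight * field_weight             # 0.04170311 for "on"

      s = term_score(10.0, idf_on) + term_score(2.0, idf_info)
      print(s * 0.4)  # coord(2/5) = 0.4 -> ~0.021433719, the total shown above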
  18. Cleverdon, C.W.: The Cranfield tests on index language devices (1967) 0.02
    0.020935806 = product of:
      0.052339513 = sum of:
        0.031971764 = weight(_text_:on in 1957) [ClassicSimilarity], result of:
          0.031971764 = score(doc=1957,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29160398 = fieldWeight in 1957, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.09375 = fieldNorm(doc=1957)
        0.020367749 = weight(_text_:information in 1957) [ClassicSimilarity], result of:
          0.020367749 = score(doc=1957,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.23274569 = fieldWeight in 1957, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=1957)
      0.4 = coord(2/5)
    
    Footnote
    Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. Pp.47-58.
  19. Neshat, N.; Horri, A.: A study of subject indexing consistency between the National Library of Iran and Humanities Libraries in the area of Iranian studies (2006) 0.02
    0.020005746 = product of:
      0.05001436 = sum of:
        0.02637536 = weight(_text_:on in 230) [ClassicSimilarity], result of:
          0.02637536 = score(doc=230,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 230, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=230)
        0.023639 = product of:
          0.047278 = sum of:
            0.047278 = weight(_text_:22 in 230) [ClassicSimilarity], result of:
              0.047278 = score(doc=230,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2708308 = fieldWeight in 230, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=230)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This study compares indexing consistency between the catalogers of the National Library of Iran (NLI) and those of 12 major academic and special libraries in Tehran. The findings indicate that in 75% of the libraries the subject inconsistency values are 60% to 85%. In terms of subject classes, the consistency values are 10% to 35.2%, with a mean of 22.5%. Moreover, the findings show that as the number of assigned terms increases, the probability of consistency decreases, confirming Markey's 1984 findings. (The standard pairwise calculation is sketched below.)
    Date
    4.1.2007 10:22:26
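    Pairwise consistency figures of this kind are conventionally computed with Hooper's measure: the number of terms two indexers assign in common, divided by the total number of distinct terms either assigns. Whether this exact formula underlies the study is not stated in the abstract; the sketch below, with invented sample terms, shows only the general calculation.

      def hooper_consistency(terms_a, terms_b):
          # Agreements divided by agreements plus the terms unique
          # to either indexer (Hooper's measure).
          a, b = set(terms_a), set(terms_b)
          return len(a & b) / len(a | b) if (a | b) else 0.0

      # Hypothetical assignments for the same work:
      nli = {"Iranian studies", "Persian literature", "History"}
      lib = {"Iranian studies", "Iran -- History"}
      print(hooper_consistency(nli, lib))  # 1 shared / 4 distinct = 0.25

    Note that the more terms each indexer assigns, the larger the union in the denominator tends to be, which fits the finding that consistency falls as the number of assigned terms grows.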
  20. Soergel, D.: Indexing and retrieval performance : the logical evidence (1994) 0.02
    0.01967263 = product of:
      0.049181577 = sum of:
        0.03730039 = weight(_text_:on in 579) [ClassicSimilarity], result of:
          0.03730039 = score(doc=579,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.34020463 = fieldWeight in 579, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=579)
        0.011881187 = weight(_text_:information in 579) [ClassicSimilarity], result of:
          0.011881187 = score(doc=579,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13576832 = fieldWeight in 579, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=579)
      0.4 = coord(2/5)
    
    Abstract
    This article presents a logical analysis of the characteristics of indexing and their effects on retrieval performance. It establishes the ability to ask the questions one needs to ask as the foundation of performance evaluation, and recall and discrimination as the basic quantitative performance measures for binary noninteractive retrieval systems. It then defines the characteristics of indexing that affect retrieval - namely, indexing devices, viewpoint-based and importance-based indexing exhaustivity, indexing specificity, indexing correctness, and indexing consistency - and examines in detail their effects on retrieval. It concludes that retrieval performance depends chiefly on the match between indexing and the requirements of the individual query and on the adaptation of the query formulation to the characteristics of the retrieval system, and that the ensuing complexity must be considered in the design and testing of retrieval systems. (The two basic measures are sketched below.)
    Source
    Journal of the American Society for Information Science. 45(1994) no.8, S.589-599
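    Stated compactly: recall is the fraction of relevant documents retrieved, and discrimination (on one common reading) is the fraction of nonrelevant documents correctly rejected. The sketch below uses an invented ten-document collection and illustrates the definitions only, not the article's analysis.

      def recall(retrieved, relevant):
          # Fraction of the relevant documents that were retrieved.
          return len(retrieved & relevant) / len(relevant)

      def discrimination(retrieved, relevant, collection):
          # Fraction of nonrelevant documents correctly rejected
          # (one common reading of "discrimination").
          nonrelevant = collection - relevant
          return len(nonrelevant - retrieved) / len(nonrelevant)

      docs = set(range(10))            # hypothetical collection
      relevant = {0, 1, 2, 3}
      retrieved = {0, 1, 4}
      print(recall(retrieved, relevant))                # 2/4 = 0.5
      print(discrimination(retrieved, relevant, docs))  # 5/6 ~ 0.83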

Languages

  • e 65
  • chi 1
  • d 1
  • f 1
  • nl 1

Types

  • a 66
  • r 2
  • m 1