Search (57 results, page 1 of 3)

  • theme_ss:"Indexierungsstudien"
  1. Taniguchi, S.: Recording evidence in bibliographic records and descriptive metadata (2005) 0.04
    0.038194444 = product of:
      0.07638889 = sum of:
        0.00972145 = weight(_text_:information in 3565) [ClassicSimilarity], result of:
          0.00972145 = score(doc=3565,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 3565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3565)
        0.06666744 = sum of:
          0.027983533 = weight(_text_:technology in 3565) [ClassicSimilarity], result of:
            0.027983533 = score(doc=3565,freq=2.0), product of:
              0.1417311 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.047586527 = queryNorm
              0.19744103 = fieldWeight in 3565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.046875 = fieldNorm(doc=3565)
          0.038683902 = weight(_text_:22 in 3565) [ClassicSimilarity], result of:
            0.038683902 = score(doc=3565,freq=2.0), product of:
              0.16663991 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047586527 = queryNorm
              0.23214069 = fieldWeight in 3565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3565)
      0.5 = coord(2/4)
    
    Date
    18. 6.2005 13:16:22
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.8, S.872-882
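
The explain tree for result 1 can be checked by hand. The sketch below assumes the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm, fieldNorm and the frequencies directly from the listing. The function and variable names are illustrative only and are not part of any search engine API.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 3565
MAX_DOCS = 44218
QUERY_NORM = 0.047586527
FIELD_NORM = 0.046875

information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/4): two of the four query clauses matched
total = (information + (technology + twenty_two)) * (2 / 4)

print(f"information={information:.8f} total={total:.8f}")
```

Lucene computes these values in single precision, so a double-precision recomputation should agree with the listed figures only up to small rounding differences.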
  2. White, H.; Willis, C.; Greenberg, J.: HIVEing : the effect of a semantic web technology on inter-indexer consistency (2014) 0.04
    0.036658354 = product of:
      0.07331671 = sum of:
        0.008101207 = weight(_text_:information in 1781) [ClassicSimilarity], result of:
          0.008101207 = score(doc=1781,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 1781, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1781)
        0.0652155 = sum of:
          0.03297891 = weight(_text_:technology in 1781) [ClassicSimilarity], result of:
            0.03297891 = score(doc=1781,freq=4.0), product of:
              0.1417311 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.047586527 = queryNorm
              0.23268649 = fieldWeight in 1781, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
          0.032236587 = weight(_text_:22 in 1781) [ClassicSimilarity], result of:
            0.032236587 = score(doc=1781,freq=2.0), product of:
              0.16663991 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047586527 = queryNorm
              0.19345059 = fieldWeight in 1781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1781)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE. Design/methodology/approach - A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario based questionnaire. Analysis was performed on consistency results using Hooper's and Rolling's inter-indexer consistency measures. A series of t-tests was used to judge the significance between consistency measure results. Findings - Results suggest that HIVE improves inter-indexing consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges. Research limitations/implications - Research limitations for this study can be found in the small number of vocabularies used for the study. Future research will include implementing HIVE into the Dryad Repository and studying its application in a repository system. Originality/value - This paper showcases several features used in the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next generation machine-aided indexing (MAI) tools.
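
The two consistency measures named in the abstract above can be sketched as follows. This assumes the usual textbook definitions: Hooper's measure is the number of shared terms divided by the number of distinct terms used by either indexer, and Rolling's measure is twice the number of shared terms divided by the total number of terms assigned by both. The term sets are hypothetical examples, not data from the study, and returning 0.0 for two empty sets is a choice made here for convenience.

```python
def hooper(terms_a, terms_b):
    # Shared terms over all distinct terms used by either indexer
    a, b = set(terms_a), set(terms_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rolling(terms_a, terms_b):
    # Twice the shared terms over the sum of both indexers' term counts
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

# Hypothetical keyword assignments by two indexers for one abstract
indexer_1 = {"semantic web", "indexing", "thesauri", "metadata"}
indexer_2 = {"indexing", "metadata", "ontologies"}

print(hooper(indexer_1, indexer_2))   # 2 shared / 5 distinct
print(rolling(indexer_1, indexer_2))  # 2 * 2 shared / (4 + 3) assigned
```

For the same pair of term sets Rolling's value is never lower than Hooper's, which is worth keeping in mind when comparing percentages reported under the two measures.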
  3. Veenema, F.: To index or not to index (1996) 0.02
    0.022060106 = product of:
      0.04412021 = sum of:
        0.018330941 = weight(_text_:information in 7247) [ClassicSimilarity], result of:
          0.018330941 = score(doc=7247,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.21943474 = fieldWeight in 7247, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=7247)
        0.02578927 = product of:
          0.05157854 = sum of:
            0.05157854 = weight(_text_:22 in 7247) [ClassicSimilarity], result of:
              0.05157854 = score(doc=7247,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.30952093 = fieldWeight in 7247, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7247)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes an experiment comparing the performance of automatic full-text indexing software for personal computers with the human intellectual assignment of indexing terms in each document in a collection. Considers the times required to index the document, to retrieve documents satisfying 5 typical foreseen information needs, and the recall and precision ratios of searching. The software used is the QuickFinder facility in WordPerfect 6.1 for Windows.
    Source
    Canadian journal of information and library science. 21(1996) no.2, S.1-22
  4. Gil-Leiva, I.; Alonso-Arroyo, A.: Keywords given by authors of scientific articles in database descriptors (2007) 0.02
    0.016345935 = product of:
      0.03269187 = sum of:
        0.016202414 = weight(_text_:information in 211) [ClassicSimilarity], result of:
          0.016202414 = score(doc=211,freq=8.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.19395474 = fieldWeight in 211, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=211)
        0.016489455 = product of:
          0.03297891 = sum of:
            0.03297891 = weight(_text_:technology in 211) [ClassicSimilarity], result of:
              0.03297891 = score(doc=211,freq=4.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23268649 = fieldWeight in 211, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=211)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this article, the authors analyze the keywords given by authors of scientific articles and the descriptors assigned to the articles to ascertain the presence of the keywords in the descriptors. Six-hundred forty INSPEC (Information Service for Physics, Engineering, and Computing), CAB (Current Agriculture Bibliography) abstracts, ISTA (Information Science and Technology Abstracts), and LISA (Library and Information Science Abstracts) database records were consulted. After detailed comparisons, it was found that keywords provided by authors have an important presence in the database descriptors studied; nearly 25% of all the keywords appeared in exactly the same form as descriptors, with another 21%, though normalized, still detected in the descriptors. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. Elsewhere, three distinct indexing policies appear: one is represented by INSPEC and LISA (indexers seem to have freedom to assign the descriptors they deem necessary); another is represented by CAB (no record has fewer than four descriptors and, in general, a large number of descriptors is employed). In contrast, in ISTA, a certain institutional code exists towards economy in indexing because 84% of records contain only four descriptors.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.8, S.1175-1187
  5. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.02
    0.016345935 = product of:
      0.03269187 = sum of:
        0.016202414 = weight(_text_:information in 4292) [ClassicSimilarity], result of:
          0.016202414 = score(doc=4292,freq=8.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.19395474 = fieldWeight in 4292, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.016489455 = product of:
          0.03297891 = sum of:
            0.03297891 = weight(_text_:technology in 4292) [ClassicSimilarity], result of:
              0.03297891 = score(doc=4292,freq=4.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23268649 = fieldWeight in 4292, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
  6. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.02
    0.016181652 = product of:
      0.032363303 = sum of:
        0.016039573 = weight(_text_:information in 1830) [ClassicSimilarity], result of:
          0.016039573 = score(doc=1830,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.1920054 = fieldWeight in 1830, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1830)
        0.016323728 = product of:
          0.032647457 = sum of:
            0.032647457 = weight(_text_:technology in 1830) [ClassicSimilarity], result of:
              0.032647457 = score(doc=1830,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23034787 = fieldWeight in 1830, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1830)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Automatic keyword extraction is an important and fundamental technology in advanced information retrieval systems. Briefly compares several major keyword extraction methods, lists their advantages and disadvantages, and reports recent research progress in Taiwan. Also describes the application of a keyword extraction algorithm in an information retrieval system for relevance feedback. Preliminary analysis shows that the error rate of extracting relevant keywords is 18%, and that the precision rate is over 50%. The main disadvantage of this approach is that the extraction results depend on the retrieval results, which in turn depend on the data held by the database. Apart from collecting more data, this problem can be alleviated by the application of a thesaurus constructed by the same keyword extraction algorithm.
  7. Chan, L.M.: Inter-indexer consistency in subject cataloging (1989) 0.02
    0.015808811 = product of:
      0.031617623 = sum of:
        0.012961932 = weight(_text_:information in 2276) [ClassicSimilarity], result of:
          0.012961932 = score(doc=2276,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.1551638 = fieldWeight in 2276, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2276)
        0.01865569 = product of:
          0.03731138 = sum of:
            0.03731138 = weight(_text_:technology in 2276) [ClassicSimilarity], result of:
              0.03731138 = score(doc=2276,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.2632547 = fieldWeight in 2276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2276)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Information technology and libraries. 8(1989), S.349-358
  8. Rowley, J.: ¬The controlled versus natural indexing languages debate revisited : a perspective on information retrieval practice and research (1994) 0.01
    0.014887327 = product of:
      0.029774655 = sum of:
        0.01811485 = weight(_text_:information in 7151) [ClassicSimilarity], result of:
          0.01811485 = score(doc=7151,freq=10.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.21684799 = fieldWeight in 7151, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=7151)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 7151) [ClassicSimilarity], result of:
              0.02331961 = score(doc=7151,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 7151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7151)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This article revisits the debate concerning controlled and natural indexing languages, as used in searching the databases of the online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM. The debate was first formulated in the early days of information retrieval more than a century ago but, despite significant advance in technology, remains unresolved. The article divides the history of the debate into four eras. Era one was characterised by the introduction of controlled vocabulary. Era two focused on comparisons between different indexing languages in order to assess which was best. Era three saw a number of case studies of limited generalisability and a general recognition that the best search performance can be achieved by the parallel use of the two types of indexing languages. The emphasis in Era four has been on the development of end-user-based systems, including online public access catalogues and databases on CD-ROM. Recent developments in the use of expert systems techniques to support the representation of meaning may lead to systems which offer significant support to the user in end-user searching. In the meantime, however, information retrieval in practice involves a mixture of natural and controlled indexing languages used to search a wide variety of different kinds of databases.
    Source
    Journal of information science. 20(1994) no.2, S.108-119
  9. Leininger, K.: Interindexer consistency in PsychINFO (2000) 0.01
    0.0145317 = product of:
      0.0290634 = sum of:
        0.00972145 = weight(_text_:information in 2552) [ClassicSimilarity], result of:
          0.00972145 = score(doc=2552,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 2552, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2552)
        0.019341951 = product of:
          0.038683902 = sum of:
            0.038683902 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
              0.038683902 = score(doc=2552,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23214069 = fieldWeight in 2552, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    9. 2.1997 18:44:22
    Source
    Journal of librarianship and information science. 32(2000) no.1, S.4-8
  10. Peset, F.; Garzón-Farinós, F.; González, L.M.; García-Massó, X.; Ferrer-Sapena, A.; Toca-Herrera, J.L.; Sánchez-Pérez, E.A.: Survival analysis of author keywords : an application to the library and information sciences area (2020) 0.01
    0.012845755 = product of:
      0.02569151 = sum of:
        0.0140317045 = weight(_text_:information in 5774) [ClassicSimilarity], result of:
          0.0140317045 = score(doc=5774,freq=6.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.16796975 = fieldWeight in 5774, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5774)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 5774) [ClassicSimilarity], result of:
              0.02331961 = score(doc=5774,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 5774, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5774)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Our purpose is to adapt a statistical method for the analysis of discrete numerical series to the keywords appearing in scientific articles of a given area. As an example, we apply our methodological approach to the study of the keywords in the Library and Information Sciences (LIS) area. Our objective is to detect the new author keywords that appear in a fixed knowledge area in the period of 1 year in order to quantify the probabilities of survival for 10 years as a function of the impact of the journals where they appeared. Many of the new keywords appearing in the LIS field are ephemeral. Actually, more than half are never used again. In general, the terms most commonly used in the LIS area come from other areas. The average survival time of these keywords is approximately 3 years, being slightly higher in the case of words that were published in journals classified in the second quartile of the area. We believe that measuring the appearance and disappearance of terms will allow understanding some relevant aspects of the evolution of a discipline, providing in this way a new bibliometric approach.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.4, S.462-473
  11. Huffman, G.D.; Vital, D.A.; Bivins, R.G.: Generating indices with lexical association methods : term uniqueness (1990) 0.01
    0.012295332 = product of:
      0.024590664 = sum of:
        0.008101207 = weight(_text_:information in 4152) [ClassicSimilarity], result of:
          0.008101207 = score(doc=4152,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 4152, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
        0.016489455 = product of:
          0.03297891 = sum of:
            0.03297891 = weight(_text_:technology in 4152) [ClassicSimilarity], result of:
              0.03297891 = score(doc=4152,freq=4.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23268649 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary - the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 7,5% of the unique words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subset of terms produced by the exact and centroid discrimination value were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid). An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
    Source
    Information processing and management. 26(1990) no.4, S.549-558
  12. Westerman, S.J.; Cribbin, T.; Collins, J.: Human assessments of document similarity (2010) 0.01
    0.011856608 = product of:
      0.023713216 = sum of:
        0.00972145 = weight(_text_:information in 3915) [ClassicSimilarity], result of:
          0.00972145 = score(doc=3915,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 3915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3915)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 3915) [ClassicSimilarity], result of:
              0.027983533 = score(doc=3915,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 3915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3915)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.8, S.1535-1542
  13. Bodoff, D.; Richter-Levin, Y.: Viewpoints in indexing term assignment (2020) 0.01
    0.011856608 = product of:
      0.023713216 = sum of:
        0.00972145 = weight(_text_:information in 5765) [ClassicSimilarity], result of:
          0.00972145 = score(doc=5765,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 5765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5765)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 5765) [ClassicSimilarity], result of:
              0.027983533 = score(doc=5765,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 5765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5765)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.4, S.450-461
  14. Lee, D.H.; Schleyer, T.: Social tagging is no substitute for controlled indexing : a comparison of Medical Subject Headings and CiteULike tags assigned to 231,388 papers (2012) 0.01
    0.011558321 = product of:
      0.023116643 = sum of:
        0.011456838 = weight(_text_:information in 383) [ClassicSimilarity], result of:
          0.011456838 = score(doc=383,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.13714671 = fieldWeight in 383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=383)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 383) [ClassicSimilarity], result of:
              0.02331961 = score(doc=383,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=383)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Social tagging and controlled indexing both facilitate access to information resources. Given the increasing popularity of social tagging and the limitations of controlled indexing (primarily cost and scalability), it is reasonable to investigate to what degree social tagging could substitute for controlled indexing. In this study, we compared CiteULike tags to Medical Subject Headings (MeSH) terms for 231,388 citations indexed in MEDLINE. In addition to descriptive analyses of the data sets, we present a paper-by-paper analysis of tags and MeSH terms: the number of common annotations, Jaccard similarity, and coverage ratio. In the analysis, we apply three increasingly progressive levels of text processing, ranging from normalization to stemming, to reduce the impact of lexical differences. Annotations of our corpus consisted of over 76,968 distinct tags and 21,129 distinct MeSH terms. The top 20 tags/MeSH terms showed little direct overlap. On a paper-by-paper basis, the number of common annotations ranged from 0.29 to 0.5 and the Jaccard similarity from 2.12% to 3.3% using increased levels of text processing. At most, 77,834 citations (33.6%) shared at least one annotation. Our results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.9, S.1747-1757
  15. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.01
    0.011558321 = product of:
      0.023116643 = sum of:
        0.011456838 = weight(_text_:information in 4005) [ClassicSimilarity], result of:
          0.011456838 = score(doc=4005,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.13714671 = fieldWeight in 4005, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 4005) [ClassicSimilarity], result of:
              0.02331961 = score(doc=4005,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 4005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1776-1784
  16. Wolfram, D.; Zhang, J.: ¬An investigation of the influence of indexing exhaustivity and term distributions on a document space (2002) 0.01
    0.0098805055 = product of:
      0.019761011 = sum of:
        0.008101207 = weight(_text_:information in 5238) [ClassicSimilarity], result of:
          0.008101207 = score(doc=5238,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 5238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5238)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 5238) [ClassicSimilarity], result of:
              0.02331961 = score(doc=5238,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 5238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5238)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.11, S.944-952
  17. Cleverdon, C.W.: ASLIB Cranfield Research Project : Report on the first stage of an investigation into the comparative efficiency of indexing systems (1960) 0.01
    0.009670976 = product of:
      0.038683902 = sum of:
        0.038683902 = product of:
          0.077367805 = sum of:
            0.077367805 = weight(_text_:22 in 6158) [ClassicSimilarity], result of:
              0.077367805 = score(doc=6158,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.46428138 = fieldWeight in 6158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6158)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Footnote
    Review in: College and research libraries 22(1961) no.3, S.228 (G. Jahoda)
  18. Lin, Y.-l.; Trattner, C.; Brusilovsky, P.; He, D.: ¬The impact of image descriptions on user tagging behavior : a study of the nature and functionality of crowdsourced tags (2015) 0.01
    0.009246658 = product of:
      0.018493315 = sum of:
        0.0091654705 = weight(_text_:information in 2159) [ClassicSimilarity], result of:
          0.0091654705 = score(doc=2159,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.10971737 = fieldWeight in 2159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2159)
        0.009327845 = product of:
          0.01865569 = sum of:
            0.01865569 = weight(_text_:technology in 2159) [ClassicSimilarity], result of:
              0.01865569 = score(doc=2159,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.13162735 = fieldWeight in 2159, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2159)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Crowdsourcing has emerged as a way to harvest social wisdom from thousands of volunteers to perform a series of tasks online. However, little research has been devoted to exploring the impact of various factors such as the content of a resource or crowdsourcing interface design on user tagging behavior. Although images' titles and descriptions are frequently available in image digital libraries, it is not clear whether they should be displayed to crowdworkers engaged in tagging. This paper focuses on offering insight to the curators of digital image libraries who face this dilemma by examining (i) how descriptions influence the user in his/her tagging behavior and (ii) how this relates to the (a) nature of the tags, (b) the emergent folksonomy, and (c) the findability of the images in the tagging system. We compared two different methods for collecting image tags from Amazon's Mechanical Turk's crowdworkers: with and without image descriptions. Several properties of generated tags were examined from different perspectives: diversity, specificity, reusability, quality, similarity, descriptiveness, and so on. In addition, the study was carried out to examine the impact of image description on supporting users' information seeking with a tag cloud interface. The results showed that the properties of tags are affected by the crowdsourcing approach. Tags from the "with description" condition are more diverse and more specific than tags from the "without description" condition, while the latter has a higher tag reuse rate. A user study also revealed that different tag sets provided different support for search. Tags produced "with description" shortened the path to the target results, whereas tags produced without description increased user success in the search task.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1785-1798
  19. Zunde, P.; Dexter, M.E.: Factors affecting indexing performance (1969) 0.01
    0.0068741026 = product of:
      0.02749641 = sum of:
        0.02749641 = weight(_text_:information in 7496) [ClassicSimilarity], result of:
          0.02749641 = score(doc=7496,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.3291521 = fieldWeight in 7496, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=7496)
      0.25 = coord(1/4)
    
    Source
    Cooperating information societies: Proceedings of the 32nd Annual Meeting of the American Society for Information Science, San Francisco, CA, 1.-4.10.1969. Ed.: J.B. North
  20. Cleverdon, C.W.: Evaluation tests of information retrieval systems (1970) 0.01
    0.006480966 = product of:
      0.025923865 = sum of:
        0.025923865 = weight(_text_:information in 2272) [ClassicSimilarity], result of:
          0.025923865 = score(doc=2272,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.3103276 = fieldWeight in 2272, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=2272)
      0.25 = coord(1/4)
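
The score trees in this listing follow the ClassicSimilarity pattern: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and a term's score is queryWeight times fieldWeight, scaled by the coord factors. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, queryNorm is simply copied from the listing rather than derived (it depends on the whole query), and the results only approximate the displayed values because the engine works in single precision.

```python
import math

# Values copied from the explain output above; queryNorm is taken as given.
QUERY_NORM = 0.047586527
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Entry 20 (doc 2272): one matching clause out of four, so coord(1/4).
entry_20 = 0.25 * term_score(freq=2.0, doc_freq=20772, field_norm=0.125)

# Entry 14 (doc 383): "information" plus a nested "technology" clause
# scaled by coord(1/2), then the outer coord(2/4).
entry_14 = 0.5 * (
    term_score(freq=4.0, doc_freq=20772, field_norm=0.0390625)
    + 0.5 * term_score(freq=2.0, doc_freq=6114, field_norm=0.0390625)
)

print(f"entry 20: {entry_20:.9f}")
print(f"entry 14: {entry_14:.9f}")
```

Under these assumptions the two results should land close to the 0.006480966 and 0.011558321 shown for entries 20 and 14. The short, heavily normalized field in entry 20 (fieldNorm 0.125) is what lifts a single "information" match to a score comparable with the two-term matches above it.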
    

Languages

  • e 55
  • chi 1
  • d 1

Types

  • a 55
  • m 1
  • r 1