Search (236 results, page 3 of 12)

  • × language_ss:"e"
  • × theme_ss:"Automatisches Indexieren"
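The two-digit value after each title below is the record's Lucene relevance score, produced by the ClassicSimilarity (TF-IDF) ranking formula. A minimal sketch of how a single query term's contribution to such a score is assembled, reconstructed from the standard formula rather than this site's code; the numeric inputs are illustrative of the ranges such an engine reports:

```python
import math

# Sketch of Lucene ClassicSimilarity: one query term's contribution to a score.
def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    tf = math.sqrt(freq)                             # tf(freq) = sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # rarer terms weigh more
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Illustrative values: a very common term in a large catalog.
s = term_score(freq=30, doc_freq=37942, max_docs=44218,
               field_norm=0.03125, query_norm=0.046368346)
print(round(s, 6))  # ~0.010552; a coord() factor then scales the summed terms
```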
  1. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
    Abstract
    The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for their automatic indexing. The terms considered were only the full noun phrases contained in the texts themselves. The corpus comprised 98 doctoral theses from the eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for its most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases. Only 22.1% of the noun phrases were judged not relevant. The relevance values assigned by the authors were then associated with the positions of the terms in the text, each full noun phrase found in the text counting as a valid linear position. The resulting distributions were examined for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, based on the parts of the text (such as introduction, development, and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed one characteristic distribution of relevant terms, while all areas related to the Social Sciences shared another, distinct from that of the Natural Sciences. The difference between the two distributions can be clearly visualized in graphs. All behaviors, including the overall behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
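The positional analysis described above is easy to prototype. Below is a minimal sketch, not the authors' code, that normalizes each scored noun phrase's linear position and consolidates the relevance values into ten equal consecutive parts; the positions and scores are invented for illustration:

```python
# Sketch: bin author-assigned relevance of noun phrases by linear text position.
def decile_profile(phrase_positions, relevances, n_bins=10):
    """phrase_positions: ordinal position of each scored phrase among all
    full noun phrases in the text; relevances: author scores 0-6."""
    total = max(phrase_positions) + 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, rel in zip(phrase_positions, relevances):
        b = min(int(pos / total * n_bins), n_bins - 1)  # decile index
        sums[b] += rel
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Hypothetical example: 20 scored phrases scattered through a 200-phrase thesis.
positions = [3, 10, 25, 40, 55, 60, 70, 80, 90, 100, 110, 130, 150, 160,
             170, 180, 185, 190, 195, 198]
scores = [6, 5, 4, 4, 3, 0, 2, 2, 3, 3, 1, 2, 4, 5, 5, 6, 4, 3, 2, 1]
print(decile_profile(positions, scores))
```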
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
    Type
    a
  2. Losee, R.M.: A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 0.01
    Abstract
    A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. A requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases often cannot be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed and used to evaluate experimental results comparing the distance between subject headings assigned to documents under the proposed system and under the Library of Congress Classification (LCC) system
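The Gray code is apt here because consecutive codewords differ in exactly one bit, so shelf neighbors differ in as few subject features as possible. A minimal sketch, assuming documents are described by small binary feature vectors; the documents and features are invented:

```python
# Sketch: order documents so Gray-adjacent feature vectors become shelf neighbors,
# using the binary-reflected Gray code rank as a shelf position.
def gray_to_int(bits):
    """Rank of a Gray-coded feature vector: inverse of i ^ (i >> 1)."""
    n = 0
    for b in bits:
        n = (n << 1) | b
    mask = n >> 1
    while mask:
        n ^= mask
        mask >>= 1
    return n

# Hypothetical documents with 3 binary subject features.
docs = {
    "doc_a": (0, 1, 1),
    "doc_b": (1, 0, 0),
    "doc_c": (0, 0, 1),
    "doc_d": (1, 1, 0),
}
shelf = sorted(docs, key=lambda d: gray_to_int(docs[d]))
print(shelf)  # adjacent shelf positions differ in exactly one feature
```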
    Source
    Journal of the American Society for Information Science. 43(1992) no.4, pp.312-322
    Type
    a
  3. Siebenkäs, A.; Markscheffel, B.: Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry (2015) 0.01
    Abstract
    During the BMWI-funded project "Print-IT", the need for a uniform and consistent, thesaurus-based vocabulary for the German printing industry became evident. In this paper we introduce a semi-automatic construction approach for such a thesaurus and present a workflow which supports users in generating typical thesaurus information structures from relevant digitized resources with the help of common IT tools.
    Source
    Re:inventing information science in the networked society: Proceedings of the 14th International Symposium on Information Science, Zadar/Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schloegl and C. Wolff
    Type
    a
  4. Krutulis, J.D.; Jacob, E.K.: A theoretical model for the study of emergent structure in adaptive information networks (1995) 0.01
    Abstract
    Attempts to automate classification have focused on mimicking the intellectual processes whereby human classifiers assign entities to mutually exclusive groups that exhibit one or more shared characteristics. A more viable approach might be to construct an adaptive retrieval system that produces groupings of related entities by generating dynamic categories based on document content and on the system's emergent structure as it adapts to modifications in the database and to observed patterns of access. Presents a theoretical model for adaptive information networks using relevance feedback and genetic algorithms to generate emergent structure
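The combination the abstract names, relevance feedback supplying the fitness signal for a genetic algorithm, can be illustrated compactly. The encoding, operators, and feedback data below are assumptions for illustration, not the authors' model:

```python
import random

# Sketch: evolve a term-weight vector toward documents a user marked relevant.
TERMS = ["indexing", "network", "adaptive", "genetic", "feedback"]  # hypothetical
RELEVANT = [(1, 0, 1, 0, 1), (1, 1, 1, 0, 0)]    # feedback: liked documents
NONRELEVANT = [(0, 1, 0, 1, 0)]

def fitness(weights):
    """Reward similarity to relevant documents, penalize the rest."""
    score = lambda doc: sum(w * d for w, d in zip(weights, doc))
    return sum(map(score, RELEVANT)) - sum(map(score, NONRELEVANT))

def mutate(weights, rate=0.2):
    return tuple(min(1.0, max(0.0, w + random.uniform(-0.3, 0.3)))
                 if random.random() < rate else w for w in weights)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

random.seed(1)
pop = [tuple(random.random() for _ in TERMS) for _ in range(20)]
for _ in range(50):                      # generations
    pop.sort(key=fitness, reverse=True)  # select the fitter half as parents
    parents = pop[:10]
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]
best = max(pop, key=fitness)
print({t: round(w, 2) for t, w in zip(TERMS, best)})
```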
    Imprint
    Alberta : Alberta University, School of Library and Information Studies
    Source
    Connectedness: information, systems, people, organizations. Proceedings of CAIS/ACSI 95, the proceedings of the 23rd Annual Conference of the Canadian Association for Information Science. Ed. by Hope A. Olson and Denis B. Ward
    Type
    a
  5. Wan, T.-L.; Evens, M.; Wan, Y.-W.; Pao, Y.-Y.: Experiments with automatic indexing and a relational thesaurus in a Chinese information retrieval system (1997) 0.01
    Abstract
    This article describes a series of experiments with an interactive Chinese information retrieval system named CIRS and an interactive relational thesaurus. Two important issues have been explored: whether a thesaurus enhances the retrieval effectiveness of Chinese documents, and whether automatic indexing can compete with manual indexing in a Chinese information retrieval system. Recall and precision are used to measure and evaluate the effectiveness of the system. Statistical analysis of the recall and precision measures suggests that the use of the relational thesaurus does improve retrieval effectiveness in both the automatic and the manual indexing environment, and that automatic indexing is at least as good as manual indexing
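Recall and precision, the two measures used in these experiments, are simple set ratios; a minimal sketch over hypothetical retrieved and relevant document sets:

```python
# Sketch: the two effectiveness measures used in the experiments above.
def recall_precision(retrieved, relevant):
    hits = len(retrieved & relevant)
    recall = hits / len(relevant)        # share of relevant docs found
    precision = hits / len(retrieved)    # share of found docs that are relevant
    return recall, precision

# Hypothetical run: the system returns 4 documents, 5 are actually relevant.
retrieved = {"d1", "d2", "d3", "d7"}
relevant = {"d1", "d2", "d4", "d5", "d6"}
print(recall_precision(retrieved, relevant))  # (0.4, 0.5)
```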
    Source
    Journal of the American Society for Information Science. 48(1997) no.12, pp.1086-1096
    Type
    a
  6. Hersh, W.R.; Hickam, D.H.: A comparison of two methods for indexing and retrieval from a full-text medical database (1992) 0.01
    Abstract
    Reports the results of a study of two information retrieval systems on a 2,000-document full-text medical database. The first system, SAPHIRE, features concept-based automatic indexing and statistical retrieval techniques, while the second system, SWORD, features traditional word-based Boolean techniques. 16 medical students at Oregon Health Sciences Univ. each performed 10 searches, and their results, recorded in terms of recall and precision, showed nearly equal performance for both systems. SAPHIRE was also compared with a version of SWORD modified to use automatic indexing and ranked retrieval. Using batch input of queries, the latter method performed slightly better
    Imprint
    Medford, NJ : Learned Information Inc.
    Source
    Proceedings of the 55th Annual Meeting of the American Society for Information Science, Pittsburgh, 26.-29.10.92. Ed.: D. Shaw
    Type
    a
  7. Fuhr, N.; Knorz, G.: Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS) (1984) 0.01
    Source
    Research and development in information retrieval. Proc. of the 3rd joint BCS and ACM symp., Cambridge, 2.-6.7.1984. Ed.: C.J. van Rijsbergen
    Type
    a
  8. Salton, G.; Araya, J.: On the use of clustered file organizations in information search and retrieval (1990) 0.01
    Source
    Library classification and its functions. Int. Conf. on ..., 20.-21.6.1989, Edmonton, Alberta. Ed.: A. Nitecki and T. Fell
    Type
    a
  9. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.01
    Abstract
    The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Information Processing Laboratory (IU IP Lab). The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus, including grid and cluster computing and a standard Java-based software platform to support plug-and-play research datasets, a selection of standard IR modules, and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
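OAI-PMH, which the authors adopted for resource discovery, is a plain HTTP/XML protocol. A minimal harvesting sketch; the endpoint URL is hypothetical, and a real harvester would also follow resumptionToken paging:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Sketch: fetch one page of Dublin Core records from an OAI-PMH endpoint.
BASE = "https://example.org/oai"  # hypothetical endpoint
url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"

with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}
for record in tree.iterfind(".//oai:record", ns):
    title = record.find(".//dc:title", ns)
    print(title.text if title is not None else "(no title)")
```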
    Type
    a
  10. Zeng, L.: Automatic indexing for Chinese text : problems and progress (1992) 0.01
    Source
    Encyclopedia of library and information science. Vol.49, [=Suppl.12]
    Type
    a
  11. Mars, N.J.I.: The management of scientific information, or, how to cope with the flood (1996) 0.01
    Abstract
    Research in the Knowledge-Based Systems Group of the University of Twente in the Netherlands is aimed at reducing information overload. One approach is to support indexing by the traditional method of assigning content descriptors by which documents can be found. A second is to use a computer program to determine what the document says, without descriptors. Discusses automated indexing and direct access to information
    Type
    a
  12. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.01
    Abstract
    The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
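Criterion (i), mapping variant word forms to one stem, can be checked mechanically. A minimal sketch with a deliberately crude suffix-stripping stemmer and a tiny invented gold standard (neither is from the article); its failures on e-deletion and consonant doubling illustrate exactly the accuracy differences the study measures:

```python
# Sketch: measure how often a stemmer unifies known variant groups (criterion i).
def naive_stem(word, suffixes=("ation", "ing", "ers", "ed", "es", "s")):
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

# Hypothetical gold groups: each set should collapse to a single stem.
gold = [{"index", "indexes", "indexing", "indexed"},
        {"retrieve", "retrieves", "retrieved"},   # crude stemmer misses e-deletion
        {"stem", "stems", "stemming"}]            # ...and consonant doubling

unified = sum(len({naive_stem(w) for w in group}) == 1 for group in gold)
print(f"{unified}/{len(gold)} variant groups mapped to one stem")
```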
    Source
    Information processing and management. 52(2016) no.5, pp.840-854
    Type
    a
  13. Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.01
    Content
    Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases
  14. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.01
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the contextual information of Web images and classifies it into five broad semantic concept facets (signal, object, abstract, scene, and relational), identifying the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to users' levels of image description. A major contribution is that the classification is performed automatically on the raw contextual information extracted from any general webpage and, unlike state-of-the-art solutions, is not based solely on image tags. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and its corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes, as well as n-gram indexing, in a recall/precision-based evaluation framework.
    Source
    Information processing and management. 49(2013) no.2, pp.420-440
    Type
    a
  15. Salton, G.; Wong, A.; Yang, C.S.: A vector space model for automatic indexing (1975) 0.01
    Footnote
    Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. pp.273-280.
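The vector space model represents documents and queries as term-weight vectors and ranks documents by their closeness to the query vector. A minimal cosine-ranking sketch with invented weights:

```python
import math

# Sketch: rank documents by cosine similarity in a term vector space.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 4-term vocabulary; the weights could be tf-idf values.
docs = {"d1": [0.9, 0.1, 0.0, 0.3],
        "d2": [0.0, 0.8, 0.7, 0.0],
        "d3": [0.4, 0.0, 0.1, 0.9]}
query = [1.0, 0.0, 0.0, 0.5]
for doc_id, vec in sorted(docs.items(), key=lambda kv: -cosine(query, kv[1])):
    print(doc_id, round(cosine(query, vec), 3))
```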
    Type
    a
  16. Warner, A.J.: A linguistic approach to the automated hierarchical organization of phrases (1990) 0.01
    Abstract
    A linguistic analysis was carried out on 8 sets of phrases automatically selected from document surrogates in mathematics. The purpose of this analysis was to derive an algorithm which would automatically generate a hierarchically organised arrangement of phrases for online display to the user. This would replace an alphabetical display and would be particularly useful in the online browsing of large numbers of items. It is also the first step toward an automatic thesaurus generator
    Imprint
    Medford, NJ : Learned Information Inc.
    Source
    ASIS'90: Information in the year 2000, from research to applications. Proc. of the 53rd Annual Meeting of the American Society for Information Science, Toronto, Canada, 4.-8.11.1990. Ed. by Diana Henderson
    Type
    a
  17. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.01
    Abstract
    The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. The work in the TREC-2 environment continues, performing both routing and ad hoc experiments. The ad hoc work extends investigations into combining global similarities, which give an overall indication of how a document matches a query, with local similarities, which identify a smaller part of the document that matches the query. The performance of the ad hoc runs is good, but it is clear that full advantage is not yet being taken of the available local information. The routing experiments use conventional relevance feedback approaches, but with a much greater degree of query expansion than was previously done: the length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query
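Query expansion from previously seen relevant documents, as described above, follows the Rocchio-style feedback associated with Smart. A minimal sketch; the weights, documents, and parameter values are illustrative:

```python
from collections import Counter

# Sketch: Rocchio-style query expansion from previously seen relevant documents.
def expand_query(query, relevant_docs, alpha=1.0, beta=0.75, max_terms=10):
    expanded = Counter({t: alpha * w for t, w in query.items()})
    for doc in relevant_docs:
        for term, w in doc.items():
            expanded[term] += beta * w / len(relevant_docs)
    # keep the strongest terms; the TREC-2 runs instead grew queries 5-10x
    return dict(expanded.most_common(max_terms))

query = {"automatic": 1.0, "routing": 1.0}
relevant = [{"automatic": 0.5, "smart": 0.9, "feedback": 0.7},
            {"routing": 0.6, "feedback": 0.8, "trec": 0.4}]
print(expand_query(query, relevant))
```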
    Source
    Information processing and management. 31(1995) no.3, pp.315-326
    Type
    a
  18. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: The operation and performance of an artificially intelligent keywording system (1991) 0.01
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
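A minimal sketch of the insertion/deletion rule mechanism described above; the rules and phrases are invented for illustration, not drawn from the authors' rule base:

```python
# Sketch: transform a list of found phrases into key phrases via expert rules.
INSERTION = {  # a found phrase implies (triggers) another phrase
    "neural network": ["machine learning"],
    "precision": ["retrieval evaluation"],
}
DELETION = {  # ambiguous phrases dropped unless a supporting phrase co-occurs
    "java": "programming language",
}

def keyword(found):
    phrases = set(found)
    for trigger, implied in INSERTION.items():
        if trigger in phrases:
            phrases.update(implied)
    for ambiguous, support in DELETION.items():
        if ambiguous in phrases and support not in phrases:
            phrases.discard(ambiguous)  # text presence alone is not enough
    return sorted(phrases)

print(keyword(["neural network", "precision", "java"]))
```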
    Source
    Information processing and management. 27(1991) no.1, pp.43-54
    Type
    a
  19. Cunningham, P.; Veale, T.; Conway, A.: Knowledge acquisition for concept indexing in document retrieval (1992) 0.01
    Abstract
    Describes TWIG, a system for knowledge acquisition from text for use in an intelligent document database system. Documents are scanned into the system and converted into a hypertext, thus providing a richer environment for browsing and retrieval. The knowledge acquisition phase is blackboard-based, with the text analysis expertise partitioned into agents that communicate through the blackboard
    Source
    Expert systems for information management. 5(1992) no.1, pp.25-41
    Type
    a
  20. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.01
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and the article explores how future work might expand on the current project to enhance catalog records through text mining.
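A lexicon-based reading of the text-mining step might look like the sketch below; the affect lexicon and review snippet are invented, and the article's actual methodology may differ:

```python
import re
from collections import Counter

# Sketch: mine affect terms (emotion/tone) from a professional book review.
AFFECT_LEXICON = {  # hypothetical term -> affect label
    "chilling": "suspenseful", "tender": "heartwarming",
    "bleak": "dark", "witty": "humorous", "haunting": "dark",
}

def affect_profile(review_text):
    tokens = re.findall(r"[a-z]+", review_text.lower())
    return Counter(AFFECT_LEXICON[t] for t in tokens if t in AFFECT_LEXICON)

review = ("A chilling, haunting debut that is nonetheless witty; "
          "its bleak landscape hides a tender family story.")
print(affect_profile(review))  # e.g., Counter({'dark': 2, ...})
```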
    Type
    a

Types

  • a 224
  • el 18
  • s 5
  • m 4