Search (138 results, page 1 of 7)

  • Filter: theme_ss:"Automatisches Indexieren"
  1. Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.03
    Abstract
    Indexing is contextualized, and some of the most widely used automatic indexing systems are briefly described. We describe SISA, a system that uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value for indexing of certain parts of the articles, such as titles, abstracts, keywords, headings, first paragraphs, conclusions and references, and, on the other, TF-IDF rules. The indexing is then evaluated for retrieval performance through recall, precision and F-measure. Automatic indexing of the articles with location heuristics rules yielded the best results on these evaluation measures.
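    To make the contrast concrete, the following is a minimal TF-IDF sketch (illustrative Python; the toy corpus, tokenisation and function names are assumptions for demonstration, not SISA's actual implementation):

      import math
      from collections import Counter

      def tf_idf_terms(docs, doc_id, top_k=5):
          """Rank candidate index terms of one document by TF-IDF.

          docs: list of token lists; doc_id: index of the target document.
          A toy stand-in for statistical indexing rules, not SISA's code.
          """
          n_docs = len(docs)
          tf = Counter(docs[doc_id])
          # document frequency: in how many documents does each term occur?
          df = Counter(term for doc in docs for term in set(doc))
          scores = {
              term: (freq / len(docs[doc_id])) * math.log(n_docs / df[term])
              for term, freq in tf.items()
          }
          return sorted(scores, key=scores.get, reverse=True)[:top_k]

      docs = [
          "poda de macieiras em pomares de clima temperado".split(),
          "adubacao e poda de pereiras".split(),
          "colheita mecanizada de laranjas".split(),
      ]
      print(tf_idf_terms(docs, 0))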
  2. Dow Jones unveils knowledge indexing system (1997) 0.03
    Abstract
    Dow Jones Interactive Publishing has developed a sophisticated automatic knowledge indexing system that will allow searchers of the Dow Jones News/Retrieval service to get highly targeted results from a search in the service's Publications Library. Instead of relying solely on a thesaurus of company names, the new system uses a combination of that basic algorithm and unique rules based on the editorial styles of individual publications in the Library. Dow Jones has also announced its acceptance of the definitions of 'selected full text' and 'full text' from Bibliodata's Fulltext Sources Online directory.
    Source
    Advanced technology libraries. 26(1997) no.5, S.2-3
  3. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991) 0.02
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. The system mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases whose presence in the text alone does not determine their appropriateness as key phrases. The insertion and deletion rules are used to transform a list of found phrases into a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule-based system.
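    As a hedged illustration of the insertion/deletion mechanism described above (Python; the rule contents are invented for demonstration and are not Driscoll et al.'s actual rule base):

      # Toy insertion/deletion rule base in the spirit of the abstract.
      # Insertion: a found phrase triggers an additional key phrase.
      # Deletion: an ambiguous phrase is kept only if a context cue co-occurs.
      insertion_rules = {"oil spill": ["water pollution"]}
      deletion_rules = {"tank": lambda found: "storage" in found or "fuel" in found}

      def to_key_phrases(found_phrases):
          found = set(found_phrases)
          keys = set(found)
          for phrase, triggered in insertion_rules.items():
              if phrase in found:
                  keys.update(triggered)      # insertion rule fires
          for phrase, keep_if in deletion_rules.items():
              if phrase in keys and not keep_if(found):
                  keys.discard(phrase)        # ambiguous phrase deleted
          return sorted(keys)

      print(to_key_phrases(["oil spill", "tank"]))  # 'tank' dropped, 'water pollution' added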
  4. Hlava, M.M.K.: Machine aided indexing (MAI) in a multilingual environment (1993) 0.02
    Abstract
    The machine aided indexing (MAI) software developed by Access Innovations, Inc., is a semantics-based, Boolean-statement, rule-interpreting application with three modules: the MA engine, which accepts input files, matches terms in the knowledge base, interprets rules, and outputs a text file with suggested indexing terms; a rule-building application that allows each Boolean-style rule in the knowledge base to be created or modified; and a statistical computation module that analyzes the performance of the MA software against text manually indexed by professional human indexers. The MA software can be applied across multiple languages and can be used where the text to be searched is in one language and the indexes to be output are in another.
    Source
    Proceedings of the 14th National Online Meeting 1993, New York, 4-6 May 1993. Ed.: M.E. Williams
  5. Mansour, N.; Haraty, R.A.; Daher, W.; Houri, M.: ¬An auto-indexing method for Arabic text (2008) 0.02
    Abstract
    This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. The method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the containing document. The weight is based on how widely a word is spread across a document, not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples, for which we obtained an average recall of 46% and an average precision of 64%.
    Date
    1. 8.2008 12:35:26
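    A minimal sketch of a spread-sensitive weight of the kind the abstract describes (Python; the segmentation scheme and formula are assumptions for illustration, not the paper's actual weighting):

      def spread_weight(positions, doc_len, n_segments=4):
          """Weight a stem by how evenly it is spread across a document.

          positions: token offsets of the stem. Illustrative only - it
          combines occurrence rate with segment coverage, which is the
          idea, not the formula, of the paper.
          """
          seg = doc_len / n_segments
          hit_segments = {int(p // seg) for p in positions}
          coverage = len(hit_segments) / n_segments   # spread component
          freq = len(positions) / doc_len             # occurrence component
          return freq * coverage

      tokens = "kitab qalam kitab dars kitab qalam dars kitab".split()
      pos = [i for i, t in enumerate(tokens) if t == "kitab"]
      print(round(spread_weight(pos, len(tokens)), 3))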
  6. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.02
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.7, S.1026-1040
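    A hedged sketch of the thesaurus-backed candidate step the abstract implies (Python; the mini-thesaurus and n-gram generation are illustrative assumptions, not Medelyan and Witten's implementation):

      # Keep only n-grams that match a domain thesaurus, so even small
      # training sets see consistent, controlled candidate phrases.
      THESAURUS = {"soil fertility", "crop rotation", "irrigation"}

      def candidates(text, max_n=2):
          tokens = text.lower().split()
          grams = {" ".join(tokens[i:i + n])
                   for n in range(1, max_n + 1)
                   for i in range(len(tokens) - n + 1)}
          return sorted(grams & THESAURUS)

      print(candidates("Crop rotation improves soil fertility on dry farms"))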
  7. Kim, P.K.: ¬An automatic indexing of compound words based on mutual information for Korean text retrieval (1995) 0.02
    Abstract
    Presents an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. Discusses construction conditions for compound words and rules for decomposing compound words to enhance the exhaustivity of indexing, demonstrating that this mutual-information-based approach enhances both the exhaustivity of indexing and the specificity of terms. Suggests that the construction conditions and decomposition rules presented may be used in multilingual information retrieval systems to translate the indexing terms of one language into those of the language required.
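    For reference, the pointwise mutual information commonly used for such compound-word decisions (Python; the probabilities are toy values and this is the generic textbook definition, not necessarily Kim's exact model):

      import math

      def mutual_information(p_xy, p_x, p_y):
          """Pointwise mutual information between two word components.

          High MI suggests the pair forms a genuine compound worth
          indexing as one unit.
          """
          return math.log2(p_xy / (p_x * p_y))

      # toy corpus probabilities for a two-component compound
      print(round(mutual_information(p_xy=0.004, p_x=0.02, p_y=0.03), 2))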
  8. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.02
    Abstract
    This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.
    Date
    26. 5.2002 15:32:08
    Source
    Journal of the American Society for Information Science and technology. 53(2002) no.8, S.653-677
  9. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents on all subjects. It has traditionally been used for the intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results and its problems are sketched in the paper.
    Source
    Cataloguing & Classification Quarterly 52(2014) no.1, S.102-109
  10. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.02
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitalized texts was submitted to a natural language processing platform on which part-of-speech tagging was done, and then Perl scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests with the resulting documents demonstrated the approach's effectiveness: we compared the structural similarity of the documents before and after the pre-processing steps of phase one, and the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents, with the classification rules established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
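    A hedged sketch of the phase-one NP normalisation steps (Python; the quantifier list, hypernym table and crude stemmer are toy assumptions, not the PALAVRAS pipeline):

      QUANTIFIERS = {"some", "many", "several", "few"}
      HYPERNYMS = {"apple": "fruit", "pear": "fruit", "orange": "fruit"}

      def stem(word):                  # crude suffix stripper, stand-in only
          for suffix in ("ing", "s"):
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[: -len(suffix)]
          return word

      def normalise_np(noun_phrase):
          # drop quantifier-like pre-modifiers, stem, then map to hypernyms
          words = [w for w in noun_phrase.lower().split() if w not in QUANTIFIERS]
          words = [stem(w) for w in words]
          return " ".join(HYPERNYMS.get(w, w) for w in words)

      print(normalise_np("several ripening apples"))  # -> 'ripen fruit'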
  11. Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.02
    Abstract
    This article describes a natural-language text-processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is deep indexing with a controlled vocabulary of biological concepts - Concept Headings (CHs). On average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural-language biological titles into title-semantic representations in the constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural-language terms and their translations in the language of Concept Primitives. For ambiguous terms, the SV contains algorithmic rules for term disambiguation, rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, lexical and situational ambiguities, are discussed, and the disambiguation techniques employed are described and demonstrated with many examples.
    Source
    Journal of the American Society for Information Science. 38(1987) no.4, S.269-287
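    A hedged sketch of the stage-two matching idea, with Concept Heading definitions as Boolean strategies over Concept Primitives (Python; the primitives and definitions are invented for illustration, not BIOSIS's actual vocabulary):

      ch_definitions = {
          "Photosynthesis": lambda p: "light" in p and "carbon-fixation" in p,
          "Plant physiology": lambda p: "plant" in p and ("growth" in p or "metabolism" in p),
      }

      def assign_concept_headings(primitives):
          prims = set(primitives)
          return [ch for ch, matches in ch_definitions.items() if matches(prims)]

      print(assign_concept_headings(["plant", "light", "carbon-fixation", "growth"]))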
  12. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.02
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on the automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents with the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI variant that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF might be combined and then evaluated to show that they are complementary to one another.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
  13. Hersh, W.R.; Hickam, D.H.: ¬A comparison of two methods for indexing and retrieval from a full-text medical database (1992) 0.01
    Source
    Proceedings of the 55th Annual Meeting of the American Society for Information Science, Pittsburgh, 26.-29.10.92. Ed.: D. Shaw
  14. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.01
    Abstract
    As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, due to the increasing amount of time users will have to spend finding needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.
  15. Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.01
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntax-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
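    A hedged sketch of a hand-crafted extraction rule of the kind the abstract describes (Python regular expression; the pattern and labels are illustrative, not OPTIMA's actual rules):

      import re

      # context pattern that tags an object span together with its period
      RULE = re.compile(
          r"(?P<object>pottery|coin|brooch)\s+of\s+(?P<period>Roman|Saxon|medieval)\s+date",
          re.IGNORECASE)

      def annotate(text):
          return [(m.group("object"), m.group("period")) for m in RULE.finditer(text)]

      print(annotate("A coin of Roman date was recovered from the ditch fill."))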
  16. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.01
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
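    A hedged sketch of the rule-based macro translation idea (Python; the two mappings and the macro spellings are illustrative stand-ins, while the project itself defines 396 such mappings):

      import re

      # map semantic LaTeX macros (DLMF-style) to CAS function calls
      MAPPINGS = {r"\\BesselJ\{(\w+)\}@\{(\w+)\}": r"BesselJ(\1, \2)",
                  r"\\EulerGamma@\{(\w+)\}": r"GAMMA(\1)"}

      def to_maple(latex):
          for pattern, repl in MAPPINGS.items():
              latex = re.sub(pattern, repl, latex)
          return latex

      print(to_maple(r"\BesselJ{nu}@{z} + \EulerGamma@{z}"))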
  17. Olsgaard, J.N.; Evans, E.J.: Improving keyword indexing (1981) 0.01
    Date
    26. 2.1996 16:49:13
    Source
    Journal of the American society for information science. 32(1981), S.71-72
  18. Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.01
    Content
    Includes the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases
  19. Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.01
    Abstract
    Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications are no longer indexed by people. This raises the question: what is the quality of the automatic indexing compared to the manual work, or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?
  20. Roberts, D.; Souter, C.: ¬The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.01
    Abstract
    This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon the generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass various components: extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature; use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors; and coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
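    A hedged sketch of the controlled-vocabulary mapping step (Python; the mini-vocabulary stands in for the UMLS, which the abstract proposes using, and the entries are invented):

      VOCAB = {"myocardial infarction": "Myocardial Infarction",
               "heart attack": "Myocardial Infarction",
               "low back pain": "Low Back Pain"}

      def map_concepts(concept_strings):
          descriptors = []
          for s in concept_strings:
              descriptor = VOCAB.get(s.lower())
              if descriptor and descriptor not in descriptors:
                  descriptors.append(descriptor)
          return descriptors

      print(map_concepts(["Heart attack", "low back pain"]))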

Languages

  • e 103
  • d 29
  • f 3
  • ja 1
  • nl 1
  • ru 1

Types

  • a 125
  • x 6
  • el 4
  • m 3
  • s 2
  • d 1
  • r 1