Search (115 results, page 1 of 6)

  • Filter: theme_ss:"Automatisches Indexieren" (automatic indexing)
  1. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.03
    0.03274348 = product of:
      0.11460218 = sum of:
        0.07756059 = weight(_text_:based in 6265) [ClassicSimilarity], result of:
          0.07756059 = score(doc=6265,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.6590924 = fieldWeight in 6265, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
        0.03704159 = product of:
          0.07408318 = sum of:
            0.07408318 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.07408318 = score(doc=6265,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
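    A note on reading these score breakdowns: the arithmetic is Lucene's ClassicSimilarity (TF-IDF with coordination factors). The following minimal Python sketch reproduces the top score from the numbers printed above; the function is our own illustration, not a Lucene API call:

        import math

        def term_score(freq, idf, query_norm, field_norm):
            # ClassicSimilarity: queryWeight = idf * queryNorm;
            # fieldWeight = sqrt(freq) * idf * fieldNorm; the term's
            # contribution is the product of the two.
            return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

        query_norm = 0.03905679
        w_based = term_score(4.0, 3.0129938, query_norm, 0.109375)      # ~0.07756059
        w_22 = term_score(2.0, 3.5018296, query_norm, 0.109375) * 0.5   # times coord(1/2)
        print(round((w_based + w_22) * 2 / 7, 8))                       # coord(2/7) -> ~0.03274348

    The idf values themselves follow 1 + ln(maxDocs / (docFreq + 1)); for "based", 1 + ln(44218 / 5907) ≈ 3.0130, as displayed.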
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  2. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.0233882 = product of:
      0.081858695 = sum of:
        0.05540042 = weight(_text_:based in 4157) [ClassicSimilarity], result of:
          0.05540042 = score(doc=4157,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.47078028 = fieldWeight in 4157, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.026458278 = product of:
          0.052916557 = sum of:
            0.052916557 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.052916557 = score(doc=4157,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Eds.: Marlies Ockenfeld and Gerhard J. Mantwill
  3. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.0233882 = product of:
      0.081858695 = sum of:
        0.05540042 = weight(_text_:based in 2759) [ClassicSimilarity], result of:
          0.05540042 = score(doc=2759,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.47078028 = fieldWeight in 2759, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.026458278 = product of:
          0.052916557 = sum of:
            0.052916557 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.052916557 = score(doc=2759,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  4. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.02
    0.015001667 = product of:
      0.052505832 = sum of:
        0.03133921 = weight(_text_:based in 4709) [ClassicSimilarity], result of:
          0.03133921 = score(doc=4709,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.26631355 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
        0.021166623 = product of:
          0.042333245 = sum of:
            0.042333245 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
              0.042333245 = score(doc=4709,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.30952093 = fieldWeight in 4709, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4709)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm performs a gradual approximation that converges the linguistic knowledge towards the desired results. The first experiment revealed the characteristics of this algorithm, and the other experiments proved its effectiveness on a real corpus.
    Date
    31. 7.1996 9:22:19
  5. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.01
    0.013681654 = product of:
      0.09577157 = sum of:
        0.09577157 = weight(_text_:great in 2629) [ClassicSimilarity], result of:
          0.09577157 = score(doc=2629,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.43548337 = fieldWeight in 2629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
      0.14285715 = coord(1/7)
    
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both about the likely timeframe and about the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
  6. Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.01
    0.01347281 = product of:
      0.047154833 = sum of:
        0.033925693 = weight(_text_:based in 1794) [ClassicSimilarity], result of:
          0.033925693 = score(doc=1794,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28829288 = fieldWeight in 1794, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
              0.026458278 = score(doc=1794,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 1794, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1794)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect, we have cast this as a classic partial-match information retrieval problem: 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
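    As a sketch of the kind of likelihood ratio association used here (Dunning's log-likelihood ratio G² is one standard choice; the paper does not spell out its exact variant, and the counts below are invented for illustration):

        import math

        def g2(k11, k12, k21, k22):
            # Log-likelihood ratio for a 2x2 term/heading contingency table:
            # k11 = docs with both the lexical item and the heading,
            # k12 = item without heading, k21 = heading without item, k22 = neither.
            def ll(k, n, p):
                return k * math.log(p) + (n - k) * math.log(1 - p)
            n1, n2 = k11 + k12, k21 + k22
            p1, p2, p = k11 / n1, k21 / n2, (k11 + k21) / (n1 + n2)
            return 2 * (ll(k11, n1, p1) + ll(k21, n2, p2)
                        - ll(k11, n1, p) - ll(k21, n2, p))

        # Invented counts over a 4,626-document collection:
        print(g2(30, 70, 120, 4406))  # larger G2 = stronger item/heading association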
    Date
    11. 9.2000 19:53:22
  7. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01
    0.013126459 = product of:
      0.045942605 = sum of:
        0.02742181 = weight(_text_:based in 5001) [ClassicSimilarity], result of:
          0.02742181 = score(doc=5001,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23302436 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.018520795 = product of:
          0.03704159 = sum of:
            0.03704159 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.03704159 = score(doc=5001,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, ignorance among users and information specialists of the subject vocabulary in use, and general language problems. Across fields, the social sciences had the best retrieval rate, science the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  8. Gibb, F.; Smart, G.: Knowledge-based indexing : the view from SIMPR (1991) 0.01
    0.007834803 = product of:
      0.05484362 = sum of:
        0.05484362 = weight(_text_:based in 4424) [ClassicSimilarity], result of:
          0.05484362 = score(doc=4424,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.46604872 = fieldWeight in 4424, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=4424)
      0.14285715 = coord(1/7)
    
  9. Roberts, D.; Souter, C.: ¬The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.01
    0.0075082085 = product of:
      0.052557457 = sum of:
        0.052557457 = weight(_text_:based in 711) [ClassicSimilarity], result of:
          0.052557457 = score(doc=711,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.44662142 = fieldWeight in 711, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=711)
      0.14285715 = coord(1/7)
    
    Abstract
    This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon search parameters generated from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass several components: extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature; use of the Unified Medical Language System (UMLS) to identify controlled vocabulary descriptors; and coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
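    A hedged miniature of the proposed pipeline in Python (the vocabulary entries are invented stand-ins; real descriptor identification would go through the UMLS):

        # Toy controlled vocabulary mapping concept strings to descriptors.
        MESH = {
            "low back pain": "Low Back Pain",
            "acupuncture": "Acupuncture Therapy",
        }

        def assign_descriptors(text):
            # Naive substring lookup of known concept strings in a title/abstract.
            text = text.lower()
            return {d for phrase, d in MESH.items() if phrase in text}

        print(assign_descriptors("Acupuncture in the management of low back pain"))
        # {'Acupuncture Therapy', 'Low Back Pain'} (set order may vary)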
  10. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.01
    0.0075008334 = product of:
      0.026252916 = sum of:
        0.015669605 = weight(_text_:based in 5499) [ClassicSimilarity], result of:
          0.015669605 = score(doc=5499,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.13315678 = fieldWeight in 5499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.010583311 = product of:
          0.021166623 = sum of:
            0.021166623 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
              0.021166623 = score(doc=5499,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.15476047 = fieldWeight in 5499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5499)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
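    A sketch of what one such rule-based mapping looks like (the two correspondences are real DLMF-macro/Maple pairs, but the rule format and plain string rewriting are simplifications of our own):

        # DLMF semantic LaTeX macro -> Maple, as straight rewrite rules.
        RULES = {
            r"\BesselJ{\nu}@{z}": "BesselJ(nu, z)",
            r"\EulerGamma@{z}": "GAMMA(z)",
        }

        def translate(expr):
            # A real translator must also parse arguments and check for the
            # definition mismatches (branch cuts, constraints) noted in the paper.
            for latex, maple in RULES.items():
                expr = expr.replace(latex, maple)
            return expr

        print(translate(r"\BesselJ{\nu}@{z} + \EulerGamma@{z}"))
        # BesselJ(nu, z) + GAMMA(z)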
    Date
    20. 1.2015 18:30:22
  11. Fuhr, N.; Knorz, G.: Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS) (1984) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 2321) [ClassicSimilarity], result of:
          0.047008816 = score(doc=2321,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 2321, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=2321)
      0.14285715 = coord(1/7)
    
  12. Salton, G.: Future prospects for text-based information retrieval (1990) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 2327) [ClassicSimilarity], result of:
          0.047008816 = score(doc=2327,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 2327, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=2327)
      0.14285715 = coord(1/7)
    
  13. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 2721) [ClassicSimilarity], result of:
          0.047008816 = score(doc=2721,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 2721, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
      0.14285715 = coord(1/7)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; it also identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to users' levels of image description. A major contribution is that the classification is performed automatically on the raw image contextual information extracted from any general webpage, rather than relying solely on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and the corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes, as well as n-gram indexing, in a recall/precision-based evaluation framework.
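    To make the "location-based tf-idf" baseline concrete, a small sketch (the locations and boost values are illustrative, not the paper's):

        import math

        LOCATION_BOOST = {"page_title": 3.0, "alt_text": 2.0, "caption": 1.5, "body": 1.0}

        def location_weighted_tfidf(occurrences, doc_freq, num_docs):
            # Each occurrence of a term counts more or less depending on
            # where it appears in the image's surrounding page context.
            tf = sum(LOCATION_BOOST.get(loc, 1.0) for loc in occurrences)
            idf = math.log(num_docs / (1 + doc_freq))
            return tf * idf

        print(location_weighted_tfidf(["page_title", "body", "body"], 120, 10000))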
  14. Faraj, N.: Analyse d'une méthode d'indexation automatique basée sur une analyse syntaxique de texte (1996) 0.01
    0.006331477 = product of:
      0.044320337 = sum of:
        0.044320337 = weight(_text_:based in 685) [ClassicSimilarity], result of:
          0.044320337 = score(doc=685,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37662423 = fieldWeight in 685, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=685)
      0.14285715 = coord(1/7)
    
    Abstract
    Evaluates an automatic indexing method based on syntactic text analysis combined with statistical analysis. Tests many combinations of term categories and weighting methods. The experiment, conducted on a software engineering corpus, shows systematic improvement when syntactic term phrases are used as index terms instead of individual words alone.
    Footnote
    Translated title: Analysis of an automatic indexing method based on syntactic analysis of text
  15. Schuegraf, E.J.; Bommel, M.F.van: ¬An automatic document indexing system based on cooperating expert systems : design and development (1993) 0.01
    0.006331477 = product of:
      0.044320337 = sum of:
        0.044320337 = weight(_text_:based in 6504) [ClassicSimilarity], result of:
          0.044320337 = score(doc=6504,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37662423 = fieldWeight in 6504, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=6504)
      0.14285715 = coord(1/7)
    
    Abstract
    Discusses the design of an automatic indexing system based on two cooperating expert systems, and the investigations related to its development. The design combines statistical and artificial intelligence techniques. Examines the choice of content indicators, the effect of stemming, and the identification of characteristic vocabularies for given subject areas. Presents experimental results. Discusses the application of machine learning algorithms to the identification of vocabularies.
  16. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.01
    0.006331477 = product of:
      0.044320337 = sum of:
        0.044320337 = weight(_text_:based in 5189) [ClassicSimilarity], result of:
          0.044320337 = score(doc=5189,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37662423 = fieldWeight in 5189, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=5189)
      0.14285715 = coord(1/7)
    
    Abstract
    An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, the grammar formalism, the lexicon, and the frame-based knowledge representation used for semantics. The semantic representation is based on the Ranganathan school of thought, especially the Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The various steps in indexing are demonstrated with an illustration.
  17. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.01
    0.006331477 = product of:
      0.044320337 = sum of:
        0.044320337 = weight(_text_:based in 466) [ClassicSimilarity], result of:
          0.044320337 = score(doc=466,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37662423 = fieldWeight in 466, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.14285715 = coord(1/7)
    
    Abstract
    We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval, and conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures, are discussed.
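    A minimal sketch of the dictionary-based idea (the dictionaries here are tiny invented samples, and the grouping heuristic is far cruder than Lingo's):

        ADJECTIVES = {"partial", "linear"}
        NOUNS = {"differential", "equation", "operator"}

        def multiword_candidates(tokens, max_len=3):
            # Yield adjacent token groups whose members all belong to a
            # known word class, as candidate multiword phrases.
            known = ADJECTIVES | NOUNS
            for i in range(len(tokens)):
                for j in range(i + 2, min(i + max_len, len(tokens)) + 1):
                    if all(t in known for t in tokens[i:j]):
                        yield " ".join(tokens[i:j])

        print(list(multiword_candidates("a partial differential equation".split())))
        # ['partial differential', 'partial differential equation', 'differential equation']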
  18. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
    0.00625684 = product of:
      0.04379788 = sum of:
        0.04379788 = weight(_text_:based in 3300) [ClassicSimilarity], result of:
          0.04379788 = score(doc=3300,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37218451 = fieldWeight in 3300, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.14285715 = coord(1/7)
    
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and JDI is superior for one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to show whether they are complementary to one another.
  19. Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: ¬A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.01
    0.00625684 = product of:
      0.04379788 = sum of:
        0.04379788 = weight(_text_:based in 63) [ClassicSimilarity], result of:
          0.04379788 = score(doc=63,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37218451 = fieldWeight in 63, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
      0.14285715 = coord(1/7)
    
    Abstract
    Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only deliver consistent performance on various unseen domains, but also surpass hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
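    A toy version of the alignment idea in Python (Needleman-Wunsch global alignment over token labels; the template, token labels, and scoring values are invented, and PBA's actual matching is considerably richer):

        def align_score(template, tokens, match=2, gap=-1, sub=-1):
            # Needleman-Wunsch global alignment: a high score means the token
            # sequence fits the template despite insertions and deletions.
            m, n = len(template), len(tokens)
            dp = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(1, m + 1):
                dp[i][0] = i * gap
            for j in range(1, n + 1):
                dp[0][j] = j * gap
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    hit = match if template[i - 1] == tokens[j - 1] else sub
                    dp[i][j] = max(dp[i - 1][j - 1] + hit,
                                   dp[i - 1][j] + gap,
                                   dp[i][j - 1] + gap)
            return dp[m][n]

        template = ["AUTHOR", ",", "TITLE", ",", "YEAR"]
        print(align_score(template, ["AUTHOR", ",", "TITLE", "(", "YEAR", ")"]))
        # 6: four matches, one substitution, one gap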
  20. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01
    0.006047607 = product of:
      0.042333245 = sum of:
        0.042333245 = product of:
          0.08466649 = sum of:
            0.08466649 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.08466649 = score(doc=402,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476

Types

  • a 106
  • el 11
  • x 3
  • m 2
  • s 2