Search (79 results, page 1 of 4)

Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.07

0.06844571 = sum of:
  0.008582156 = product of:
    0.051492937 = sum of:
      0.051492937 = weight(_text_:authors in 1794) [ClassicSimilarity], result of:
        0.051492937 = score(doc=1794,freq=2.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.25184128 = fieldWeight in 1794, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1794)
    0.16666667 = coord(1/6)
  0.059863552 = sum of:
    0.029480325 = weight(_text_:c in 1794) [ClassicSimilarity], result of:
      0.029480325 = score(doc=1794,freq=2.0), product of:
        0.1547081 = queryWeight, product of:
          3.4494052 = idf(docFreq=3817, maxDocs=44218)
          0.044850662 = queryNorm
        0.1905545 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4494052 = idf(docFreq=3817, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1794)
    0.030383227 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
      0.030383227 = score(doc=1794,freq=2.0), product of:
        0.15705937 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.044850662 = queryNorm
        0.19345059 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1794)
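
The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions that the printed values are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names `idf` and `term_score` are illustrative, not part of any library, and the results match the listing only to about six decimal places because the listing's values are single-precision.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the explain output for doc 1794
MAX_DOCS = 44218
QUERY_NORM = 0.044850662
FIELD_NORM = 0.0390625

authors = term_score(2.0, 1258, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_c = term_score(2.0, 3817, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "authors" clause is scaled by coord(1/6); the other two terms are summed as-is.
total = authors * (1.0 / 6.0) + (term_c + term_22)
print(f"{total:.6f}")
```

The printed total should agree with the record's displayed score of 0.07 after rounding.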

Abstract: In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
Date: 11. 9.2000 19:53:22
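
The abstract above names a likelihood ratio statistic as its measure of association but does not say which one. As a rough illustration only, the sketch below assumes the common log-likelihood ratio (G²) over a 2x2 contingency table of term/heading co-occurrence counts. The function name and the example counts are hypothetical and are not taken from the article.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table (assumed statistic, not confirmed by the abstract).

    k11: records with both the term and the subject heading
    k12: records with the term but not the heading
    k21: records with the heading but not the term
    k22: records with neither
    """
    observed = [[k11, k12], [k21, k22]]
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    n = sum(row_totals)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Hypothetical counts: a term and a heading that co-occur far more often than chance.
strong = log_likelihood_ratio(40, 10, 20, 4556)
# Hypothetical counts: term and heading are statistically independent.
independent = log_likelihood_ratio(10, 10, 10, 10)
print(strong > independent)
```

Under this assumption, higher values indicate a stronger term-heading association, so candidate headings for a new document could be ranked by the scores of the terms it contains.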

Korycinski, C.; Newell, A.F.: Natural-language processing and automatic indexing (1990) 0.05

0.0474217 = sum of:
  0.023837443 = product of:
    0.14302465 = sum of:
      0.14302465 = weight(_text_:back in 2313) [ClassicSimilarity], result of:
        0.14302465 = score(doc=2313,freq=2.0), product of:
          0.26939675 = queryWeight, product of:
            6.006528 = idf(docFreq=295, maxDocs=44218)
            0.044850662 = queryNorm
          0.5309071 = fieldWeight in 2313, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.006528 = idf(docFreq=295, maxDocs=44218)
            0.0625 = fieldNorm(doc=2313)
    0.16666667 = coord(1/6)
  0.02358426 = product of:
    0.04716852 = sum of:
      0.04716852 = weight(_text_:c in 2313) [ClassicSimilarity], result of:
        0.04716852 = score(doc=2313,freq=2.0), product of:
          0.1547081 = queryWeight, product of:
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.044850662 = queryNorm
          0.3048872 = fieldWeight in 2313, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.0625 = fieldNorm(doc=2313)
    0.5 = coord(1/2)

Abstract: The task of producing satisfactory indexes by automatic means has been tackled on two fronts: by statistical analysis of text and by attempting content analysis of the text in much the same way as a human indexer does. Though statistical techniques have a lot to offer for free-text database systems, neither method has had much success with back-of-the-book indexing. This review examines some problems associated with the application of natural-language processing techniques to book texts. - See also the reply by K.P. Jones

Oliver, C.: Leveraging KOS to extend our reach with automated processes (2021) 0.04

0.03731571 = sum of:
  0.01373145 = product of:
    0.0823887 = sum of:
      0.0823887 = weight(_text_:authors in 722) [ClassicSimilarity], result of:
        0.0823887 = score(doc=722,freq=2.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.40294603 = fieldWeight in 722, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0625 = fieldNorm(doc=722)
    0.16666667 = coord(1/6)
  0.02358426 = product of:
    0.04716852 = sum of:
      0.04716852 = weight(_text_:c in 722) [ClassicSimilarity], result of:
        0.04716852 = score(doc=722,freq=2.0), product of:
          0.1547081 = queryWeight, product of:
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.044850662 = queryNorm
          0.3048872 = fieldWeight in 722, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.0625 = fieldNorm(doc=722)
    0.5 = coord(1/2)

Abstract: This article provides a conclusion to the special issue on Artificial Intelligence (AI) and Automated Processes for Subject Access. The authors who contributed to this special issue have raised interesting questions and drawn attention to important issues. This concluding article looks at common themes and highlights some of the questions raised.

Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03

0.032651246 = sum of:
  0.012015019 = product of:
    0.07209011 = sum of:
      0.07209011 = weight(_text_:authors in 1139) [ClassicSimilarity], result of:
        0.07209011 = score(doc=1139,freq=2.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.35257778 = fieldWeight in 1139, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1139)
    0.16666667 = coord(1/6)
  0.020636227 = product of:
    0.041272454 = sum of:
      0.041272454 = weight(_text_:c in 1139) [ClassicSimilarity], result of:
        0.041272454 = score(doc=1139,freq=2.0), product of:
          0.1547081 = queryWeight, product of:
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.044850662 = queryNorm
          0.2667763 = fieldWeight in 1139, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1139)
    0.5 = coord(1/2)

Abstract: In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.

Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.03
```
0.028829884 = product of:
  0.057659768 = sum of:
    0.057659768 = sum of:
      0.033353183 = weight(_text_:c in 1441) [ClassicSimilarity], result of:
        0.033353183 = score(doc=1441,freq=4.0), product of:
          0.1547081 = queryWeight, product of:
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.044850662 = queryNorm
          0.21558782 = fieldWeight in 1441, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.03125 = fieldNorm(doc=1441)
      0.024306582 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
        0.024306582 = score(doc=1441,freq=2.0), product of:
          0.15705937 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.044850662 = queryNorm
          0.15476047 = fieldWeight in 1441, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1441)
  0.5 = coord(1/2)
```
Abstract

This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then PERL scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated the effectiveness of this preprocessing. We compared the structural similarity of the documents before and after the pre-processing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Schneider, C.; Womser-Hacker, C.: Inhaltserschließungssysteme für Patenttexte : Test und Systemvergleich im Projekt PADOK (1986) 0.03

0.025014887 = product of:
  0.050029773 = sum of:
    0.050029773 = product of:
      0.10005955 = sum of:
        0.10005955 = weight(_text_:c in 2648) [ClassicSimilarity], result of:
          0.10005955 = score(doc=2648,freq=4.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.64676344 = fieldWeight in 2648, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.09375 = fieldNorm(doc=2648)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02

0.024306582 = product of:
  0.048613165 = sum of:
    0.048613165 = product of:
      0.09722633 = sum of:
        0.09722633 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.09722633 = score(doc=402,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information processing and management. 22(1986) no.6, S.465-476

Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.02
```
0.024045076 = sum of:
  0.011891785 = product of:
    0.07135071 = sum of:
      0.07135071 = weight(_text_:authors in 5499) [ClassicSimilarity], result of:
        0.07135071 = score(doc=5499,freq=6.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.34896153 = fieldWeight in 5499, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.03125 = fieldNorm(doc=5499)
    0.16666667 = coord(1/6)
  0.012153291 = product of:
    0.024306582 = sum of:
      0.024306582 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
        0.024306582 = score(doc=5499,freq=2.0), product of:
          0.15705937 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.044850662 = queryNorm
          0.15476047 = fieldWeight in 5499, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=5499)
    0.5 = coord(1/2)
```
Abstract

Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.

Date

20. 1.2015 18:30:22

Jones, K.P.: Natural-language processing and automatic indexing : a reply (1990) 0.02

0.02358426 = product of:
  0.04716852 = sum of:
    0.04716852 = product of:
      0.09433704 = sum of:
        0.09433704 = weight(_text_:c in 394) [ClassicSimilarity], result of:
          0.09433704 = score(doc=394,freq=2.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.6097744 = fieldWeight in 394, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.125 = fieldNorm(doc=394)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Footnote: Reply to: Korycinski, C. and A.F. Newell

Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.02

0.023322318 = sum of:
  0.008582156 = product of:
    0.051492937 = sum of:
      0.051492937 = weight(_text_:authors in 4309) [ClassicSimilarity], result of:
        0.051492937 = score(doc=4309,freq=2.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.25184128 = fieldWeight in 4309, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4309)
    0.16666667 = coord(1/6)
  0.0147401625 = product of:
    0.029480325 = sum of:
      0.029480325 = weight(_text_:c in 4309) [ClassicSimilarity], result of:
        0.029480325 = score(doc=4309,freq=2.0), product of:
          0.1547081 = queryWeight, product of:
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.044850662 = queryNorm
          0.1905545 = fieldWeight in 4309, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.4494052 = idf(docFreq=3817, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4309)
    0.5 = coord(1/2)

Content: This is an authors' manuscript version of a paper accepted for proceedings of TPDL-2018, Porto, Portugal, Sept 10-13. The final authenticated publication is available online at https://doi.org/will be added as soon as available.

Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02

0.021268258 = product of:
  0.042536516 = sum of:
    0.042536516 = product of:
      0.08507303 = sum of:
        0.08507303 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.08507303 = score(doc=262,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 20.10.2000 12:22:23

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

0.021268258 = product of:
  0.042536516 = sum of:
    0.042536516 = product of:
      0.08507303 = sum of:
        0.08507303 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.08507303 = score(doc=6265,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information outlook. 9(2005) no.8, S.22-23

Gibb, F.; Smart, G.: Knowledge-based indexing : the view from SIMPR (1991) 0.02

0.020636227 = product of:
  0.041272454 = sum of:
    0.041272454 = product of:
      0.08254491 = sum of:
        0.08254491 = weight(_text_:c in 4424) [ClassicSimilarity], result of:
          0.08254491 = score(doc=4424,freq=2.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.5335526 = fieldWeight in 4424, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.109375 = fieldNorm(doc=4424)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Libraries and expert systems. Ed. C. MacDonald et al

Schwarz, C.: Komplexe Nominalgruppen als Indexierungseinheiten am Beispiel des Projekte CONDOR (1982) 0.02

0.020636227 = product of:
  0.041272454 = sum of:
    0.041272454 = product of:
      0.08254491 = sum of:
        0.08254491 = weight(_text_:c in 435) [ClassicSimilarity], result of:
          0.08254491 = score(doc=435,freq=2.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.5335526 = fieldWeight in 435, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.109375 = fieldNorm(doc=435)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Salton, G.; Allen, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 0.02

0.020636227 = product of:
  0.041272454 = sum of:
    0.041272454 = product of:
      0.08254491 = sum of:
        0.08254491 = weight(_text_:c in 1168) [ClassicSimilarity], result of:
          0.08254491 = score(doc=1168,freq=2.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.5335526 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.109375 = fieldNorm(doc=1168)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Schröther, C.: Automatische Indexierung, Kategorisierung und inhaltliche Erschließung von Textnachrichten (2003) 0.02

0.020636227 = product of:
  0.041272454 = sum of:
    0.041272454 = product of:
      0.08254491 = sum of:
        0.08254491 = weight(_text_:c in 521) [ClassicSimilarity], result of:
          0.08254491 = score(doc=521,freq=2.0), product of:
            0.1547081 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.044850662 = queryNorm
            0.5335526 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.109375 = fieldNorm(doc=521)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.02
```
0.019019015 = sum of:
  0.006865725 = product of:
    0.04119435 = sum of:
      0.04119435 = weight(_text_:authors in 1442) [ClassicSimilarity], result of:
        0.04119435 = score(doc=1442,freq=2.0), product of:
          0.20446584 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.044850662 = queryNorm
          0.20147301 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
    0.16666667 = coord(1/6)
  0.012153291 = product of:
    0.024306582 = sum of:
      0.024306582 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
        0.024306582 = score(doc=1442,freq=2.0), product of:
          0.15705937 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.044850662 = queryNorm
          0.15476047 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
    0.5 = coord(1/2)
```
Abstract

The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts were a total of 98 doctoral theses from the eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the terms' positions in the text. Each full noun phrase found in the text was considered a valid linear position. The results showed values resulting from this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, and all areas related to the Social Sciences likewise shared a characteristic distribution behavior, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in future as criteria for automatic indexing.
To date, this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text and, through this method, it points out a quantitative difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02

0.018229935 = product of:
  0.03645987 = sum of:
    0.03645987 = product of:
      0.07291974 = sum of:
        0.07291974 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.07291974 = score(doc=58,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 6.2015 22:12:44

Hauer, M.: Automatische Indexierung (2000) 0.02

0.018229935 = product of:
  0.03645987 = sum of:
    0.03645987 = product of:
      0.07291974 = sum of:
        0.07291974 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.07291974 = score(doc=5887,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02

0.018229935 = product of:
  0.03645987 = sum of:
    0.03645987 = product of:
      0.07291974 = sum of:
        0.07291974 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.07291974 = score(doc=2051,freq=2.0), product of:
            0.15705937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.044850662 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 6.2015 22:12:56
