Search (11 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Indexieren"
  • year_i:[2010 TO 2020}
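  The three constraints above are Solr field filters; note that the year range [2010 TO 2020} is half-open, including 2010 but excluding 2020. As a minimal sketch, the same result set could be requested from the underlying index directly; the host, core name ("literature"), and /select handler below are illustrative assumptions, not taken from this page:

      # Hedged sketch: host, core name, and handler are assumptions, not from this page.
      import requests

      params = {
          "q": "*:*",                          # match all documents ...
          "fq": [                              # ... then apply the three facet filters
              'language_ss:"e"',
              'theme_ss:"Automatisches Indexieren"',
              'year_i:[2010 TO 2020}',         # half-open range: 2010 <= year < 2020
          ],
          "rows": 11,
          "wt": "json",
      }
      resp = requests.get("http://localhost:8983/solr/literature/select", params=params)
      print(resp.json()["response"]["numFound"])   # expected: 11, as in the header above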
  1. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.04
    0.035266254 = product of:
      0.07053251 = sum of:
        0.07053251 = sum of:
          0.012346405 = weight(_text_:h in 1441) [ClassicSimilarity], result of:
            0.012346405 = score(doc=1441,freq=2.0), product of:
              0.11244635 = queryWeight, product of:
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.045260075 = queryNorm
              0.10979818 = fieldWeight in 1441, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
          0.033657644 = weight(_text_:c in 1441) [ClassicSimilarity], result of:
            0.033657644 = score(doc=1441,freq=4.0), product of:
              0.15612034 = queryWeight, product of:
                3.4494052 = idf(docFreq=3817, maxDocs=44218)
                0.045260075 = queryNorm
              0.21558782 = fieldWeight in 1441, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4494052 = idf(docFreq=3817, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
          0.02452846 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
            0.02452846 = score(doc=1441,freq=2.0), product of:
              0.15849307 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045260075 = queryNorm
              0.15476047 = fieldWeight in 1441, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
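    The breakdown above is Lucene "explain" output for ClassicSimilarity (TF-IDF) ranking. The figures are consistent with the standard formula

    \[
    \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \sum_{t \in q} \underbrace{\mathrm{idf}(t)\cdot\mathrm{queryNorm}}_{\mathrm{queryWeight}} \cdot \underbrace{\sqrt{\mathrm{tf}_{t,d}}\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}},
    \qquad
    \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}(t)+1}.
    \]

    Checked against the term "h": idf = 1 + ln(44218/10021) ≈ 2.4844491; queryWeight = 2.4844491 × 0.045260075 ≈ 0.11244635; fieldWeight = √2 × 2.4844491 × 0.03125 ≈ 0.10979818; their product is the term weight 0.012346405. Summing the three term weights (0.012346405 + 0.033657644 + 0.02452846 = 0.07053251) and applying coord(1/2) = 0.5 reproduces the document score 0.035266254 shown above.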
    
    Abstract
    This paper presents research on the use of syntactic structures known as noun phrases (NPs) to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words fed to the classification system without loss of semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform on which part-of-speech tagging was done, and then Perl scripts belonging to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting them with common hypernyms; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. First tests on the resulting documents demonstrated the approach's effectiveness: comparing the structural similarity of the documents before and after the preprocessing steps of phase one, the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents; the classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
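    As an illustration of the three preprocessing tasks (a-c), here is a minimal, self-contained Python sketch. It is not the authors' pipeline: they used PALAVRAS Perl scripts over POS-tagged Portuguese text, whereas this toy version chunks pre-tagged English tokens with a DET? ADJ* NOUN+ grammar, a hand-made synonym-to-hypernym table, and a crude suffix-stripping stemmer:

      # Illustrative sketch only; grammar, tables, and stemmer are toy stand-ins.
      QUANTIFIERS = {"many", "several", "some", "two", "three"}    # (a) low-meaning pre-modifiers
      HYPERNYMS = {"car": "vehicle", "cars": "vehicle"}            # (b) synonym -> common hypernym

      def stem(word):
          # (c) crude suffix stripping as a stand-in for a real stemmer
          for suffix in ("ation", "ing", "es", "s"):
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[: -len(suffix)]
          return word

      def extract_nps(tagged):
          """Chunk maximal DET? ADJ* NOUN+ spans from (token, tag) pairs."""
          nps, i = [], 0
          while i < len(tagged):
              j = i
              if tagged[j][1] == "DET":
                  j += 1
              while j < len(tagged) and tagged[j][1] == "ADJ":
                  j += 1
              k = j
              while k < len(tagged) and tagged[k][1] == "NOUN":
                  k += 1
              if k > j:                          # at least one noun head: a noun phrase
                  nps.append([w for w, _ in tagged[i:k]])
                  i = k
              else:
                  i += 1
          return nps

      def preprocess(np_tokens):
          kept = [w for w in np_tokens if w.lower() not in QUANTIFIERS]      # (a)
          normalized = [HYPERNYMS.get(w.lower(), w.lower()) for w in kept]   # (b)
          return [stem(w) for w in normalized]                               # (c)

      tagged = [("several", "DET"), ("fast", "ADJ"), ("cars", "NOUN"),
                ("require", "VERB"), ("frequent", "ADJ"), ("inspections", "NOUN")]
      for np in extract_nps(tagged):
          print(np, "->", preprocess(np))
          # ['several', 'fast', 'cars'] -> ['fast', 'vehicle']
          # ['frequent', 'inspections'] -> ['frequent', 'inspection']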
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  2. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.02
    Abstract
    This article presents an unsupervised algorithm for the semantic annotation of morphological descriptions of whole organisms. The algorithm annotates plain-text descriptions with high accuracy at the clause level by exploiting the corpus itself; in other words, it needs no lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) it reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) it improves annotation coverage, annotating what actually appears in documents rather than being limited by predefined and often incomplete templates; (c) it learns clean and reusable concepts, i.e. organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) it is insensitive to collection size; and (e) it runs in linear time with respect to the number of clauses to be annotated.
  3. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01
    Date
    1.2.2016 18:25:22
  4. Siebenkäs, A.; Markscheffel, B.: Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry (2015) 0.01
    Source
    Re:inventing information science in the networked society: Proceedings of the 14th International Symposium on Information Science, Zadar/Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schloegl and C. Wolff
  5. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.00
  6. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.00
  7. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.00
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  8. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.00
    Date
    20.1.2015 18:30:22
  9. Willis, C.; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.00
  10. Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: A typology of semantic relations dedicated to scientific literature analysis (2016) 0.00
  11. Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.00
    Abstract
    Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity, and perhaps quality. The German National Library (DNB) has recognised this potential and is increasingly relying on automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; since September 2017, monographs and periodicals published outside the publishing industry, as well as university publications, are no longer indexed by people. This raises the question: what is the quality of the automatic indexing compared to the manual work, or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?