Search (19 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Indexieren"
  • year_i:[2010 TO 2020}
  1. Gil-Leiva, I.: SISA - automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.03
    
    Abstract
    Indexing is contextualized and a brief description is provided of some of the most widely used automatic indexing systems. We describe SISA, a system which uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value for indexing of certain parts of the articles, such as titles, abstracts, keywords, headings, first paragraphs, conclusions and references, and, on the other, TF-IDF rules. The indexing is then evaluated for retrieval performance through recall, precision and F-measure. Automatic indexing of the articles with location heuristics rules provided the best results on the evaluation measures.
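    A minimal sketch in Python of the TF-IDF side of such an approach, ranking a document's terms against a small corpus; the toy corpus and tokenization below are invented and are not SISA's actual implementation:

```python
# Rank each term of one document by TF-IDF against a corpus of token lists
# and keep the top-ranked terms as indexing candidates. Toy data only.
import math
from collections import Counter

def tfidf_terms(doc_tokens, corpus, top_n=5):
    """Return the document's top_n terms by TF-IDF score."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log(n_docs / (1 + df))         # smoothed inverse doc freq
        scores[term] = (freq / len(doc_tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

corpus = [["poda", "macieira", "fruto"],
          ["solo", "irrigacao", "fruto"],
          ["macieira", "colheita", "fruto", "poda"]]
print(tfidf_terms(corpus[2], corpus, top_n=2))  # 'colheita' ranks first
```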
  2. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.02
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents on all subjects. Traditionally it has been used for intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results and its problems are sketched in the paper.
    Source
    Cataloguing & Classification Quarterly 52(2014) no.1, S.102-109
  3. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.02
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs), applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform on which part-of-speech tagging was done, and then PERL scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests with the resulting documents demonstrated the approach's effectiveness: comparing the structural similarity of the documents before and after the preprocessing steps of phase one, the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents, with the classification rules established using a machine learning approach; a sketch of this idea follows the entry below. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
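    A minimal sketch of the planned phase two, assuming scikit-learn: extracted noun phrases (here pre-joined with underscores so each phrase stays a single token) serve as the features of an SVM classifier; the documents and labels are invented:

```python
# Noun phrases, not single words, are the features of an SVM text classifier.
# Toy data; a real run would use the PALAVRAS-extracted phrases.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each document is represented by its already-extracted noun phrases.
docs = ["information_retrieval noun_phrase semantic_aggregator",
        "fruit_growing soil_analysis irrigation_system",
        "document_classification noun_phrase machine_learning",
        "apple_orchard fruit_growing harvest_season"]
labels = ["LIS", "agronomy", "LIS", "agronomy"]

# token_pattern keeps whole underscore-glued phrases intact as single tokens.
model = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), LinearSVC())
model.fit(docs, labels)
print(model.predict(["noun_phrase document_classification"]))  # -> ['LIS']
```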
  4. Vlachidis, A.; Tudhope, D.: A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.01
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntax-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefits from the use of assistive natural language processing (NLP) modules for Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include the recognition of relevant entities using shallow-parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and the empirical derivation of context-driven RE rules for the recognition of semantic relationships in phrases of unstructured text.
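    A minimal sketch of the hand-crafted-rule style of Information Extraction that OPTIMA applies; the pattern, glossary, and class labels below are invented illustrations, not the project's actual rules or CRM-EH mappings:

```python
# A terminological resource maps surface forms to (hypothetical) ontology
# classes; a contextual pattern yields a period-context relation, roughly
# in the spirit of syntax-based RE rules over grey-literature reports.
import re

GLOSSARY = {"ditch": "EHE0007.Context", "hearth": "EHE0007.Context",
            "roman": "EHE0026.Period", "medieval": "EHE0026.Period"}

# Rule: a period term directly modifying a context term.
PATTERN = re.compile(r"\b(roman|medieval)\s+(ditch|hearth)\b", re.IGNORECASE)

def extract(text):
    """Yield (period, period_class, context, context_class) tuples."""
    for m in PATTERN.finditer(text):
        period, context = m.group(1).lower(), m.group(2).lower()
        yield (period, GLOSSARY[period], context, GLOSSARY[context])

report = "Excavation revealed a Roman ditch cut by a medieval hearth."
for rel in extract(report):
    print(rel)  # ('roman', 'EHE0026.Period', 'ditch', 'EHE0007.Context') ...
```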
  5. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.01
    
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.
    Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies.
    Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple.
    Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
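    A minimal sketch of rule-based LaTeX-to-CAS translation in the spirit of the paper; the macro names follow the DLMF's semantic-macro style, but the rules and their coverage are invented:

```python
# Rewrite DLMF-style semantic LaTeX macros into Maple syntax. Rules for
# single-argument macros come first so this toy rewriter handles one level
# of nesting; the real tool's rule set is far richer.
import re

RULES = [
    (re.compile(r"\\cos@\{([^}]*)\}"), r"cos(\1)"),
    (re.compile(r"\\EulerGamma@\{([^}]*)\}"), r"GAMMA(\1)"),
    (re.compile(r"\\BesselJ\{([^}]*)\}@\{([^}]*)\}"), r"BesselJ(\1, \2)"),
]

def latex_to_maple(expr):
    """Apply the rewrite rules repeatedly until the expression stabilizes."""
    prev = None
    while prev != expr:
        prev = expr
        for pattern, template in RULES:
            expr = pattern.sub(template, expr)
    return expr

print(latex_to_maple(r"\BesselJ{\nu}@{\cos@{x}}"))  # -> BesselJ(\nu, cos(x))
```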
  6. Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.01
    
    Abstract
    Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry as well as university publications are no longer indexed by people. This raises the question: what is the quality of the automatic indexing compared to the manual work, or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?
  7. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.01
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences within them. In this paper, the TIB / AV-Portal is presented as a use case where methods for the automatic generation of metadata, semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the videos are automatically indexed with specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived via an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.
    Date
    19.12.2014 19:26:51
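    A minimal sketch of the cross-lingual retrieval idea: a German GND heading is expanded with an English label obtained from an ontology mapping, so that an English query matches German-indexed videos; the mapping table and records are invented stand-ins for the portal's GND/DBpedia data:

```python
# German GND headings expanded with English labels for cross-lingual search.
GND_TO_EN = {"Kraftwerk": "power plant", "Schweißen": "welding"}

videos = [{"title": "Umspannwerk Betrieb", "gnd": ["Kraftwerk"]},
          {"title": "Metallverarbeitung",  "gnd": ["Schweißen"]}]

def search(query):
    """Match the query against both German headings and mapped English labels."""
    q = query.lower()
    for v in videos:
        labels = v["gnd"] + [GND_TO_EN.get(g, "") for g in v["gnd"]]
        if any(q in lbl.lower() for lbl in labels if lbl):
            yield v["title"]

print(list(search("welding")))    # ['Metallverarbeitung']
print(list(search("Kraftwerk")))  # ['Umspannwerk Betrieb']
```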
  8. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
    
    Abstract
    The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion in their automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts considered were a total of 98 doctoral theses from the eight areas of knowledge of a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned each of the 20 noun phrases a relevance value from 0 to 6 (not relevant to highly relevant, respectively). Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were then associated with the terms' positions in the text, each full noun phrase found in the text counting as a valid linear position. The results show the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). Notably, all areas of knowledge related to the Natural Sciences showed one characteristic distribution of relevant terms, while all areas related to the Social Sciences showed another characteristic distribution, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized in graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is unprecedented for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
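    A minimal sketch, assuming NumPy, of the paper's final step: fitting a polynomial to the mean relevance per tenth of the text, so the curve can later weight candidate terms by position; the relevance profile below is invented data:

```python
# Fit a characteristic polynomial to mean author-assigned relevance (0-6)
# in each of ten consecutive parts of a text. Toy numbers, not the study's.
import numpy as np

positions = np.linspace(0.05, 0.95, 10)  # midpoints of the ten parts
relevance = np.array([4.8, 3.1, 2.4, 2.0, 1.9, 2.0, 2.2, 2.8, 3.6, 4.5])

coeffs = np.polyfit(positions, relevance, deg=2)  # characteristic polynomial
profile = np.poly1d(coeffs)

# Positional weight for a term found 80% of the way through a new document:
print(round(float(profile(0.8)), 2))
```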
  9. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.00
    
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
  10. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.00
    
    Abstract
    Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and the available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords for the authors of new papers. We create a data set of research papers, with their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.
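    A minimal sketch of one citation-network baseline consistent with the paper's goal: recommend for a paper the keywords that occur most often among the papers it cites; the toy graph and keywords are invented:

```python
# Recommend keywords for a paper from the keywords of its cited papers.
from collections import Counter

keywords = {"p1": {"indexing", "tf-idf"},
            "p2": {"indexing", "metadata"},
            "p3": {"neural networks"}}
cites = {"new_paper": ["p1", "p2", "p3"]}

def recommend(paper, top_n=2):
    """Count keyword occurrences across cited papers; return the most common."""
    counts = Counter(kw for ref in cites[paper] for kw in keywords[ref])
    return [kw for kw, _ in counts.most_common(top_n)]

print(recommend("new_paper"))  # ['indexing', ...]
```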
  11. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.00
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.835-849
  12. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.00
    
    Source
    Paradigms and conceptual systems in knowledge organization: Proceedings of the Eleventh International ISKO Conference, 23-26 February 2010, Rome, Italy. Edited by Claudio Gnoli and Fulvio Mazzocchi
  13. Wolfe, E.W.: A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.00
    
    Abstract
    The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser-known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements, such as literary style, targeted content, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
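    A minimal sketch of the "harvesting data from Wikidata" step, querying the public SPARQL endpoint for basic facts about an author; Q42 (Douglas Adams) stands in for a real author QID, and the property choices are illustrative, not the project's schema:

```python
# Query the Wikidata SPARQL endpoint for an author's birth date and place.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?birth ?placeLabel WHERE {
  wd:Q42 wdt:P569 ?birth .                     # P569: date of birth
  OPTIONAL { wd:Q42 wdt:P19 ?place . }         # P19: place of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "metadata-harvest-sketch/0.1"})
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row.get("birth", {}).get("value"),
          row.get("placeLabel", {}).get("value"))
```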
  14. Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.00
    
    Content
    In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and descriptions, the researchers applied them to previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
  15. Benson, A.C.: Image descriptions and their relational expressions : a review of the literature and the issues (2015) 0.00
    
    Date
    24. 5.2015 19:26:43
  16. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.00
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.522-542
  17. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.00
    
    Date
    1. 2.2016 18:25:22
  18. Chung, E.-K.; Miksa, S.; Hastings, S.K.: A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.00
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.688-699
  19. Willis, C.; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.00
    
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1330-1344