Search (9 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Indexieren"
  • year_i:[2010 TO 2020}
  1. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.03
    0.03334463 = sum of:
      0.01983233 = product of:
        0.07932932 = sum of:
          0.07932932 = weight(_text_:authors in 5499) [ClassicSimilarity], result of:
            0.07932932 = score(doc=5499,freq=6.0), product of:
              0.2273297 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.049865972 = queryNorm
              0.34896153 = fieldWeight in 5499, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
        0.25 = coord(1/4)
      0.013512301 = product of:
        0.027024603 = sum of:
          0.027024603 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
            0.027024603 = score(doc=5499,freq=2.0), product of:
              0.17462213 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049865972 = queryNorm
              0.15476047 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
        0.5 = coord(1/2)
    
    Abstract
    Purpose: Modern mathematicians and scientists in math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS; this process is time-consuming and error-prone. The purpose of this paper is to automate this translation. The paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.
    Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors use these macros to perform rule-based translations between the formulae in the DLMF and the CAS. Moreover, the authors developed software to ease the creation of new rules and to discover inconsistencies.
    Findings: The authors created 396 mappings and successfully translated 58.8 percent of the DLMF formulae (2,405 expressions) between Maple and the DLMF. For a significant percentage, the special function definitions in Maple and the DLMF differ, or an atomic symbol in one system maps to a composite expression in the other. The translator was also used successfully for automatic verification of mathematical online compendia and CAS; the evaluation techniques discovered two errors in the DLMF and one defect in Maple.
    Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves on error-prone manual translation and can be used to verify mathematical online compendia and CAS.
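    The rule-based mapping can be pictured as a table from semantic LaTeX macros to CAS syntax. A minimal Python sketch follows; the macro patterns and the Maple output are illustrative stand-ins, not the authors' actual set of 396 rules:

        import re

        # Hypothetical rewrite rules: semantic LaTeX macro pattern -> Maple syntax.
        # The real DLMF macros and the paper's rule set are far more elaborate.
        RULES = {
            r"\\BesselJ\{(.+?)\}@\{(.+?)\}": r"BesselJ(\1, \2)",  # Bessel function of the first kind
            r"\\EulerGamma@\{(.+?)\}":       r"GAMMA(\1)",        # Gamma function
        }

        def latex_to_maple(expression: str) -> str:
            """Apply every rewrite rule to a semantic-LaTeX expression."""
            for pattern, replacement in RULES.items():
                expression = re.sub(pattern, replacement, expression)
            return expression

        print(latex_to_maple(r"\BesselJ{\nu}@{z} + \EulerGamma@{z}"))
        # -> BesselJ(\nu, z) + GAMMA(z)   (a full translator would also map \nu to nu)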
    Date
    20. 1.2015 18:30:22
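    The scoring breakdown shown for this record is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal Python sketch of the arithmetic, reusing only the constants printed in the tree above, reproduces the 0.03334463 total:

        def clause_score(freq, idf, query_norm, field_norm, coord):
            """Score of one term clause: coord * queryWeight * fieldWeight."""
            tf = freq ** 0.5                        # tf = sqrt(term frequency)
            query_weight = idf * query_norm         # idf * queryNorm
            field_weight = tf * idf * field_norm    # tf * idf * fieldNorm
            return coord * query_weight * field_weight

        # Constants copied from the explain tree of record 1 (doc 5499).
        authors = clause_score(freq=6.0, idf=4.558814, query_norm=0.049865972,
                               field_norm=0.03125, coord=0.25)   # coord(1/4)
        term_22 = clause_score(freq=2.0, idf=3.5018296, query_norm=0.049865972,
                               field_norm=0.03125, coord=0.5)    # coord(1/2)

        print(round(authors, 8))            # ~0.01983233
        print(round(term_22, 8))            # ~0.0135123
        print(round(authors + term_22, 8))  # ~0.03334463, the document score above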
  2. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.02
    0.024962502 = sum of:
      0.0114502 = product of:
        0.0458008 = sum of:
          0.0458008 = weight(_text_:authors in 1442) [ClassicSimilarity], result of:
            0.0458008 = score(doc=1442,freq=2.0), product of:
              0.2273297 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.049865972 = queryNorm
              0.20147301 = fieldWeight in 1442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.03125 = fieldNorm(doc=1442)
        0.25 = coord(1/4)
      0.013512301 = product of:
        0.027024603 = sum of:
          0.027024603 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
            0.027024603 = score(doc=1442,freq=2.0), product of:
              0.17462213 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049865972 = queryNorm
              0.15476047 = fieldWeight in 1442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1442)
        0.5 = coord(1/2)
    
    Abstract
    The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The corpus comprised 98 doctoral theses from the eight areas of knowledge of a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned each of the 20 noun phrases a relevance value from 0 (not relevant) to 6 (highly relevant). Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were then associated with the positions of the terms in the text, where each full noun phrase found in the text counted as a valid linear position. The resulting distributions were examined for two types of position: linear, with values consolidated into ten equal consecutive parts, and structural, based on parts of the text (such as introduction, development and conclusion). A result of considerable importance is that all areas of knowledge related to the Natural Sciences showed one characteristic distribution of relevant terms, while all areas related to the Social Sciences showed another, shared among themselves but distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized in graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
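    A minimal Python sketch of the linear-position analysis described above; the positions and the degree of the polynomial are illustrative, not the study's data:

        import numpy as np

        # Hypothetical input for one thesis: the linear positions at which relevant
        # noun phrases occur, out of the total number of noun phrases in the text.
        relevant_positions = [3, 17, 42, 44, 90, 150, 151, 198]
        total_noun_phrases = 200

        # Normalise each position to [0, 1] and consolidate into ten equal parts.
        relative = np.array(relevant_positions) / total_noun_phrases
        decile_counts, _ = np.histogram(relative, bins=10, range=(0.0, 1.0))
        print(decile_counts)   # [2 0 2 0 1 0 0 2 0 1] - relevant terms per tenth of the text

        # Characterise the distribution with a polynomial, as the study does per
        # area of knowledge (degree 3 is an arbitrary choice here).
        coefficients = np.polyfit(np.arange(10), decile_counts, deg=3)
        print(coefficients)    # polynomial coefficients describing the behaviour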
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  3. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.016890377 = product of:
      0.033780754 = sum of:
        0.033780754 = product of:
          0.06756151 = sum of:
            0.06756151 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06756151 = score(doc=2759,freq=2.0), product of:
                0.17462213 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049865972 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  4. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.01
    0.010120643 = product of:
      0.020241287 = sum of:
        0.020241287 = product of:
          0.08096515 = sum of:
            0.08096515 = weight(_text_:authors in 3232) [ClassicSimilarity], result of:
              0.08096515 = score(doc=3232,freq=4.0), product of:
                0.2273297 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049865972 = queryNorm
                0.35615736 = fieldWeight in 3232, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3232)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of a paper is available, this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title, the abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and the available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords for the authors of new papers. We create a data set of research papers, with their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, and conclude that the citation network provides much better predictions.
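    A minimal Python sketch of the citation-network idea, assuming each paper's metadata lists its author keywords and the papers it cites; ranking candidates by frequency among cited papers is a simplification of the methods compared in the paper:

        from collections import Counter

        # Hypothetical metadata store: paper id -> (author keywords, cited paper ids).
        papers = {
            "p1": ({"automatic indexing", "metadata"},    ["p2", "p3"]),
            "p2": ({"automatic indexing", "text mining"}, []),
            "p3": ({"metadata", "citation analysis"},     []),
            "p4": (set(),                                 ["p1", "p2"]),  # no keywords listed
        }

        def recommend_keywords(paper_id, k=3):
            """Rank candidate keywords by how often they occur among cited papers."""
            _, cited = papers[paper_id]
            counts = Counter()
            for cited_id in cited:
                keywords, _ = papers[cited_id]
                counts.update(keywords)
            return [keyword for keyword, _ in counts.most_common(k)]

        print(recommend_keywords("p4"))
        # ['automatic indexing', 'metadata', 'text mining'] for this toy network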
  5. Wolfe, E.W.: ¬A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.01
    0.010018925 = product of:
      0.02003785 = sum of:
        0.02003785 = product of:
          0.0801514 = sum of:
            0.0801514 = weight(_text_:authors in 5236) [ClassicSimilarity], result of:
              0.0801514 = score(doc=5236,freq=2.0), product of:
                0.2273297 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049865972 = queryNorm
                0.35257778 = fieldWeight in 5236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5236)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser-known writers and the goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, the project analyzes each novel for a variety of aspects, such as literary style, targeted content analysis, and historical context. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
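    One step mentioned above, subject extraction based on word sense disambiguation, can be sketched with NLTK's Lesk implementation; this illustrates the general technique, not the KU project's actual pipeline:

        # Requires: pip install nltk, plus nltk.download("wordnet") and the
        # punkt tokenizer models on first use.
        from nltk.tokenize import word_tokenize
        from nltk.wsd import lesk

        sentence = "The novel follows a family's migration from the rural South."
        tokens = word_tokenize(sentence)

        # Pick the WordNet sense of an ambiguous word that best fits the context.
        sense = lesk(tokens, "migration")
        if sense is not None:
            print(sense.name())        # the chosen synset, e.g. migration.n.01
            print(sense.definition())  # its gloss, a candidate subject description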
  6. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    0.0071563753 = product of:
      0.014312751 = sum of:
        0.014312751 = product of:
          0.057251003 = sum of:
            0.057251003 = weight(_text_:authors in 3627) [ClassicSimilarity], result of:
              0.057251003 = score(doc=3627,freq=2.0), product of:
                0.2273297 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049865972 = queryNorm
                0.25184128 = fieldWeight in 3627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3627)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
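    Co-word analysis, used in the third case, can be sketched as counting how often pairs of indexing terms are assigned to the same paper; the term sets below are toy data, not the special issue's contents:

        from collections import Counter
        from itertools import combinations

        # Hypothetical descriptor sets, one per paper.
        paper_terms = [
            {"clustering", "machine learning", "automatic classification"},
            {"automatic indexing", "information retrieval", "clustering"},
            {"automatic classification", "clustering", "information retrieval"},
        ]

        # Count co-occurrences of term pairs across papers.
        cooccurrence = Counter()
        for terms in paper_terms:
            for pair in combinations(sorted(terms), 2):
                cooccurrence[pair] += 1

        for pair, count in cooccurrence.most_common(3):
            print(pair, count)   # the strongest co-word links in this toy corpus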
  7. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.01
    0.0071563753 = product of:
      0.014312751 = sum of:
        0.014312751 = product of:
          0.057251003 = sum of:
            0.057251003 = weight(_text_:authors in 5400) [ClassicSimilarity], result of:
              0.057251003 = score(doc=5400,freq=2.0), product of:
                0.2273297 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049865972 = queryNorm
                0.25184128 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
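    A minimal sketch of the two-step idea, assuming documents and subject labels have already been embedded into the same semantic space; a plain cosine nearest-neighbour lookup stands in for the paper's non-parametric method:

        import numpy as np

        rng = np.random.default_rng(0)

        # Step 1 (assumed done elsewhere): entities embedded into one semantic space.
        # Random vectors are placeholders for real document and subject embeddings.
        subject_vectors = {name: rng.normal(size=64)
                           for name in ["indexing", "retrieval", "classification"]}
        document_vector = rng.normal(size=64)

        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Step 2: predict subjects for the document by semantic similarity.
        ranked = sorted(subject_vectors,
                        key=lambda s: cosine(document_vector, subject_vectors[s]),
                        reverse=True)
        print(ranked[:2])   # the two subjects closest to this document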
  8. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
    0.0067561506 = product of:
      0.013512301 = sum of:
        0.013512301 = product of:
          0.027024603 = sum of:
            0.027024603 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
              0.027024603 = score(doc=1441,freq=2.0), product of:
                0.17462213 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049865972 = queryNorm
                0.15476047 = fieldWeight in 1441, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1441)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  9. Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.01
    0.0057251 = product of:
      0.0114502 = sum of:
        0.0114502 = product of:
          0.0458008 = sum of:
            0.0458008 = weight(_text_:authors in 3434) [ClassicSimilarity], result of:
              0.0458008 = score(doc=3434,freq=2.0), product of:
                0.2273297 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049865972 = queryNorm
                0.20147301 = fieldWeight in 3434, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The purpose of this study is to examine whether an understanding of the subject-indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that indexing approaches or conceptions combined with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: to what extent are semantic sources effective, and to what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, [2000]). Using F-measure, the experimental results showed that cited works, source title, and title were as effective as the full text, while the keyword field was found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text; the content-oriented and the document-oriented indexing approaches in particular were more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and the authors' intentions were more useful for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the conclusion that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
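    The experimental setup, an SVM trained on one document field at a time, can be sketched with scikit-learn standing in for the WEKA implementation used in the study; the texts and subject terms below are toy placeholders:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline
        from sklearn.metrics import f1_score

        # Toy stand-ins for one field of the dataset (e.g. title, keyword, or full
        # text) and the subject terms to assign; the study compares several fields.
        texts = ["automatic indexing of journal articles",
                 "support vector machines for text categorization",
                 "subject term assignment with controlled vocabularies",
                 "kernel methods and machine learning for classification"]
        labels = ["indexing", "categorization", "indexing", "categorization"]

        model = make_pipeline(TfidfVectorizer(), LinearSVC())
        model.fit(texts, labels)

        predicted = model.predict(texts)
        print(f1_score(labels, predicted, average="macro"))  # F-measure, as in the study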