Search (99 results, page 1 of 5)

  • Filter: theme_ss:"Automatisches Indexieren"
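  For orientation: the relevance figure after each entry title is a Lucene ClassicSimilarity (tf-idf) score. In outline, and omitting per-term boosts, it is computed as

      score(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{fieldNorm}(d)

      \text{with}\quad \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}(t)+1}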
  1. Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.14
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural Language Processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection, by suggesting Library of Congress Subject Headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
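    The heading-suggestion step described here can be approximated off the shelf. A minimal Python sketch, assuming the Hugging Face transformers library and using a zero-shot BART-MNLI model as a stand-in for the authors' BERT setup; the excerpt and candidate headings are illustrative assumptions:

      # Rank hypothetical candidate LCSH labels for a text excerpt.
      # Stand-in model: BART-MNLI zero-shot, not the authors' BERT pipeline.
      from transformers import pipeline

      classifier = pipeline("zero-shot-classification",
                            model="facebook/bart-large-mnli")

      excerpt = "It was the best of times, it was the worst of times ..."
      # Candidates pre-filtered by an LCC subclass (e.g., PR); illustrative only.
      candidate_headings = [
          "London (England) -- Fiction",
          "France -- History -- Revolution, 1789-1799 -- Fiction",
          "Natural history -- Periodicals",
      ]

      result = classifier(excerpt, candidate_headings, multi_label=True)
      for label, score in zip(result["labels"], result["scores"]):
          print(f"{score:.3f}  {label}")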
  2. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.09
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents on all subjects. Although it has traditionally been used for intellectual subject cataloguing, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results, and its problems are sketched in the paper.
    Content
    Paper for the conference "Beyond libraries - subject metadata in the digital environment and semantic web", IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. See: http://www.nlib.ee/index.php?id=17763.
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.09
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. Although it has traditionally been used for intellectual subject cataloging, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results, and its problems are outlined in this article.
    Footnote
    Contribution to the special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web", which contains papers from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
  4. Alexander, M.: Retrieving digital data with fuzzy matching (1997) 0.08
    Abstract
    In 1993 the British Library established a programme of activities entitled Initiatives for Access (IFA) to identify and develop computer applications based on the new technologies emerging in the areas of digital and network services. Discusses the problem of the effective retrieval of digital data after its capture, focusing on the product Excalibur EFS, which looks at the way information is stored at its most fundamental level and identifies patterns in the numbers. Looks at the benefits of Excalibur and outlines other experiments in progress as part of the IFA programme.
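    Fuzzy matching of this kind can be illustrated with the Python standard library; Excalibur EFS itself used proprietary adaptive pattern recognition, so difflib's ratio-based matching is only a stand-in:

      # Fuzzy retrieval over OCR-damaged terms (illustrative stand-in,
      # not Excalibur's adaptive pattern recognition).
      import difflib

      index_terms = ["catalogue", "manuscript", "incunabula", "newspaper"]
      ocr_term = "rnanuscripl"  # OCR read 'm' as 'rn' and 't' as 'l'

      matches = difflib.get_close_matches(ocr_term, index_terms, n=3, cutoff=0.6)
      print(matches)  # ['manuscript']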
  5. Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.07
    Abstract
    Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.
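    The classification-and-evaluation recipe described above can be sketched with scikit-learn; this is an assumption, since the authors' own pipeline uses their P-Stemmer and Arabic news data, while the toy English data here is purely illustrative:

      # TF-IDF features + linear SVM, evaluated with 10-fold cross-validation.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      docs = ["match ends in a draw", "minister signs trade pact",
              "striker scores twice", "parliament passes budget"] * 5
      labels = ["sports", "politics", "sports", "politics"] * 5

      clf = make_pipeline(TfidfVectorizer(), LinearSVC())
      scores = cross_val_score(clf, docs, labels, cv=10)
      print(f"mean accuracy over 10 folds: {scores.mean():.2f}")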
  6. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.06
    Abstract
    This paper discusses the goals and results of the research project Perseus-a, an attempt to improve information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies; the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte; art-historical terminological resources and classification data; and an open-source system for linguistic and statistical automatic indexing called lingo.
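    As a toy illustration of dictionary-based linguistic-statistical indexing in the spirit of lingo (the real system also does multiword recognition, synonym mapping, and more; the mini-dictionary and text here are assumptions):

      # Reduce word forms via a dictionary, drop stopwords, rank by frequency.
      from collections import Counter

      LEMMA_DICT = {"images": "image", "archives": "archive",
                    "descriptions": "description"}
      STOPWORDS = {"the", "of", "and", "with", "their"}

      text = "the images of the archives and their descriptions of images"
      tokens = [LEMMA_DICT.get(t, t) for t in text.lower().split()
                if t not in STOPWORDS]
      print(Counter(tokens).most_common(3))  # candidate index terms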
  7. Alexander, M.: Automatic indexing of document images using Excalibur EFS (1995) 0.06
    Abstract
    Discusses research into the application of adaptive pattern recognition technology to enable effective retrieval from scanned document images. Describes the application at the British Library of the Excalibur EFS software, which uses adaptive pattern recognition technology to provide access to digital information in its native forms, fuzzy searching retrieval, and automatic indexing capabilities. It was used to make specialist printed catalogues and indexes accessible on computer via content-based indexes.
    Source
    Library technology news. 1995, no.16, S.4-8
  8. Alexander, M.: Retrieving digital data with fuzzy matching (1996) 0.06
    Abstract
    Briefly describes the Excalibur EFS system, which makes use of adaptive pattern recognition technology as an aid to automatic indexing, and how it is being tested at the British Library for the indexing and retrieval of scanned images from the library's holdings. Notes how Excalibur EFS can support a high degree of fuzzy searching, compensate for the errors produced by OCR conversion of scanned images, reduce the costs of indexing, and require far less storage space than more traditional indexes.
    Source
    New library world. 97(1996) no.1131, S.28-31
  9. Jones, R.L.: Automatic document content analysis : the AIDA project (1992) 0.05
    Abstract
    The AIDA project is a research program being carried out by Computer Power in Canberra, Australia, in collaboration with the Australian Parliament. Its primary objective is to develop practical methods for carrying out document content analysis with minimal human intervention. The different techniques employed by AIDA to achieve its results are described
    Source
    Library hi tech. 10(1992) no.1/2, S.111-118
  10. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.05
    Abstract
    The Nordic WAIS/WWW project, sponsored by NORDINFO, is a joint project of Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results, focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources.
  11. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.05
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.
    Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors use these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, they develop software to ease the creation of new rules and to discover inconsistencies.
    Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and the DLMF. For a significant percentage, the special function definitions in Maple and the DLMF differed: an atomic symbol in one system maps to a composite expression in the other. The translator was also successfully used for automatic verification of mathematical online compendia and CAS; the evaluation techniques discovered two errors in the DLMF and one defect in Maple.
    Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves on error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
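    The rule-based translation idea from this entry can be sketched in a few lines of Python; the two rewrite rules below are illustrative stand-ins for the 396 DLMF-to-Maple mappings mentioned in the abstract, and the DLMF macro spellings here are assumptions:

      # Rewrite semantic LaTeX macros into Maple calls with regex rules.
      import re

      RULES = [
          (re.compile(r"\\BesselJ\{([^}]*)\}@\{([^}]*)\}"), r"BesselJ(\1, \2)"),
          (re.compile(r"\\cos@\{([^}]*)\}"), r"cos(\1)"),
      ]

      def latex_to_maple(expr: str) -> str:
          for pattern, template in RULES:
              expr = pattern.sub(template, expr)
          return expr

      print(latex_to_maple(r"\BesselJ{0}@{x} + \cos@{x}"))
      # -> BesselJ(0, x) + cos(x)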
  12. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.04
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
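    A minimal numpy sketch of the embed-first-then-predict idea, with toy vectors standing in for the trained entity embeddings (the subjects and numbers are assumptions):

      # Score candidate subjects for a new record by cosine similarity
      # in a shared embedding space.
      import numpy as np

      subject_vecs = {
          "automatic indexing": np.array([0.9, 0.1, 0.0]),
          "machine learning":   np.array([0.7, 0.6, 0.1]),
          "medieval history":   np.array([0.0, 0.1, 0.9]),
      }

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      record_vec = np.array([0.8, 0.3, 0.05])  # embedding of an unseen record
      for subject, vec in sorted(subject_vecs.items(),
                                 key=lambda kv: -cosine(record_vec, kv[1])):
          print(f"{cosine(record_vec, vec):.3f}  {subject}")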
  13. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.04
    Abstract
    As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, owing to the increasing amount of time users will have to spend to find needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.
  14. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.04
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and the paper explores how future work might expand on the current project to enhance catalog records through text mining.
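    One way to approximate the review-mining step in Python, with a tiny made-up affect lexicon (the real study derives its vocabulary from professional reviews):

      # Count affect (emotion/tone) terms found in a book review.
      import re
      from collections import Counter

      AFFECT_LEXICON = {"haunting": "tone", "bleak": "tone",
                        "tender": "emotion", "joyful": "emotion"}

      review = "A haunting, tender novel; bleak at times but never hopeless."
      tokens = re.findall(r"[a-z]+", review.lower())
      hits = Counter(AFFECT_LEXICON[t] for t in tokens if t in AFFECT_LEXICON)
      print(hits)  # Counter({'tone': 2, 'emotion': 1})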
  15. Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.04
    Abstract
    The complexity and diversity of archival resources make constructing rich metadata records time-consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g., people, places, concepts) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications.
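    A minimal sketch of entity extraction for access points, using spaCy's small English model as a stand-in for the public APIs the authors evaluated (model availability and the sample text are assumptions):

      # Extract people, places, organizations, and dates as access points.
      import spacy

      nlp = spacy.load("en_core_web_sm")  # requires the model to be installed
      text = ("Letter from Esther Pohl Lovejoy, Portland, Oregon, "
              "to the American Medical Association, 12 March 1907.")
      for ent in nlp(text).ents:
          print(ent.label_, "->", ent.text)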
  16. Milstead, J.L.: Thesauri in a full-text world (1998) 0.03
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  17. Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.02
    Content
    In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
    In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and descriptions, the researchers tested them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
  18. Simões, M. da Graça; Machado, L.M.; Souza, R.R.; Almeida, M.B.; Tavares Lopes, A.: Automatic indexing and ontologies : the consistency of research chronology and authoring in the context of Information Science (2018) 0.02
    Source
    Challenges and opportunities for knowledge organization in the digital age: proceedings of the Fifteenth International ISKO Conference, 9-11 July 2018, Porto, Portugal / organized by: International Society for Knowledge Organization (ISKO), ISKO Spain and Portugal Chapter, University of Porto - Faculty of Arts and Humanities, Research Centre in Communication, Information and Digital Culture (CIC.digital) - Porto. Eds.: F. Ribeiro u. M.E. Cerveira
  19. Souza, R.R.; Raghavan, K.S.: A methodology for noun phrase-based automatic indexing (2006) 0.02
    Abstract
    The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. An approach being discussed considers the intrinsic semantics of texts of documents. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.
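    The noun-phrase extraction step can be sketched with spaCy; note that the authors' methodology targets Portuguese texts, so the English model here is an illustrative assumption:

      # Extract noun phrases as candidate descriptors instead of single keywords.
      import spacy

      nlp = spacy.load("en_core_web_sm")  # requires the model to be installed
      text = ("The sheer volume of digital information makes traditional "
              "indexing strategies ineffective and impractical.")
      doc = nlp(text)
      print([chunk.text.lower() for chunk in doc.noun_chunks])
      # e.g. ['the sheer volume', 'digital information', ...]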
  20. Smart, G.: Using language analysis to manage information (1993) 0.01
    Abstract
    The ESPRIT project SIMPR developed software to analyse documents and generate indexes for them. Of immediate application as a document indexing and classification system, this also offers a technology for information modelling that has broader implications, supporting many new uses for information management software. The project was based on the assumption that information can only be managed successfully by computer systems that can view the information contained in a document through the language in which the document is written, and that systems need to be sufficiently flexible to respond to the changing requirements of document use.

Types

  • a 91
  • el 9
  • x 5
  • m 1