Search (106 results, page 1 of 6)

  • × theme_ss:"Computerlinguistik"
  1. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.14
    0.13541423 = product of:
      0.1805523 = sum of:
        0.085297674 = weight(_text_:digital in 1139) [ClassicSimilarity], result of:
          0.085297674 = score(doc=1139,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.43143538 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.04641878 = weight(_text_:library in 1139) [ClassicSimilarity], result of:
          0.04641878 = score(doc=1139,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.3522223 = fieldWeight in 1139, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.048835836 = product of:
          0.09767167 = sum of:
            0.09767167 = weight(_text_:project in 1139) [ClassicSimilarity], result of:
              0.09767167 = score(doc=1139,freq=4.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.4616698 = fieldWeight in 1139, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1139)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
  2. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.08
    0.0831616 = product of:
      0.11088213 = sum of:
        0.060314562 = weight(_text_:digital in 3840) [ClassicSimilarity], result of:
          0.060314562 = score(doc=3840,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.30507088 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
        0.026799891 = weight(_text_:library in 3840) [ClassicSimilarity], result of:
          0.026799891 = score(doc=3840,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.047535364 = score(doc=3840,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    Digital unter: http://dx.doi.org/10.1081/E-ELIS3-120044491. Vgl.: http://www.tandfonline.com/doi/book/10.1081/E-ELIS3.
    Date
    27. 8.2011 14:22:33
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  3. Morris, V.: Automated language identification of bibliographic resources (2020) 0.08
    0.0819426 = product of:
      0.1638852 = sum of:
        0.030628446 = weight(_text_:library in 5749) [ClassicSimilarity], result of:
          0.030628446 = score(doc=5749,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.23240642 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.13325676 = sum of:
          0.07893063 = weight(_text_:project in 5749) [ClassicSimilarity], result of:
            0.07893063 = score(doc=5749,freq=2.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.37308553 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
          0.054326132 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
            0.054326132 = score(doc=5749,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.30952093 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
      0.5 = coord(2/4)
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
  4. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.07
    0.06953813 = product of:
      0.092717506 = sum of:
        0.06743372 = weight(_text_:digital in 1616) [ClassicSimilarity], result of:
          0.06743372 = score(doc=1616,freq=10.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.3410796 = fieldWeight in 1616, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
        0.013399946 = weight(_text_:library in 1616) [ClassicSimilarity], result of:
          0.013399946 = score(doc=1616,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.10167781 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
        0.011883841 = product of:
          0.023767682 = sum of:
            0.023767682 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.023767682 = score(doc=1616,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The information available in languages other than English in the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics. com/new4/pr/pr990610.html). However, it is predicted that there will be only 60% increase in Internet users among English speakers verses a 150% growth among nonEnglish speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had been increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/ china.internet.reut/index.html). According to Nielsen/ NetRatings, there was a dramatic leap from 22.5 millions to 56.6 millions Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (US's Internet population was 166 millions) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00. html). All of the evidences reveal the importance of crosslingual research to satisfy the needs in the near future. Digital library research has been focusing in structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue an Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue an Digital Libraries, 32(2), 48-49.). However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we put our focus an cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based an English/Chinese parallel corpus. When the searchers encounter retrieval problems, Professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique history background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based an a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatic generated English/Chinese thesaurus. The result Shows that such thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.
  5. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.05
    0.050567575 = product of:
      0.10113515 = sum of:
        0.053599782 = weight(_text_:library in 4506) [ClassicSimilarity], result of:
          0.053599782 = score(doc=4506,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.40671125 = fieldWeight in 4506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.109375 = fieldNorm(doc=4506)
        0.047535364 = product of:
          0.09507073 = sum of:
            0.09507073 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.09507073 = score(doc=4506,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    8.10.2000 11:52:22
    Source
    Library science with a slant to documentation. 28(1991) no.4, S.125-130
  6. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    0.04998924 = product of:
      0.09997848 = sum of:
        0.079606175 = product of:
          0.23881851 = sum of:
            0.23881851 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.23881851 = score(doc=562,freq=2.0), product of:
                0.42493033 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050121464 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.0203723 = product of:
          0.0407446 = sum of:
            0.0407446 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.0407446 = score(doc=562,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  7. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.04
    0.041789565 = product of:
      0.16715826 = sum of:
        0.16715826 = sum of:
          0.11962289 = weight(_text_:project in 4729) [ClassicSimilarity], result of:
            0.11962289 = score(doc=4729,freq=6.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.5654278 = fieldWeight in 4729, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4729)
          0.047535364 = weight(_text_:22 in 4729) [ClassicSimilarity], result of:
            0.047535364 = score(doc=4729,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.2708308 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4729)
      0.25 = coord(1/4)
    
    Abstract
    Reports on an OCLC natural language processing research project to develop methods for identifying terminology in unstructured electronic text, especially material associated with new cultural trends and emerging subjects. Current OCLC production software can only identify single words as indexable terms in full text documents, thus a major goal of the WordSmith project is to develop software that can automatically identify and intelligently organize phrases for uses in database indexes. By analyzing user terminology from local newspapers in the USA, the latest cultural trends and technical developments as well as personal and geographic names have been drawm out. Notes that this new vocabulary can also be mapped into reference works
    Source
    OCLC newsletter. 1998, no.234, Jul/Aug, S.22-24
  8. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.04
    0.03630176 = product of:
      0.14520703 = sum of:
        0.14520703 = sum of:
          0.09767167 = weight(_text_:project in 5483) [ClassicSimilarity], result of:
            0.09767167 = score(doc=5483,freq=4.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.4616698 = fieldWeight in 5483, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5483)
          0.047535364 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
            0.047535364 = score(doc=5483,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.2708308 = fieldWeight in 5483, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5483)
      0.25 = coord(1/4)
    
    Abstract
    This paper gives an outline of the final results of the TransRouter project. In the scope of this project a decision support system for translation managers has been developed, which will support the selection of appropriate routes for translation projects. In this paper emphasis is put on the decision model, which is based on a stepwise refined assessment of translation routes. The workflow of using this system is considered as well
    Date
    10.12.2000 18:22:35
  9. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.04
    0.03509323 = product of:
      0.07018646 = sum of:
        0.04641878 = weight(_text_:library in 2345) [ClassicSimilarity], result of:
          0.04641878 = score(doc=2345,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.3522223 = fieldWeight in 2345, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
              0.047535364 = score(doc=2345,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 2345, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2345)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  10. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.03
    0.030666022 = product of:
      0.061332043 = sum of:
        0.026799891 = weight(_text_:library in 6987) [ClassicSimilarity], result of:
          0.026799891 = score(doc=6987,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 6987, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6987)
        0.034532152 = product of:
          0.069064304 = sum of:
            0.069064304 = weight(_text_:project in 6987) [ClassicSimilarity], result of:
              0.069064304 = score(doc=6987,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.32644984 = fieldWeight in 6987, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6987)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes the results of a project to develop conflation tools for searching databases of Latin text. Reports on the results of a questionnaire sent to 64 users of Latin text retrieval systems. Describes a Latin stemming algorithm that uses a simple longest match with some recoding but differs from most stemmers in its use of 2 separate suffix dictionaries for processing query and database words. Describes a retrieval system in which a user inputs the principal component of their search term, these components are stemmed and the resulting stems matched against the noun based and verb based stem dictionaries. Evaluates the system, describing its limitations, and a more complex system
    Imprint
    London : British Library
  11. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.03
    0.029149916 = product of:
      0.116599664 = sum of:
        0.116599664 = sum of:
          0.069064304 = weight(_text_:project in 1361) [ClassicSimilarity], result of:
            0.069064304 = score(doc=1361,freq=2.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.32644984 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
          0.047535364 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
            0.047535364 = score(doc=1361,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.2708308 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
      0.25 = coord(1/4)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real world conditions. It is part of a text processing project at Siemens, called TINA (Text-Inhalts-Analyse). Software from TINA is actually being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics)
    Date
    6. 1.1999 10:22:07
  12. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.03
    0.025283787 = product of:
      0.050567575 = sum of:
        0.026799891 = weight(_text_:library in 156) [ClassicSimilarity], result of:
          0.026799891 = score(doc=156,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.047535364 = score(doc=156,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    8. 3.2007 19:55:22
    Source
    Context: nature, impact and role. 5th International Conference an Conceptions of Library and Information Sciences, CoLIS 2005 Glasgow, UK, June 2005. Ed. by F. Crestani u. I. Ruthven
  13. Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation (1997) 0.02
    0.024985643 = product of:
      0.09994257 = sum of:
        0.09994257 = sum of:
          0.059197973 = weight(_text_:project in 3244) [ClassicSimilarity], result of:
            0.059197973 = score(doc=3244,freq=2.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.27981415 = fieldWeight in 3244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.046875 = fieldNorm(doc=3244)
          0.0407446 = weight(_text_:22 in 3244) [ClassicSimilarity], result of:
            0.0407446 = score(doc=3244,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.23214069 = fieldWeight in 3244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3244)
      0.25 = coord(1/4)
    
    Abstract
    Describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language independent representation called lexical conceptual structure (LCS). Demonstrates that synonymous verb senses share distribution patterns. Shows how the syntax-semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. Describes the structure of the LCS and shows how this representation is used in FLT and MT. Focuses on the problem of building LCS dictionaries for large-scale FLT and MT. Describes authoring tools for manual and semi-automatic construction of LCS dictionaries. Presents an approach that uses linguistic techniques for building word definitions automatically. The techniques have been implemented as part of a set of lixicon-development tools used in the MILT FLT project
    Date
    31. 7.1996 9:22:19
  14. Andrushchenko, M.; Sandberg, K.; Turunen, R.; Marjanen, J.; Hatavara, M.; Kurunmäki, J.; Nummenmaa, T.; Hyvärinen, M.; Teräs, K.; Peltonen, J.; Nummenmaa, J.: Using parsed and annotated corpora to analyze parliamentarians' talk in Finland (2022) 0.02
    0.024083475 = product of:
      0.0963339 = sum of:
        0.0963339 = weight(_text_:digital in 471) [ClassicSimilarity], result of:
          0.0963339 = score(doc=471,freq=10.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.4872566 = fieldWeight in 471, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=471)
      0.25 = coord(1/4)
    
    Abstract
    We present a search system for grammatically analyzed corpora of Finnish parliamentary records and interviews with former parliamentarians, annotated with metadata of talk structure and involved parliamentarians, and discuss their use through carefully chosen digital humanities case studies. We first introduce the construction, contents, and principles of use of the corpora. Then we discuss the application of the search system and the corpora to study how politicians talk about power, how ideological terms are used in political speech, and how to identify narratives in the data. All case studies stem from questions in the humanities and the social sciences, but rely on the grammatically parsed corpora in both identifying and quantifying passages of interest. Finally, the paper discusses the role of natural language processing methods for questions in the (digital) humanities. It makes the claim that a digital humanities inquiry of parliamentary speech and interviews with politicians cannot only rely on computational humanities modeling, but needs to accommodate a range of perspectives starting with simple searches, quantitative exploration, and ending with modeling. Furthermore, the digital humanities need a more thorough discussion about how the utilization of tools from information science and technologies alter the research questions posed in the humanities.
    Series
    JASIST special issue on digital humanities (DH): C. Methodological innovations, challenges, and new interest in DH
  15. Stede, M.: Lexicalization in natural language generation (2002) 0.02
    0.0219043 = product of:
      0.0438086 = sum of:
        0.01914278 = weight(_text_:library in 4245) [ClassicSimilarity], result of:
          0.01914278 = score(doc=4245,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.14525402 = fieldWeight in 4245, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4245)
        0.024665821 = product of:
          0.049331643 = sum of:
            0.049331643 = weight(_text_:project in 4245) [ClassicSimilarity], result of:
              0.049331643 = score(doc=4245,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.23317845 = fieldWeight in 4245, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4245)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Natural language generation (NLG), the automatic production of text by Computers, is commonly seen as a process consisting of the following distinct phases: Obviously, choosing words is a central aspect of generatiog language. In which of the these phases it should take place is not entirely clear, however. The decision depends an various factors: what exactly is seen as an individual lexical item; how the relation between word meaning and background knowledge (concepts) is defined; how one accounts for the interactions between individual lexical choices in the Same sentence; what criteria are employed for choosing between similar words; whether or not output is required in one or more languages. This article surveys these issues and the answers that have been proposed in NLG research. For many applications of natural language processing, large scale lexical resources have become available in recent years, such as the WordNet database. In language generation, however, generic lexicons are not in use yet; rather, almost every generation project develops its own format for lexical representations. The reason is that the entries of a generation lexicon need their specific interfaces to the Input representations processed by the generator; lexical semantics in an NLG lexicon needs to be tailored to the Input. Ort the other hand, the large lexicons used for language analysis typically have only very limited semantic information at all. Yet the syntactic behavior of words remains the same regardless of the particular application; thus, it should be possible to build at least parts of generic NLG lexical entries automatically, which could then be used by different systems.
    Source
    Encyclopedia of library and information science. Vol.70, [=Suppl.33]
  16. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.02
    0.021671817 = product of:
      0.043343633 = sum of:
        0.022971334 = weight(_text_:library in 75) [ClassicSimilarity], result of:
          0.022971334 = score(doc=75,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.17430481 = fieldWeight in 75, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=75)
        0.0203723 = product of:
          0.0407446 = sum of:
            0.0407446 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
              0.0407446 = score(doc=75,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.23214069 = fieldWeight in 75, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=75)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  17. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
    0.021575883 = product of:
      0.043151766 = sum of:
        0.01914278 = weight(_text_:library in 2541) [ClassicSimilarity], result of:
          0.01914278 = score(doc=2541,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.14525402 = fieldWeight in 2541, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.024008986 = product of:
          0.04801797 = sum of:
            0.04801797 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.04801797 = score(doc=2541,freq=4.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET . Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS) . The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  18. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.02
    0.019901544 = product of:
      0.079606175 = sum of:
        0.079606175 = product of:
          0.23881851 = sum of:
            0.23881851 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.23881851 = score(doc=862,freq=2.0), product of:
                0.42493033 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050121464 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    https%3A%2F%2Farxiv.org%2Fabs%2F2212.06721&usg=AOvVaw3i_9pZm9y_dQWoHi6uv0EN
  19. Suissa, O.; Elmalech, A.; Zhitomirsky-Geffet, M.: Text analysis using deep neural networks in digital humanities and information science (2022) 0.02
    0.01865498 = product of:
      0.07461992 = sum of:
        0.07461992 = weight(_text_:digital in 491) [ClassicSimilarity], result of:
          0.07461992 = score(doc=491,freq=6.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.37742734 = fieldWeight in 491, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=491)
      0.25 = coord(1/4)
    
    Abstract
    Combining computational technologies and humanities is an ongoing effort aimed at making resources such as texts, images, audio, video, and other artifacts digitally available, searchable, and analyzable. In recent years, deep neural networks (DNN) dominate the field of automatic text analysis and natural language processing (NLP), in some cases presenting a super-human performance. DNNs are the state-of-the-art machine learning algorithms solving many NLP tasks that are relevant for Digital Humanities (DH) research, such as spell checking, language detection, entity extraction, author detection, question answering, and other tasks. These supervised algorithms learn patterns from a large number of "right" and "wrong" examples and apply them to new examples. However, using DNNs for analyzing the text resources in DH research presents two main challenges: (un)availability of training data and a need for domain adaptation. This paper explores these challenges by analyzing multiple use-cases of DH studies in recent literature and their possible solutions and lays out a practical decision model for DH experts for when and how to choose the appropriate deep learning approaches for their research. Moreover, in this paper, we aim to raise awareness of the benefits of utilizing deep learning models in the DH community.
    Series
    JASIST special issue on digital humanities (DH): C. Methodological innovations, challenges, and new interest in DH
  20. Reyes Ayala, B.; Knudson, R.; Chen, J.; Cao, G.; Wang, X.: Metadata records machine translation combining multi-engine outputs with limited parallel data (2018) 0.02
    0.015231727 = product of:
      0.060926907 = sum of:
        0.060926907 = weight(_text_:digital in 4010) [ClassicSimilarity], result of:
          0.060926907 = score(doc=4010,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.3081681 = fieldWeight in 4010, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4010)
      0.25 = coord(1/4)
    
    Abstract
    One way to facilitate Multilingual Information Access (MLIA) for digital libraries is to generate multilingual metadata records by applying Machine Translation (MT) techniques. Current online MT services are available and affordable, but are not always effective for creating multilingual metadata records. In this study, we implemented 3 different MT strategies and evaluated their performance when translating English metadata records to Chinese and Spanish. These strategies included combining MT results from 3 online MT systems (Google, Bing, and Yahoo!) with and without additional linguistic resources, such as manually-generated parallel corpora, and metadata records in the two target languages obtained from international partners. The open-source statistical MT platform Moses was applied to design and implement the three translation strategies. Human evaluation of the MT results using adequacy and fluency demonstrated that two of the strategies produced higher quality translations than individual online MT systems for both languages. Especially, adding small, manually-generated parallel corpora of metadata records significantly improved translation performance. Our study suggested an effective and efficient MT approach for providing multilingual services for digital collections.

Years

Languages

  • e 84
  • d 19
  • chi 1
  • f 1
  • ru 1
  • More… Less…

Types

  • a 85
  • el 11
  • m 9
  • s 5
  • p 4
  • x 3
  • d 1
  • r 1
  • More… Less…