Search (35 results, page 2 of 2)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Computerlinguistik"
  1. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.01
    0.007096812 = product of:
      0.028387249 = sum of:
        0.020662563 = weight(_text_:of in 1139) [ClassicSimilarity], result of:
          0.020662563 = score(doc=1139,freq=14.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.31997898 = fieldWeight in 1139, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.007724685 = product of:
          0.01544937 = sum of:
            0.01544937 = weight(_text_:on in 1139) [ClassicSimilarity], result of:
              0.01544937 = score(doc=1139,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.17010231 = fieldWeight in 1139, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1139)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
  2. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991) 0.01
    0.006713625 = product of:
      0.0268545 = sum of:
        0.019129815 = weight(_text_:of in 6681) [ClassicSimilarity], result of:
          0.019129815 = score(doc=6681,freq=12.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.29624295 = fieldWeight in 6681, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6681)
        0.007724685 = product of:
          0.01544937 = sum of:
            0.01544937 = weight(_text_:on in 6681) [ClassicSimilarity], result of:
              0.01544937 = score(doc=6681,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.17010231 = fieldWeight in 6681, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6681)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
  3. Porter, M.F.: ¬An algorithm for suffix stripping (1980) 0.01
    0.006262043 = product of:
      0.050096344 = sum of:
        0.050096344 = weight(_text_:retrieval in 3122) [ClassicSimilarity], result of:
          0.050096344 = score(doc=3122,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.40105087 = fieldWeight in 3122, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=3122)
      0.125 = coord(1/8)
    
    Footnote
    Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.313-316.
  4. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
    0.0060233157 = product of:
      0.024093263 = sum of:
        0.017850775 = weight(_text_:of in 1264) [ClassicSimilarity], result of:
          0.017850775 = score(doc=1264,freq=32.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.27643585 = fieldWeight in 1264, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
        0.0062424885 = product of:
          0.012484977 = sum of:
            0.012484977 = weight(_text_:on in 1264) [ClassicSimilarity], result of:
              0.012484977 = score(doc=1264,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.13746344 = fieldWeight in 1264, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1264)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.5, S.1058-1075
  5. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.01
    0.005834314 = product of:
      0.046674512 = sum of:
        0.046674512 = weight(_text_:retrieval in 5863) [ClassicSimilarity], result of:
          0.046674512 = score(doc=5863,freq=10.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.37365708 = fieldWeight in 5863, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5863)
      0.125 = coord(1/8)
    
    Abstract
    Retrievaltests sind die anerkannteste Methode, um neue Verfahren der Inhaltserschließung gegenüber traditionellen Verfahren zu rechtfertigen. Im Rahmen einer Diplomarbeit wurden zwei grundsätzlich unterschiedliche Systeme der automatischen inhaltlichen Erschließung anhand der Pressedatenbank des Verlagshauses Gruner + Jahr (G+J) getestet und evaluiert. Untersucht wurde dabei natürlichsprachliches Retrieval im Vergleich zu Booleschem Retrieval. Bei den beiden Systemen handelt es sich zum einen um Autonomy von Autonomy Inc. und DocCat, das von IBM an die Datenbankstruktur der G+J Pressedatenbank angepasst wurde. Ersteres ist ein auf natürlichsprachlichem Retrieval basierendes, probabilistisches System. DocCat demgegenüber basiert auf Booleschem Retrieval und ist ein lernendes System, das aufgrund einer intellektuell erstellten Trainingsvorlage indexiert. Methodisch geht die Evaluation vom realen Anwendungskontext der Textdokumentation von G+J aus. Die Tests werden sowohl unter statistischen wie auch qualitativen Gesichtspunkten bewertet. Ein Ergebnis der Tests ist, dass DocCat einige Mängel gegenüber der intellektuellen Inhaltserschließung aufweist, die noch behoben werden müssen, während das natürlichsprachliche Retrieval von Autonomy in diesem Rahmen und für die speziellen Anforderungen der G+J Textdokumentation so nicht einsetzbar ist
  6. Stock, M.: Textwortmethode und Übersetzungsrelation : Eine Methode zum Aufbau von kombinierten Literaturnachweis- und Terminologiedatenbanken (1989) 0.01
    0.005218369 = product of:
      0.04174695 = sum of:
        0.04174695 = weight(_text_:retrieval in 3412) [ClassicSimilarity], result of:
          0.04174695 = score(doc=3412,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.33420905 = fieldWeight in 3412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=3412)
      0.125 = coord(1/8)
    
    Abstract
    Geisteswissenschaftliche Fachinformation erfordert eine enge Kooperation zwischen Literaturnachweis- und Terminologieinformationssystemen. Eine geeignete Dokumentationsmethode für die Auswertung geisteswissen- schaftlicher Literatur ist die Textwortwethode. Dem originalsprachig aufgenommenen Begriffsrepertoire ist ein einheitssprachiger Zugriff beizuordnen, der einerseits ein vollständiges und genaues Retrieval garantiert und andererseits den Aufbau fachspezifischer Wörterbücher vorantreibt
  7. Kuhlen, R.: Experimentelle Morphologie in der Informationswissenschaft (1977) 0.01
    0.0051659215 = product of:
      0.041327372 = sum of:
        0.041327372 = weight(_text_:retrieval in 4253) [ClassicSimilarity], result of:
          0.041327372 = score(doc=4253,freq=4.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.33085006 = fieldWeight in 4253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4253)
      0.125 = coord(1/8)
    
    LCSH
    Information storage and retrieval systems
    Subject
    Information storage and retrieval systems
  8. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thourough evaluation of various methods (2000) 0.00
    0.0045538945 = product of:
      0.018215578 = sum of:
        0.011594418 = weight(_text_:of in 5480) [ClassicSimilarity], result of:
          0.011594418 = score(doc=5480,freq=6.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.17955035 = fieldWeight in 5480, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
        0.006621159 = product of:
          0.013242318 = sum of:
            0.013242318 = weight(_text_:on in 5480) [ClassicSimilarity], result of:
              0.013242318 = score(doc=5480,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.14580199 = fieldWeight in 5480, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5480)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods
  9. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
    0.0044978103 = product of:
      0.017991241 = sum of:
        0.012473608 = weight(_text_:of in 6029) [ClassicSimilarity], result of:
          0.012473608 = score(doc=6029,freq=10.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.19316542 = fieldWeight in 6029, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6029)
        0.0055176322 = product of:
          0.0110352645 = sum of:
            0.0110352645 = weight(_text_:on in 6029) [ClassicSimilarity], result of:
              0.0110352645 = score(doc=6029,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.121501654 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
    Source
    Journal of the American Society for Information Science and technology. 52(2001) no.9, S.748-762
  10. Fox, C.: Lexical analysis and stoplists (1992) 0.00
    0.0041746954 = product of:
      0.033397563 = sum of:
        0.033397563 = weight(_text_:retrieval in 3502) [ClassicSimilarity], result of:
          0.033397563 = score(doc=3502,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.26736724 = fieldWeight in 3502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
      0.125 = coord(1/8)
    
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
  11. Stock, W.G.: Textwortmethode : Norbert Henrichs zum 65. (3) (2000) 0.00
    0.0041746954 = product of:
      0.033397563 = sum of:
        0.033397563 = weight(_text_:retrieval in 4891) [ClassicSimilarity], result of:
          0.033397563 = score(doc=4891,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.26736724 = fieldWeight in 4891, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=4891)
      0.125 = coord(1/8)
    
    Abstract
    Nur wenige Dokumentationsmethoden werden mit dem Namen ihrer Entwickler assoziiert. Ausnahmen sind Melvil Dewey (DDC), S.R. Ranganathan (Colon Classification) - und Norbert Henrichs. Seine Textwortmethode ermöglicht die Indexierung und das Retrieval von Literatur aus Fachgebieten, die keine allseits akzeptierte Fachterminologie vorweisen, also viele Sozial- und Geisteswissenschaften, vorneweg die Philosophie. Für den Einsatz in der elektronischen Philosophie-Dokumentation hat Henrichs in den späten sechziger Jahren die Textwortmethode entworfen. Er ist damit nicht nur einer der Pioniere der Anwendung der elektronischen Datenverarbeitung in der Informationspraxis, sondern auch der Pionier bei der Dokumentation terminologisch nicht starrer Fachsprachen
  12. Volk, M.; Mittermaier, H.; Schurig, A.; Biedassek, T.: Halbautomatische Volltextanalyse, Datenbankaufbau und Document Retrieval (1992) 0.00
    0.0036528583 = product of:
      0.029222867 = sum of:
        0.029222867 = weight(_text_:retrieval in 2571) [ClassicSimilarity], result of:
          0.029222867 = score(doc=2571,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.23394634 = fieldWeight in 2571, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2571)
      0.125 = coord(1/8)
    
  13. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.00
    0.0020980686 = product of:
      0.016784549 = sum of:
        0.016784549 = product of:
          0.033569098 = sum of:
            0.033569098 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.033569098 = score(doc=1746,freq=2.0), product of:
                0.1446067 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041294612 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    22. 3.2015 9:17:30
  14. Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.00
    0.0019324032 = product of:
      0.0154592255 = sum of:
        0.0154592255 = weight(_text_:of in 6682) [ClassicSimilarity], result of:
          0.0154592255 = score(doc=6682,freq=6.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.23940048 = fieldWeight in 6682, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=6682)
      0.125 = coord(1/8)
    
    Abstract
    Presents a model for predicting the performance of a computerised keyword assigning and indexing system. Statistical procedures were investigated in order to protect against incorrect keywording by the system behaving as an expert system designed to mimic the behaviour of human keyword indexers and representing lessons learned from military exercises and operations
  15. Bredack, J.: Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich (2016) 0.00
    6.9729594E-4 = product of:
      0.0055783675 = sum of:
        0.0055783675 = weight(_text_:of in 3194) [ClassicSimilarity], result of:
          0.0055783675 = score(doc=3194,freq=2.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.086386204 = fieldWeight in 3194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3194)
      0.125 = coord(1/8)
    
    Content
    Schriftliche Hausarbeit (Masterarbeit) zur Erlangung des Grades eines Master of Arts An der Universität Trier Fachbereich II Studiengang Computerlinguistik.