Search (135 results, page 1 of 7)

  • Filter: theme_ss:"Computerlinguistik"
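
The relevance value shown after each title is the search engine's Lucene ClassicSimilarity score. A minimal Python sketch of how a single query term contributes to such a score (the tf, idf and fieldNorm figures are the values the engine reports for the first result; the rest is standard tf-idf arithmetic):

    import math

    # One term's contribution to a document score under Lucene ClassicSimilarity.
    doc_freq, max_docs = 24, 44218                      # records containing the term / total records
    tf = math.sqrt(2.0)                                 # term occurs twice in the field -> sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))   # = 8.478011
    field_norm = 0.046875                               # index-time field-length normalization
    field_weight = tf * idf * field_norm                # = 0.562018, the per-term field weight
    # The final score sums such weights over the query terms and rescales them
    # by a query normalization factor and a coordination factor.
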
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    Content
     Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.15
    Abstract
     In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Content
     A thesis presented to The University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.14
    Source
     https://arxiv.org/abs/2212.06721
  4. Zhou, L.; Zhang, D.: NLPIR: a theoretical framework for applying Natural Language Processing to information retrieval (2003) 0.08
    Abstract
     Zhou and Zhang believe that for the potential of natural language processing (NLP) to be reached in information retrieval, a framework for guiding the effort should be in place. They provide a graphic model that identifies different levels of natural language processing effort during the query-document matching process. A direct matching approach uses little NLP, an expansion approach with thesauri a little more, but an extraction approach will often use a variety of NLP techniques, as well as statistical methods. A transformation approach, which creates intermediate representations of documents and queries, is a step higher in NLP usage, and a uniform approach, which relies on a body of knowledge beyond that of the documents and queries to provide inference and sense-making prior to matching, would require a maximum NLP effort.
  5. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.08
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
  6. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.07
    Abstract
     A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology databases, thesauri, library classification systems, etc. Essential features of the technology are a lexicographic user interface, variable word description, an unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  7. Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.03
    Abstract
     Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing, but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
  8. Ruge, G.: Sprache und Computer : Wortbedeutung und Termassoziation. Methoden zur automatischen semantischen Klassifikation (1995) 0.03
    Content
     Contains the following chapters: (1) Motivation; (2) Language philosophical foundations; (3) Structural comparison of extensions; (4) Earlier approaches towards term association; (5) Experiments; (6) Spreading-activation networks or memory models; (7) Perspective. Appendices: Heads and modifiers of 'car'. Glossary. Index. Language and computer. Word semantics and term association. Methods towards an automatic semantic classification.
    Footnote
     Review in: Knowledge organization 22(1995) no.3/4, S.182-184 (M.T. Rolland)
  9. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.02
    Abstract
     Reports on an OCLC natural language processing research project to develop methods for identifying terminology in unstructured electronic text, especially material associated with new cultural trends and emerging subjects. Current OCLC production software can only identify single words as indexable terms in full-text documents, thus a major goal of the WordSmith project is to develop software that can automatically identify and intelligently organize phrases for use in database indexes. By analyzing user terminology from local newspapers in the USA, the latest cultural trends and technical developments as well as personal and geographic names have been drawn out. Notes that this new vocabulary can also be mapped into reference works.
    Source
    OCLC newsletter. 1998, no.234, Jul/Aug, S.22-24
  10. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.02
    Abstract
     Purpose Academic authors tend to define terms that meet their own needs. Knowledge Management (KM) is a term that comes to mind and is examined in this study. Lexicographical research identified KM terms used by authors from 1996 to 2006 in academic outlets to define KM. Data were collected based on strict criteria which included that definitions should be unique instances. From 2006 onwards, these authors could not identify new unique instances of definitions with repetitive usage of such definition instances. Analysis revealed that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, and Process) and Contextualised Content (Information). The paper aims to discuss these issues. Design/methodology/approach The aim of this paper is to add to the body of knowledge in the KM discipline and supply KM practitioners and scholars with insight into what is commonly regarded as KM so as to reignite the debate on what one could consider as KM. The lexicon used by KM scholars was evaluated through the application of lexicographical research methods as extended through Knowledge Discovery and Text Analysis methods. Findings By simplifying term relationships through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods, it was found that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, Process) and Contextualised Content (Information). One would therefore be able to indicate that KM, from an academic point of view, refers to people processing contextualised content.
    Date
    20. 1.2015 18:30:22
  11. Fóris, A.: Network theory and terminology (2013) 0.02
    Abstract
     The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
    Date
    2. 9.2014 21:22:48
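     A toy illustration of the network view of terminology described in the abstract above (a sketch only; it assumes the networkx package and uses made-up co-occurrence data):
       import itertools
       import networkx as nx  # assumption: networkx is installed

       # Link terms that co-occur in the same record and inspect the degree sequence;
       # in large terminological networks a heavy-tailed degree distribution is the
       # scale-free signature discussed in the paper.
       records = [
           {"term extraction", "multi-word term", "summarization"},
           {"term extraction", "association measure"},
           {"summarization", "information retrieval"},
       ]
       G = nx.Graph()
       for rec in records:
           G.add_edges_from(itertools.combinations(sorted(rec), 2))
       print(sorted((d for _, d in G.degree()), reverse=True))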
  12. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.02
    Abstract
    Logical rules are essential for uncovering the logical connections between relations, which could improve the reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from the computationally intensive searches over the rule space and a lack of scalability for large-scale KGs. Besides, they often ignore the semantics of relations which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in the field of natural language processing and various applications, owing to their emergent ability and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs. Last, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs, w.r.t. different rule quality metrics and downstream tasks, showing the effectiveness and scalability of our method.
    Date
    23.11.2023 19:07:22
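     The abstract does not spell out ChatRule's ranking metrics; a common way to estimate the quality of a mined logical rule against existing KG facts is support and confidence, sketched here over a hypothetical toy KG:
       # Toy knowledge graph as (head, relation, tail) triples (made-up data).
       kg = {
           ("alice", "works_at", "acme"), ("bob", "works_at", "acme"),
           ("acme", "located_in", "berlin"), ("alice", "lives_in", "berlin"),
       }

       # Candidate rule: works_at(x, y) AND located_in(y, z) -> lives_in(x, z)
       bindings = [(x, z) for (x, r1, y1) in kg if r1 == "works_at"
                          for (y2, r2, z) in kg if r2 == "located_in" and y1 == y2]
       support = sum((x, "lives_in", z) in kg for (x, z) in bindings)
       confidence = support / len(bindings) if bindings else 0.0
       print(support, confidence)   # 1 0.5 - only alice's predicted fact is already in the KG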
  13. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.01
  14. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: ¬An evaluation of methods for the extraction of multiword expressions (20xx) 0.01
    Abstract
     This paper focuses on the evaluation of some methods for the automatic acquisition of Multiword Expressions (MWEs). First, we investigate the hypothesis that MWEs can be detected solely by the distinct statistical properties of their component words, regardless of their type, comparing three statistical measures: Mutual Information, Chi-squared and Permutation Entropy. Moreover, we also look at the impact that the addition of type-specific linguistic information has on the performance of these methods.
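     For context, pointwise mutual information is the simplest association measure of the kind compared in this paper; a self-contained sketch over a toy token sequence (illustrative data only):
       import math
       from collections import Counter

       tokens = ("natural language processing improves information retrieval "
                 "and natural language generation").split()
       unigrams = Counter(tokens)
       bigrams = Counter(zip(tokens, tokens[1:]))
       n = len(tokens)

       def pmi(w1, w2):
           # PMI compares the observed bigram probability with what independence predicts.
           p_xy = bigrams[(w1, w2)] / (n - 1)
           return math.log2(p_xy / ((unigrams[w1] / n) * (unigrams[w2] / n)))

       print(pmi("natural", "language"))   # a high value flags a multi-word term candidate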
  15. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.01
    Abstract
     Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FSTs), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
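     To see the contrast between the non-linguistic and linguistic conflation families on a few English word forms (a sketch only; it assumes NLTK with its WordNet data installed, not the finite-state tools discussed in the paper):
       from nltk.stem import PorterStemmer, WordNetLemmatizer  # assumes nltk + wordnet data

       stem = PorterStemmer().stem                 # rule-based suffix stripping (non-linguistic)
       lemmatize = WordNetLemmatizer().lemmatize   # dictionary lookup (linguistic)
       for word in ["studies", "corpora", "retrieval", "normalised"]:
           print(word, stem(word), lemmatize(word))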
  16. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.01
    Abstract
     Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models are more demanding for training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or by relying on external knowledge bases to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based, and the other is lexicon-manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with two previous lexical substitution methods, averaged over five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
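     A heavily simplified sketch of POS-focused lexical substitution in the spirit of the approach described above (not the paper's PLSDA; it assumes NLTK with WordNet and the POS-tagger data installed):
       import random
       import nltk
       from nltk.corpus import wordnet as wn   # assumes nltk corpora are available

       def augment(sentence, n_new=2):
           tokens = sentence.split()
           tagged = nltk.pos_tag(tokens)
           variants = []
           for _ in range(n_new):
               new = list(tokens)
               for i, (word, tag) in enumerate(tagged):
                   if tag.startswith("JJ"):          # substitute adjectives only
                       lemmas = {l.name() for s in wn.synsets(word, pos=wn.ADJ)
                                 for l in s.lemmas()} - {word}
                       if lemmas:
                           new[i] = random.choice(sorted(lemmas))
               variants.append(" ".join(new))
           return variants

       print(augment("the movie was great and the acting was solid"))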
  17. Warner, A.J.: Natural language processing (1987) 0.01
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  18. Priß, U.: ¬The formalization of WordNet by methods of relational concept analysis (1998) 0.01
  19. Empirical natural language processing (1997) 0.01
    Footnote
    A special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation
  20. Bakar, Z.A.; Sembok, T.M.T.; Yusoff, M.: ¬An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts (2000) 0.01
    Abstract
     This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonic, and Hartlib. Two dynamic-programming methods that measure longest common subsequence and edit-cost distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved-and-relevant (R&R) mean measures are calculated from weighted combinations of recall and precision values. Results from these experiments are then compared with those obtained using digram, an available string-similarity method. The best R&R and E results are given by using digram. Edit-cost distances produce the best E results, and both dynamic-programming methods rank second in finding R&R mean measures.
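     The two dynamic-programming measures named in the abstract are standard; a compact sketch of both, applied to a made-up Malay misspelling pair:
       def edit_distance(a, b):
           # Levenshtein edit-cost distance with a rolling one-row table.
           row = list(range(len(b) + 1))
           for i, ca in enumerate(a, 1):
               prev, row[0] = row[0], i
               for j, cb in enumerate(b, 1):
                   prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (ca != cb))
           return row[-1]

       def lcs_length(a, b):
           # Length of the longest common subsequence.
           dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
           for i, ca in enumerate(a, 1):
               for j, cb in enumerate(b, 1):
                   dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
           return dp[-1][-1]

       print(edit_distance("kereta", "kerata"), lcs_length("kereta", "kerata"))   # 1 5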

Languages

  • e 115
  • d 18
  • el 1
  • slv 1

Types

  • a 112
  • el 12
  • m 10
  • s 8
  • x 3
  • p 2
  • d 1
  • r 1

Classifications