Search (97 results, page 1 of 5)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.15
    0.14797027 = product of:
      0.24661711 = sum of:
        0.057946928 = product of:
          0.17384078 = sum of:
            0.17384078 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.17384078 = score(doc=562,freq=2.0), product of:
                0.3093153 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.036484417 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.17384078 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.17384078 = score(doc=562,freq=2.0), product of:
            0.3093153 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036484417 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.014829405 = product of:
          0.02965881 = sum of:
            0.02965881 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.02965881 = score(doc=562,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
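The indented breakdowns attached to each result are Lucene "explain" trees for its classic TF-IDF similarity. Below is a minimal Python sketch of that arithmetic, reproducing the score of result 1 from the values shown above; the formulas tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), score = queryWeight × fieldWeight, and coord = matching/total clauses are Lucene's ClassicSimilarity definitions, while queryNorm is simply taken as given from the output:

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight, with queryWeight = idf * queryNorm
    and fieldWeight = tf * idf * fieldNorm, where tf = sqrt(freq)."""
    i = idf(doc_freq, max_docs)
    return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

QUERY_NORM = 0.036484417  # taken as given from the explain output
MAX_DOCS = 44218

# Result 1 (doc 562): three of the five query clauses match.
w_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, 0.046875)    # ~0.17384078
w_2f = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, 0.046875)    # ~0.17384078
w_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.046875)  # ~0.02965881

# Inner coord factors 1/3 and 1/2, then the outer coord 3/5:
score = (w_3a * (1 / 3) + w_2f + w_22 * (1 / 2)) * (3 / 5)
print(f"{score:.8f}")  # ~0.14797027, the record's top-level score
```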
  2. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.09
    0.092715085 = product of:
      0.23178771 = sum of:
        0.057946928 = product of:
          0.17384078 = sum of:
            0.17384078 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.17384078 = score(doc=862,freq=2.0), product of:
                0.3093153 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.036484417 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.17384078 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.17384078 = score(doc=862,freq=2.0), product of:
            0.3093153 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036484417 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.4 = coord(2/5)
    
    Source
    https://arxiv.org/abs/2212.06721
  3. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.08
    0.07546808 = product of:
      0.18867019 = sum of:
        0.17384078 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.17384078 = score(doc=563,freq=2.0), product of:
            0.3093153 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036484417 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.014829405 = product of:
          0.02965881 = sum of:
            0.02965881 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.02965881 = score(doc=563,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  4. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.02
    0.023447776 = product of:
      0.05861944 = sum of:
        0.038846903 = product of:
          0.077693805 = sum of:
            0.077693805 = weight(_text_:problems in 7415) [ClassicSimilarity], result of:
              0.077693805 = score(doc=7415,freq=4.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.5159344 = fieldWeight in 7415, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7415)
          0.5 = coord(1/2)
        0.019772539 = product of:
          0.039545078 = sum of:
            0.039545078 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
              0.039545078 = score(doc=7415,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.30952093 = fieldWeight in 7415, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7415)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    State of the art review of natural language processing, updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly.
  5. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.02
    0.02012331 = product of:
      0.10061655 = sum of:
        0.10061655 = sum of:
          0.07095774 = weight(_text_:etc in 75) [ClassicSimilarity], result of:
            0.07095774 = score(doc=75,freq=2.0), product of:
              0.19761753 = queryWeight, product of:
                5.4164915 = idf(docFreq=533, maxDocs=44218)
                0.036484417 = queryNorm
              0.35906604 = fieldWeight in 75, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4164915 = idf(docFreq=533, maxDocs=44218)
                0.046875 = fieldNorm(doc=75)
          0.02965881 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
            0.02965881 = score(doc=75,freq=2.0), product of:
              0.12776221 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.036484417 = queryNorm
              0.23214069 = fieldWeight in 75, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=75)
      0.2 = coord(1/5)
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology databases, thesauri, library classification systems, etc. Essential features of the technology are a lexicographic user interface, variable word description, an unlimited list of word readings, a concept language, automatic transformation of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  6. Kay, M.: The proper place of men and machines in language translation (1997) 0.02
    0.016534507 = product of:
      0.04133627 = sum of:
        0.024035294 = product of:
          0.048070587 = sum of:
            0.048070587 = weight(_text_:problems in 1178) [ClassicSimilarity], result of:
              0.048070587 = score(doc=1178,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.31921813 = fieldWeight in 1178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1178)
          0.5 = coord(1/2)
        0.017300973 = product of:
          0.034601945 = sum of:
            0.034601945 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
              0.034601945 = score(doc=1178,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.2708308 = fieldWeight in 1178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1178)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Machine translation stands no chance of filling actual needs for translation because, although there has been progress in relevant areas of computer science, advances in linguistics have not touched the core problems. Cooperative man-machine systems need to be developed. Proposes a translator's amanuensis, incorporating into a word processor some simple facilities peculiar to translation. Gradual enhancements of such a system could lead to the original goal of machine translation.
    Date
    31. 7.1996 9:22:19
  7. Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.02
    0.016310833 = product of:
      0.08155417 = sum of:
        0.08155417 = product of:
          0.16310833 = sum of:
            0.16310833 = weight(_text_:exercises in 6682) [ClassicSimilarity], result of:
              0.16310833 = score(doc=6682,freq=2.0), product of:
                0.25947425 = queryWeight, product of:
                  7.11192 = idf(docFreq=97, maxDocs=44218)
                  0.036484417 = queryNorm
                0.62861085 = fieldWeight in 6682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.11192 = idf(docFreq=97, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6682)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Presents a model for predicting the performance of a computerised keyword assigning and indexing system. Statistical procedures were investigated in order to protect against incorrect keywording by the system, which behaves as an expert system designed to mimic the behaviour of human keyword indexers and to represent lessons learned from military exercises and operations.
  8. Sabourin, C.F. (Bearb.): Computational linguistics in information science : bibliography (1994) 0.01
    0.011826291 = product of:
      0.05913145 = sum of:
        0.05913145 = product of:
          0.1182629 = sum of:
            0.1182629 = weight(_text_:etc in 8280) [ClassicSimilarity], result of:
              0.1182629 = score(doc=8280,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.5984434 = fieldWeight in 8280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.078125 = fieldNorm(doc=8280)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The bibliography covers information retrieval (2100 refs.), fulltext (890) or conceptual (60), automatic indexing (930), information extraction (520), query languages (1090), etc.; altogether 6390 references, fully indexed.
  9. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.01
    0.009614117 = product of:
      0.048070587 = sum of:
        0.048070587 = product of:
          0.096141174 = sum of:
            0.096141174 = weight(_text_:problems in 3908) [ClassicSimilarity], result of:
              0.096141174 = score(doc=3908,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.63843626 = fieldWeight in 3908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3908)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
  10. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
    0.008267254 = product of:
      0.020668134 = sum of:
        0.012017647 = product of:
          0.024035294 = sum of:
            0.024035294 = weight(_text_:problems in 1616) [ClassicSimilarity], result of:
              0.024035294 = score(doc=1616,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.15960906 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
        0.008650486 = product of:
          0.017300973 = sum of:
            0.017300973 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.017300973 = score(doc=1616,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million between January and June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy needs in the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49). However, research in crossing language boundaries, especially between European and Oriental languages, is still in its initial stage. In this proposal, we put our focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus with a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases.
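The co-occurrence analysis mentioned in this abstract can be sketched in miniature. This is a hedged illustration only: the three aligned sentence pairs are invented, the Jaccard weighting is one common choice rather than the authors' exact measure, and the Hopfield-network activation step they describe is omitted:

```python
from collections import Counter
from itertools import product

# Toy aligned English/Chinese term pairs standing in for a parallel corpus.
aligned_pairs = [
    (["court", "justice"],      ["法院", "司法"]),
    (["court", "hearing"],      ["法院", "聆讯"]),
    (["justice", "department"], ["司法", "部门"]),
]

cooc = Counter()                      # (english, chinese) -> joint count
freq_en, freq_zh = Counter(), Counter()
for en_terms, zh_terms in aligned_pairs:
    freq_en.update(set(en_terms))
    freq_zh.update(set(zh_terms))
    for e, z in product(set(en_terms), set(zh_terms)):
        cooc[(e, z)] += 1

def related(term: str, top_n: int = 3):
    """Rank Chinese terms for an English term by Jaccard co-occurrence weight."""
    scores = {
        z: cooc[(term, z)] / (freq_en[term] + freq_zh[z] - cooc[(term, z)])
        for (e, z) in cooc if e == term
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

print(related("court"))  # 法院 ranks first: it co-occurs in both "court" pairs
```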
  11. Warner, A.J.: Natural language processing (1987) 0.01
    0.007909016 = product of:
      0.039545078 = sum of:
        0.039545078 = product of:
          0.079090156 = sum of:
            0.079090156 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.079090156 = score(doc=337,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  12. Cimiano, P.; Völker, J.; Studer, R.: Ontologies on demand? : a description of the state-of-the-art, applications, challenges and trends for ontology learning from text (2006) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 6014) [ClassicSimilarity], result of:
              0.07095774 = score(doc=6014,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 6014, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6014)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Ontologies are nowadays used for many applications requiring data, services and resources in general to be interoperable and machine understandable. Such applications are, for example, web service discovery and composition, information integration across databases, intelligent search, etc. The general idea is that data and services are semantically described with respect to ontologies, which are formal specifications of a domain of interest, and can thus be shared and reused in such a way that the shared meaning specified by the ontology remains formally the same across different parties and applications. As the cost of creating ontologies is relatively high, different proposals have emerged for learning ontologies from structured and unstructured resources. In this article we examine the maturity of techniques for ontology learning from textual resources, addressing the question whether the state of the art is mature enough to produce ontologies 'on demand'.
  13. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 921) [ClassicSimilarity], result of:
              0.07095774 = score(doc=921,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 921, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=921)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms (extendable labelling of documents and computational linguistics) and is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.
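To make the idea of XML-coded lexical entries concrete, here is a hypothetical sketch; the element and attribute names are invented for illustration and are not the schema the paper proposes:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-coded dictionary entry (element names are illustrative).
entry = ET.Element("entry", lemma="casa", lang="es")
sense = ET.SubElement(entry, "sense", n="1")
ET.SubElement(sense, "definition").text = "building for human habitation"

# Per the abstract, information such as morphology would be attached by
# external linguistic tools rather than stored redundantly in the dictionary:
morph = ET.SubElement(entry, "morphology", source="external-tool")
ET.SubElement(morph, "pos").text = "noun"
ET.SubElement(morph, "gender").text = "feminine"

print(ET.tostring(entry, encoding="unicode"))
```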
  14. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 4224) [ClassicSimilarity], result of:
              0.07095774 = score(doc=4224,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 4224, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming, as sketched below. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both of our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed roughly equally in all the language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
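To illustrate how s-grams extend ordinary n-grams, here is a small sketch; the character bigrams with skip distances 0 and 1 and the Jaccard overlap are assumed as common choices, not necessarily the exact configuration used in the study:

```python
def s_grams(word: str, n: int = 2, skips=(0, 1)) -> set:
    """Character s-grams: n-grams whose characters may be separated by a gap.
    With skips=(0,) this reduces to ordinary (contiguous) n-grams."""
    grams = set()
    for skip in skips:
        step = skip + 1
        for i in range(len(word) - (n - 1) * step):
            grams.add(word[i:i + (n - 1) * step + 1:step])
    return grams

def s_gram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two words' s-gram sets."""
    ga, gb = s_grams(a), s_grams(b)
    return len(ga & gb) / len(ga | gb) if (ga or gb) else 0.0

# A translation lemma vs. an inflected index form (Finnish 'talo' ~ 'talossa'):
print(round(s_gram_similarity("talo", "talossa"), 2))  # 0.56
```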
  15. Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 330) [ClassicSimilarity], result of:
              0.07095774 = score(doc=330,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 330, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=330)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
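The kind of correlation analysis described here can be sketched by treating a text as the indicator series of a single word and estimating its autocorrelation at increasing lags; this uses a standard estimator over an invented toy text, not the paper's data or its full method:

```python
def autocorrelation(series, lag: int) -> float:
    """Normalized autocorrelation C(lag) of a numeric series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

# Encode a text as the 0/1 occurrence series of one topical word, then watch
# how slowly C(lag) decays; long-range correlation shows up as a slow decay.
text = ("the whale surfaced and the whale dived while sailors watched "
        "the sea and later the whale returned to the whale pod").split()
series = [1.0 if w == "whale" else 0.0 for w in text]
for lag in (1, 2, 5, 10):
    print(lag, round(autocorrelation(series, lag), 3))
```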
  16. AL-Smadi, M.; Jaradat, Z.; AL-Ayyoub, M.; Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features (2017) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 5095) [ClassicSimilarity], result of:
              0.07095774 = score(doc=5095,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 5095, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5095)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The rapid growth in digital information has raised considerable challenges, in particular when it comes to automated content analysis. Social media platforms such as Twitter share a lot of their users' information about their events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline.
  17. Shree, P.: ¬The journey of Open AI GPT models (2020) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 869) [ClassicSimilarity], result of:
              0.07095774 = score(doc=869,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 869, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=869)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Generative Pre-trained Transformer (GPT) models by OpenAI have taken the natural language processing (NLP) community by storm by introducing very powerful language models. These models can perform various NLP tasks like question answering, textual entailment, text summarisation, etc. without any supervised training. These language models need very few to no examples to understand the tasks and perform on a par with or even better than state-of-the-art models trained in a supervised fashion. In this article we will cover the journey of these models and understand how they have evolved over a period of two years. 1. Discussion of the GPT-1 paper (Improving Language Understanding by Generative Pre-training). 2. Discussion of the GPT-2 paper (Language Models are Unsupervised Multitask Learners) and its subsequent improvements over GPT-1. 3. Discussion of the GPT-3 paper (Language Models are Few-Shot Learners) and the improvements which have made it one of the most powerful models NLP has seen to date. This article assumes familiarity with the basics of NLP terminology and the transformer architecture.
  18. Thomas, I.S.; Wang, J.; GPT-3: Was euch zu Menschen macht : Antworten einer künstlichen Intelligenz auf die großen Fragen des Lebens (2022) 0.01
    0.007095774 = product of:
      0.03547887 = sum of:
        0.03547887 = product of:
          0.07095774 = sum of:
            0.07095774 = weight(_text_:etc in 878) [ClassicSimilarity], result of:
              0.07095774 = score(doc=878,freq=2.0), product of:
                0.19761753 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.036484417 = queryNorm
                0.35906604 = fieldWeight in 878, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=878)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The first book of wisdom written by an AI. "Artificial intelligence sees human beings as they are. For it there is no God, no rituals, no heaven, no hell, no angels. For it there are only sentient beings." GPT-3. This book contains wisdom texts composed by the most advanced AI in the field of language recognition. It is GPT-3, operated by the technologist Jasmine Wang. The original texts from GPT-3 are curated by the internationally known poet Iain S. Thomas. GPT-3's source base ranges from humanity's books of wisdom to modern texts. GPT-3 answers questions such as: What makes a human being human? What does it mean to love? How do we lead a fulfilled life? etc., and is able to create sentences of its own. The result is a contemporary and unprecedented exploration of meaning and spirituality, one that inspires a new understanding of what makes us human.
  19. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.01
    0.0069203894 = product of:
      0.034601945 = sum of:
        0.034601945 = product of:
          0.06920389 = sum of:
            0.06920389 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
              0.06920389 = score(doc=3164,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.5416616 = fieldWeight in 3164, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3164)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  20. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.01
    0.0069203894 = product of:
      0.034601945 = sum of:
        0.034601945 = product of:
          0.06920389 = sum of:
            0.06920389 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.06920389 = score(doc=4506,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    8.10.2000 11:52:22

Languages

  • e 75
  • d 20
  • m 1

Types

  • a 72
  • m 13
  • el 11
  • s 6
  • p 3
  • x 3
  • b 1
  • d 1
