Search (62 results, page 1 of 4)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12
    0.119041316 = sum of:
      0.09478464 = product of:
        0.2843539 = sum of:
          0.2843539 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.2843539 = score(doc=562,freq=2.0), product of:
              0.5059516 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.059678096 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.024256675 = product of:
        0.04851335 = sum of:
          0.04851335 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04851335 = score(doc=562,freq=2.0), product of:
              0.20898253 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.059678096 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
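
    The score breakdown attached to each hit is Lucene ClassicSimilarity "explain" output: a TF-IDF product of query weight and field weight, scaled by a coordination factor. As a sanity check, here is a minimal Python sketch (not Lucene code, just its documented ClassicSimilarity formulas) that recomputes the first partial score of hit 1 from the numbers shown above; the same arithmetic reproduces every other explain tree in this list:

```python
# Minimal sketch re-deriving the ClassicSimilarity (TF-IDF) arithmetic of
# hit 1's first clause. Inputs are copied from the explain tree above.
import math

freq, doc_freq, max_docs = 2.0, 24, 44218
query_norm, field_norm = 0.059678096, 0.046875

idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 8.478011
tf = math.sqrt(freq)                             # 1.4142135
query_weight = idf * query_norm                  # 0.5059516
field_weight = tf * idf * field_norm             # 0.56201804
clause_score = query_weight * field_weight       # 0.2843539
coord = 1 / 3                                    # 1 of 3 query clauses matched
print(clause_score * coord)                      # 0.09478464, as shown above
```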
  2. Roberts, C.W.; Popping, R.: Computer-supported content analysis : some recent developments (1993) 0.05
    0.052049424 = product of:
      0.10409885 = sum of:
        0.10409885 = product of:
          0.2081977 = sum of:
            0.2081977 = weight(_text_:maps in 4236) [ClassicSimilarity], result of:
              0.2081977 = score(doc=4236,freq=2.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.6208447 = fieldWeight in 4236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4236)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Presents an overview of some recent developments in the clause-based content analysis of linguistic data. Introduces network analysis of evaluative texts, the analysis of cognitive maps, and linguistic content analysis. Focuses on the types of substantive inferences afforded by the three approaches.
  3. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.05
    0.04739232 = product of:
      0.09478464 = sum of:
        0.09478464 = product of:
          0.2843539 = sum of:
            0.2843539 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.2843539 = score(doc=862,freq=2.0), product of:
                0.5059516 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.059678096 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  4. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.04
    0.035880987 = sum of:
      0.021731261 = product of:
        0.06519378 = sum of:
          0.06519378 = weight(_text_:objects in 1616) [ClassicSimilarity], result of:
            0.06519378 = score(doc=1616,freq=2.0), product of:
              0.31719333 = queryWeight, product of:
                5.315071 = idf(docFreq=590, maxDocs=44218)
                0.059678096 = queryNorm
              0.20553327 = fieldWeight in 1616, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.315071 = idf(docFreq=590, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
        0.33333334 = coord(1/3)
      0.014149726 = product of:
        0.028299453 = sum of:
          0.028299453 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
            0.028299453 = score(doc=1616,freq=2.0), product of:
              0.20898253 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.059678096 = queryNorm
              0.1354154 = fieldWeight in 1616, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
        0.5 = coord(1/2)
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a 1999 report from Computer Economics, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that Internet users among English speakers will grow by only 60% versus 150% growth among non-English speakers over the next five years; by 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million between January and June of 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002, and China had become the second largest at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has in the past focused on structural and semantic interoperability. Searching and retrieving objects across variations in protocols, formats and disciplines has been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49). However, research on crossing language boundaries, especially between European and Oriental languages, is still in its initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing the automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult a thesaurus to identify other relevant vocabulary. For the problem of searching across language boundaries, a cross-lingual thesaurus generated by co-occurrence analysis and a Hopfield network can be used to suggest additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents, so English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus using a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term; the direct translation of the input term can also be retrieved in most cases.
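
    The abstract names co-occurrence analysis plus a Hopfield network as the expansion mechanism but does not spell out the algorithm. As a rough illustration of the general technique, here is a hedged sketch of Hopfield-style spreading activation over a term co-occurrence matrix; the term list, weight matrix, transfer function, and threshold are all invented for illustration and are not taken from the paper:

```python
# Hedged sketch of spreading activation over a (made-up) co-occurrence
# matrix linking English and Chinese legal terms from a parallel corpus.
import numpy as np

terms = ["court", "judge", "fa yuan", "fa guan"]   # hypothetical EN/ZH terms
W = np.array([[0.0, 0.6, 0.8, 0.3],
              [0.6, 0.0, 0.2, 0.9],
              [0.8, 0.2, 0.0, 0.5],
              [0.3, 0.9, 0.5, 0.0]])               # invented co-occurrence weights

def expand(seed, steps=20, theta=0.5):
    """Clamp the seed term, spread activation, return terms above threshold."""
    i = terms.index(seed)
    a = np.zeros(len(terms))
    a[i] = 1.0
    for _ in range(steps):
        nxt = np.tanh(W @ a)          # sigmoid-style transfer function
        nxt[i] = 1.0                  # keep the query term fully activated
        if np.allclose(nxt, a, atol=1e-4):
            break
        a = nxt
    return [t for t, act in zip(terms, a) if act > theta and t != seed]

print(expand("court"))                # cross-lingual expansion candidates
```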
  5. Warner, A.J.: Natural language processing (1987) 0.03
    0.032342233 = product of:
      0.064684466 = sum of:
        0.064684466 = product of:
          0.12936893 = sum of:
            0.12936893 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.12936893 = score(doc=337,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  6. Hausser, R.: Language and nonlanguage cognition (2021) 0.03
    0.031229656 = product of:
      0.062459312 = sum of:
        0.062459312 = product of:
          0.124918625 = sum of:
            0.124918625 = weight(_text_:maps in 255) [ClassicSimilarity], result of:
              0.124918625 = score(doc=255,freq=2.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.37250686 = fieldWeight in 255, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.046875 = fieldNorm(doc=255)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage data as input. In either case, the output is content which is stored in the agent's onboard short-term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
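
    Read as a data-structure sketch, recognition takes a concept type (a bundle of constraints) and raw data, and yields a concept token when the data satisfies the type. The following minimal Python sketch is one way to model that step; the feature representation is invented here, and DBS defines these structures in its own terms:

```python
# Hedged sketch of type/token recognition: a concept type constrains
# attributes; applying it to raw data that satisfies the constraints
# yields a concept token. Representation invented for illustration.
from dataclasses import dataclass

@dataclass
class ConceptType:
    name: str
    constraints: dict        # attribute -> set of allowed values

@dataclass
class ConceptToken:
    type_name: str
    features: dict           # attribute -> observed value

def recognise(ctype, raw):
    """Return a token if the raw data satisfies the type's constraints."""
    if all(raw.get(k) in v for k, v in ctype.constraints.items()):
        return ConceptToken(ctype.name, raw)
    return None

square = ConceptType("square", {"edges": {4}, "angles": {"equal"}})
print(recognise(square, {"edges": 4, "angles": "equal", "size": "2cm"}))
```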
  7. Sokirko, A.V.: Programnaya realizatsiya Russkogo abshchesemanticheskogo slovarya (1997) 0.03
    0.03104466 = product of:
      0.06208932 = sum of:
        0.06208932 = product of:
          0.18626796 = sum of:
            0.18626796 = weight(_text_:objects in 2258) [ClassicSimilarity], result of:
              0.18626796 = score(doc=2258,freq=2.0), product of:
                0.31719333 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.059678096 = queryNorm
                0.58723795 = fieldWeight in 2258, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2258)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Discusses the Delphi2 for Windows software, which has been used for the development of the Russian Semantic Dictionary ROSS. Although not a relational database as such, Delphi actively uses standard objects of relational databases.
  8. Humphrey, S.M.; Rogers, W.J.; Kilicoglu, H.; Demner-Fushman, D.; Rindflesch, T.C.: Word sense disambiguation by selecting the best semantic type based on journal descriptor indexing : preliminary experiment (2006) 0.03
    0.029443601 = product of:
      0.058887202 = sum of:
        0.058887202 = product of:
          0.117774405 = sum of:
            0.117774405 = weight(_text_:maps in 4912) [ClassicSimilarity], result of:
              0.117774405 = score(doc=4912,freq=4.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.35120282 = fieldWeight in 4912, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4912)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning, which is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity "transport" has two meanings: "Biological Transport," assigned the ST Cell Function, and "Patient transport," assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing "transport" and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873, compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve the performance of JDI and test its use in applications.
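
    The decision step the abstract describes reduces to: score each candidate UMLS semantic type with JDI, take the highest-scoring type, and return the meaning attached to it. A minimal sketch, with invented scores and an invented type-to-meaning mapping for the "transport" example:

```python
# Hedged sketch of the selection step: pick the UMLS semantic type with
# the highest JDI score and return its associated meaning. The scores
# and the mapping below are invented for illustration.
jdi_scores = {"Cell Function": 0.31, "Health Care Activity": 0.74}
meaning_of = {"Cell Function": "Biological Transport",
              "Health Care Activity": "Patient transport"}

best_type = max(jdi_scores, key=jdi_scores.get)
print(meaning_of[best_type])   # -> "Patient transport" for this invented text
```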
  9. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
              0.11319781 = score(doc=3164,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 3164, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3164)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  10. Ruge, G.: A spreading activation network for automatic generation of thesaurus relationships (1991) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.11319781 = score(doc=4506,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8.10.2000 11:52:22
  11. Somers, H.: Example-based machine translation : Review article (1999) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
              0.11319781 = score(doc=6672,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 6672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6672)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  12. New tools for human translators (1997) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
              0.11319781 = score(doc=1179,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 1179, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1179)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  13. Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
              0.11319781 = score(doc=3117,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 3117, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3117)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28. 2.1999 10:48:22
  14. Der Student aus dem Computer (2023) 0.03
    0.028299453 = product of:
      0.056598905 = sum of:
        0.056598905 = product of:
          0.11319781 = sum of:
            0.11319781 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.11319781 = score(doc=1079,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  15. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.03
    0.026024712 = product of:
      0.052049424 = sum of:
        0.052049424 = product of:
          0.10409885 = sum of:
            0.10409885 = weight(_text_:maps in 123) [ClassicSimilarity], result of:
              0.10409885 = score(doc=123,freq=2.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.31042236 = fieldWeight in 123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=123)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Searching for information in unstructured natural-language data is the subject of so-called text mining. This thesis examines one subfield of text mining: the extraction of domain-specific technical terms from specialist texts of the respective domain. Why terminology extraction at all? The answer is simple: the key to understanding many technical fields lies in knowing their terminology. Of course, merely knowing a list of a domain's technical terms is not enough to master the field. Such a list is, however, an important prerequisite for compiling specialist dictionaries (consider reference works such as the clinical dictionary "Pschyrembel"): before one can think about precise definitions of the individual terms, it must first be decided which terms are to be included in the dictionary. A specialist dictionary should contain exactly those terms of a domain that are or were the subject of research in that field. What could be more natural, then, than to examine the relevant specialist literature and extract the knowledge it contains in the form of technical terms? Beyond that, further applications of terminology extraction are conceivable, such as the automatic subject indexing of texts or the construction of so-called topic maps, which present the important terms of a subject and relate them to one another. The thesis therefore first clarifies what terminology actually is; above all, though, it develops various methods that exploit the properties of technical terms in order to find them. The methods are derived from the linguistic and 'statistical' characteristics of technical terms and are combined in suitable ways.
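
    As a rough illustration of combining a 'statistical' criterion with a pattern-based one, the sketch below ranks term candidates by a weirdness-style ratio (relative frequency in a domain corpus versus a general corpus) and filters them with a stand-in surface pattern. The corpora, the pattern, and the threshold are invented for illustration; the thesis develops its own methods:

```python
# Hedged sketch: statistical filter (weirdness ratio) combined with a
# pattern filter over term candidates. All data here is invented.
import re
from collections import Counter

domain_text = "the topic map links each topic map entry to a topic"
general_text = "the links to each entry and the map of the city"

dom = Counter(domain_text.split())
gen = Counter(general_text.split())
n_dom, n_gen = sum(dom.values()), sum(gen.values())

def weirdness(w):
    """Relative frequency in the domain corpus vs. a general corpus."""
    return (dom[w] / n_dom) / ((gen[w] + 1) / n_gen)   # +1 smoothing

noun_like = re.compile(r"^[a-z]+$")       # stand-in for a real POS pattern
candidates = [w for w in dom
              if noun_like.match(w) and weirdness(w) > 2.0]
print(sorted(candidates, key=weirdness, reverse=True))   # -> ['topic']
```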
  16. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.03
    0.026024712 = product of:
      0.052049424 = sum of:
        0.052049424 = product of:
          0.10409885 = sum of:
            0.10409885 = weight(_text_:maps in 1842) [ClassicSimilarity], result of:
              0.10409885 = score(doc=1842,freq=2.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.31042236 = fieldWeight in 1842, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1842)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or perhaps even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised-language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In machine learning applications concerned with the automatic clustering or classification of texts, feature vectors are often needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to identify the discriminating factors between the two tasks that call for varying parameters or applying different techniques. The discussion of these methods is based on statistical, syntactical and especially morphological properties of (index) terms. The paper concludes with some qualitative and quantitative results comparing statistical and morphological methods.
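
    A minimal sketch of the feature-vector view mentioned above: each document is reduced to a handful of index terms with TF-IDF-style weights. The toy corpus and the cutoff k are invented for illustration:

```python
# Hedged sketch: reduce each document to its k highest-weighted index
# terms, using plain TF-IDF weights. Toy corpus invented for illustration.
import math
from collections import Counter

docs = ["terminology extraction from specialised texts",
        "automatic indexing of texts for retrieval",
        "terminology standardisation and ontologies"]
tokenised = [d.split() for d in docs]
df = Counter(w for toks in tokenised for w in set(toks))  # document frequency

def feature_vector(toks, k=3):
    """Keep only the k highest-weighted index terms of one document."""
    tf = Counter(toks)
    weights = {w: tf[w] * math.log(len(docs) / df[w]) for w in tf}
    return dict(sorted(weights.items(), key=lambda kv: -kv[1])[:k])

print(feature_vector(tokenised[0]))
```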
  17. Lian, T.; Yu, C.; Wang, W.; Yuan, Q.; Hou, Z.: Doctoral dissertations on tourism in China : a co-word analysis (2016) 0.03
    0.026024712 = product of:
      0.052049424 = sum of:
        0.052049424 = product of:
          0.10409885 = sum of:
            0.10409885 = weight(_text_:maps in 3178) [ClassicSimilarity], result of:
              0.10409885 = score(doc=3178,freq=2.0), product of:
                0.33534583 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.059678096 = queryNorm
                0.31042236 = fieldWeight in 3178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3178)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The aim of this paper is to map the foci of research in doctoral dissertations on tourism in China. Co-word analysis is applied, with keywords drawn from six public dissertation databases (CDFD, Wanfang Data, NLC, CALIS, ISTIC, and NSTL) as well as from some university libraries providing doctoral dissertations on tourism. Altogether we examined 928 doctoral dissertations on tourism written between 1989 and 2013. Doctoral dissertations on tourism in China involve 36 first-level disciplines and 102 secondary-level disciplines. We collect the top 68 keywords of practical significance in tourism that are mentioned at least four times. These keywords are classified into 12 categories based on co-word analysis, including cluster analysis, strategic diagram analysis, and social network analysis. According to the strategic diagram of the 12 categories, we identify the mature and immature areas of tourism study. Social network analysis yields network maps of the original co-occurrence matrix and a k-core analysis of the binary matrix. The paper provides valuable insight into the study of tourism by analyzing doctoral dissertations on tourism in China.
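
    A hedged sketch of the co-word workflow the abstract outlines: build a keyword co-occurrence graph from per-dissertation keyword lists, then extract a k-core to isolate the densely connected themes. The keyword lists and the value of k are invented for illustration:

```python
# Hedged sketch of co-word analysis with a k-core filter (uses networkx).
# Keyword lists below are invented, not data from the paper.
from itertools import combinations
import networkx as nx

keyword_lists = [["ecotourism", "sustainability", "planning"],
                 ["ecotourism", "sustainability"],
                 ["heritage", "planning", "sustainability"],
                 ["heritage", "museums"]]

G = nx.Graph()
for kws in keyword_lists:
    for a, b in combinations(sorted(set(kws)), 2):   # co-occurrence edges
        w = G.edges[a, b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

core = nx.k_core(G, k=2)   # every node keeps >= 2 links inside the core
print(sorted(core.nodes))  # densely connected ("mature") keyword cluster
```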
  18. Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02
    0.024256675 = product of:
      0.04851335 = sum of:
        0.04851335 = product of:
          0.0970267 = sum of:
            0.0970267 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
              0.0970267 = score(doc=4483,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.46428138 = fieldWeight in 4483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    15. 3.2000 10:22:37
  19. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02
    0.024256675 = product of:
      0.04851335 = sum of:
        0.04851335 = product of:
          0.0970267 = sum of:
            0.0970267 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.0970267 = score(doc=4888,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 3.2013 14:56:22
  20. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.024256675 = product of:
      0.04851335 = sum of:
        0.04851335 = product of:
          0.0970267 = sum of:
            0.0970267 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.0970267 = score(doc=5429,freq=2.0), product of:
                0.20898253 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.059678096 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.230-231

Languages

  • e 43
  • d 18
  • ru 1

Types

  • a 46
  • el 7
  • m 7
  • s 4
  • p 3
  • x 2
  • d 1