Search (59 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.32

0.31819054 = sum of:
  0.07476433 = product of:
    0.224293 = sum of:
      0.224293 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.224293 = score(doc=562,freq=2.0), product of:
          0.39908504 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.047072954 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.224293 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
    0.224293 = score(doc=562,freq=2.0), product of:
      0.39908504 = queryWeight, product of:
        8.478011 = idf(docFreq=24, maxDocs=44218)
        0.047072954 = queryNorm
      0.56201804 = fieldWeight in 562, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        8.478011 = idf(docFreq=24, maxDocs=44218)
        0.046875 = fieldNorm(doc=562)
  0.019133206 = product of:
    0.038266413 = sum of:
      0.038266413 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.038266413 = score(doc=562,freq=2.0), product of:
          0.16484147 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047072954 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
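
The score breakdown for this first result can be checked by hand. The sketch below is a minimal reconstruction in Python, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), formulas inferred from the numbers listed in the breakdown, and the function and constant names are hypothetical. It recomputes the three term weights for doc 562 and combines them with the coord factors shown.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, mirroring the nesting of the breakdown.
    tf = math.sqrt(freq)                          # tf(freq=2.0) = 1.4142135
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm          # idf * queryNorm
    field_weight = tf * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the breakdown for doc 562.
MAX_DOCS = 44218
QUERY_NORM = 0.047072954
FIELD_NORM = 0.046875

w_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_2f = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/3) scales the "3a" clause and coord(1/2) scales the "22" clause.
total = w_3a * (1 / 3) + w_2f + w_22 * (1 / 2)
print(total)
```

The printed total should agree with the listed 0.31819054 to within rounding; small differences in the last digits are expected because the listed values appear to be single-precision floats while Python uses double precision.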

Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.20

0.19937156 = product of:
  0.29905733 = sum of:
    0.07476433 = product of:
      0.224293 = sum of:
        0.224293 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.224293 = score(doc=862,freq=2.0), product of:
            0.39908504 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.047072954 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
    0.224293 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
      0.224293 = score(doc=862,freq=2.0), product of:
        0.39908504 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.047072954 = queryNorm
        0.56201804 = fieldWeight in 862, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=862)
  0.6666667 = coord(2/3)

Source: https%3A%2F%2Farxiv.org%2Fabs%2F2212.06721&usg=AOvVaw3i_9pZm9y_dQWoHi6uv0EN

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.16

0.16228414 = product of:
  0.2434262 = sum of:
    0.224293 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.224293 = score(doc=563,freq=2.0), product of:
        0.39908504 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.047072954 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.019133206 = product of:
      0.038266413 = sum of:
        0.038266413 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
          0.038266413 = score(doc=563,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.23214069 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Content: A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. See: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
Date: 10. 1.2013 19:22:47

Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.06

0.0587764 = product of:
  0.1763292 = sum of:
    0.1763292 = sum of:
      0.11255183 = weight(_text_:history in 1463) [ClassicSimilarity], result of:
        0.11255183 = score(doc=1463,freq=2.0), product of:
          0.21898255 = queryWeight, product of:
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.047072954 = queryNorm
          0.5139763 = fieldWeight in 1463, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.078125 = fieldNorm(doc=1463)
      0.06377736 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
        0.06377736 = score(doc=1463,freq=2.0), product of:
          0.16484147 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047072954 = queryNorm
          0.38690117 = fieldWeight in 1463, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=1463)
  0.33333334 = coord(1/3)

Abstract: Chronicles the early history of applying electronic computers to the task of translating natural languages, from the 1st suggestions by Warren Weaver in Mar 1947 to the 1st demonstration of a working, if limited, program in Jan 1954
Date: 31. 7.1996 9:22:19

Paolillo, J.C.: Linguistics and the information sciences (2009) 0.04

0.041143484 = product of:
  0.123430446 = sum of:
    0.123430446 = sum of:
      0.07878629 = weight(_text_:history in 3840) [ClassicSimilarity], result of:
        0.07878629 = score(doc=3840,freq=2.0), product of:
          0.21898255 = queryWeight, product of:
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.047072954 = queryNorm
          0.3597834 = fieldWeight in 3840, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3840)
      0.04464415 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
        0.04464415 = score(doc=3840,freq=2.0), product of:
          0.16484147 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047072954 = queryNorm
          0.2708308 = fieldWeight in 3840, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3840)
  0.33333334 = coord(1/3)

Abstract: Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
Date: 27. 8.2011 14:22:33

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.02
```
0.020571742 = product of:
  0.061715223 = sum of:
    0.061715223 = sum of:
      0.039393146 = weight(_text_:history in 1616) [ClassicSimilarity], result of:
        0.039393146 = score(doc=1616,freq=2.0), product of:
          0.21898255 = queryWeight, product of:
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.047072954 = queryNorm
          0.1798917 = fieldWeight in 1616, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.6519823 = idf(docFreq=1146, maxDocs=44218)
            0.02734375 = fieldNorm(doc=1616)
      0.022322075 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
        0.022322075 = score(doc=1616,freq=2.0), product of:
          0.16484147 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047072954 = queryNorm
          0.1354154 = fieldWeight in 1616, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.02734375 = fieldNorm(doc=1616)
  0.33333334 = coord(1/3)
```
Abstract: The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs in the near future. Digital library research has been focusing on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
However, research in crossing language boundaries, especially across European languages and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique history background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.

Harari, Y.N.: ¬[Yuval-Noah-Harari-argues-that] AI has hacked the operating system of human civilisation (2023) 0.02

0.01875864 = product of:
  0.056275915 = sum of:
    0.056275915 = product of:
      0.11255183 = sum of:
        0.11255183 = weight(_text_:history in 953) [ClassicSimilarity], result of:
          0.11255183 = score(doc=953,freq=2.0), product of:
            0.21898255 = queryWeight, product of:
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.047072954 = queryNorm
            0.5139763 = fieldWeight in 953, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.078125 = fieldNorm(doc=953)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: Storytelling computers will change the course of human history, says the historian and philosopher.

Warner, A.J.: Natural language processing (1987) 0.02

0.017007295 = product of:
  0.051021885 = sum of:
    0.051021885 = product of:
      0.10204377 = sum of:
        0.10204377 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
          0.10204377 = score(doc=337,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.61904186 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Annual review of information science and technology. 22(1987), pp. 79-108

McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
          0.0892883 = score(doc=3164,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 3164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3164)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Computational linguistics. 22(1996) no. 2, pp. 217-248

Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
          0.0892883 = score(doc=4506,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 4506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4506)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 8.10.2000 11:52:22

Somers, H.: Example-based machine translation : Review article (1999) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
          0.0892883 = score(doc=6672,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 6672, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6672)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 31. 7.1996 9:22:19

New tools for human translators (1997) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
          0.0892883 = score(doc=1179,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 1179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1179)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 31. 7.1996 9:22:19

Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
          0.0892883 = score(doc=3117,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 3117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3117)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 28. 2.1999 10:48:22

¬Der Student aus dem Computer (2023) 0.01

0.014881384 = product of:
  0.04464415 = sum of:
    0.04464415 = product of:
      0.0892883 = sum of:
        0.0892883 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
          0.0892883 = score(doc=1079,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.5416616 = fieldWeight in 1079, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1079)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 27. 1.2023 16:22:55

Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.01

0.012755471 = product of:
  0.038266413 = sum of:
    0.038266413 = product of:
      0.076532826 = sum of:
        0.076532826 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
          0.076532826 = score(doc=4483,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.46428138 = fieldWeight in 4483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4483)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 15. 3.2000 10:22:37

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.01

0.012755471 = product of:
  0.038266413 = sum of:
    0.038266413 = product of:
      0.076532826 = sum of:
        0.076532826 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.076532826 = score(doc=4888,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 1. 3.2013 14:56:22

Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.01

0.012755471 = product of:
  0.038266413 = sum of:
    0.038266413 = product of:
      0.076532826 = sum of:
        0.076532826 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
          0.076532826 = score(doc=5429,freq=2.0), product of:
            0.16484147 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047072954 = queryNorm
            0.46428138 = fieldWeight in 5429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5429)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: c't. 2000, no. 22, pp. 230-231

Comeau, D.C.; Wilbur, W.J.: Non-Word Identification or Spell Checking Without a Dictionary (2004) 0.01
```
0.011255185 = product of:
  0.033765554 = sum of:
    0.033765554 = product of:
      0.06753111 = sum of:
        0.06753111 = weight(_text_:history in 2092) [ClassicSimilarity], result of:
          0.06753111 = score(doc=2092,freq=2.0), product of:
            0.21898255 = queryWeight, product of:
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.047072954 = queryNorm
            0.3083858 = fieldWeight in 2092, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.046875 = fieldNorm(doc=2092)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract: MEDLINE is a collection of more than 12 million references and abstracts covering recent life science literature. With its continued growth and cutting-edge terminology, spell-checking with a traditional lexicon-based approach requires significant additional manual follow-up. In this work, an internal corpus-based context quality rating α, frequency, and simple misspelling transformations are used to rank words from most likely to be misspellings to least likely. Eleven-point average precisions of 0.891 have been achieved within a class of 42,340 all-alphabetic words having an α score less than 10. Our models predict that 16,274 or 38% of these words are misspellings. Based on test data, this result has a recall of 79% and a precision of 86%. In other words, spell checking can be done by statistics instead of with a dictionary. As an application we examine the time history of low-α words in MEDLINE titles and abstracts.

Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.01
```
0.011255185 = product of:
  0.033765554 = sum of:
    0.033765554 = product of:
      0.06753111 = sum of:
        0.06753111 = weight(_text_:history in 3390) [ClassicSimilarity], result of:
          0.06753111 = score(doc=3390,freq=2.0), product of:
            0.21898255 = queryWeight, product of:
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.047072954 = queryNorm
            0.3083858 = fieldWeight in 3390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract: The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.

Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; Roeck, A De: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.01
```
0.011255185 = product of:
  0.033765554 = sum of:
    0.033765554 = product of:
      0.06753111 = sum of:
        0.06753111 = weight(_text_:history in 2738) [ClassicSimilarity], result of:
          0.06753111 = score(doc=2738,freq=2.0), product of:
            0.21898255 = queryWeight, product of:
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.047072954 = queryNorm
            0.3083858 = fieldWeight in 2738, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6519823 = idf(docFreq=1146, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract: This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
