Search (99 results, page 1 of 5)

  • theme_ss:"Computerlinguistik" (computational linguistics)
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.13
    0.13081796 = sum of:
      0.076931566 = product of:
        0.2307947 = sum of:
          0.2307947 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.2307947 = score(doc=562,freq=2.0), product of:
              0.41065353 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.048437484 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.034198564 = weight(_text_:web in 562) [ClassicSimilarity], result of:
        0.034198564 = score(doc=562,freq=2.0), product of:
          0.15807624 = queryWeight, product of:
            3.2635105 = idf(docFreq=4597, maxDocs=44218)
            0.048437484 = queryNorm
          0.21634221 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.2635105 = idf(docFreq=4597, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
      0.01968783 = product of:
        0.03937566 = sum of:
          0.03937566 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.03937566 = score(doc=562,freq=2.0), product of:
              0.16961981 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.048437484 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    See: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
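The score breakdowns in these records are Lucene `ClassicSimilarity` explanations. A minimal Python sketch, assuming the classic TF-IDF formulas visible in the output (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), clause score = queryWeight * fieldWeight), recomputes record 1's total. queryNorm and fieldNorm are copied from the explain output rather than recomputed, and the coord factors are applied by hand.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """One term clause: queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.048437484  # copied from the explain output
FIELD_NORM = 0.046875     # fieldNorm(doc=562), also copied

t3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
web = term_score(2.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
t22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Nested clauses where only 1 of 3 (or 1 of 2) terms matched are scaled by coord.
total = t3a * (1 / 3) + web + t22 * (1 / 2)
print(f"web={web:.6f} total={total:.6f}")
```

Because Lucene computes in 32-bit floats, the results should agree with the explain output to roughly six decimal places rather than bit for bit.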
  2. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.09
    0.085686795 = product of:
      0.12853019 = sum of:
        0.105561055 = weight(_text_:web in 4184) [ClassicSimilarity], result of:
          0.105561055 = score(doc=4184,freq=14.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.6677857 = fieldWeight in 4184, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4184)
        0.022969136 = product of:
          0.045938272 = sum of:
            0.045938272 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.045938272 = score(doc=4184,freq=2.0), product of:
                0.16961981 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048437484 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The Internet as a medium is changing, and with it the conditions under which it is published and received. What opportunities do the Social Web and the Semantic Web, two visions of the future currently being discussed in parallel, offer? To answer this question, the article examines the foundations of both models in terms of application and technology, but also highlights their shortcomings and the added value of combining them in a way suited to the medium. Using the grammatical online information system grammis as an example, it outlines a strategy for integrating the respective strengths of both.
    Date
    22. 1.2011 10:38:28
    Source
    Kommunikation, Partizipation und Wirkungen im Social Web, Vol. 1. Eds.: A. Zerfaß et al.
    Theme
    Semantic Web
  3. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.06
    0.05872331 = product of:
      0.08808496 = sum of:
        0.06839713 = weight(_text_:web in 563) [ClassicSimilarity], result of:
          0.06839713 = score(doc=563,freq=8.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.43268442 = fieldWeight in 563, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.01968783 = product of:
          0.03937566 = sum of:
            0.03937566 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.03937566 = score(doc=563,freq=2.0), product of:
                0.16961981 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048437484 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set, and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
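The abstract does not define its three new association measures, so the sketch below shows only how LocalMaxs selects terms once a "glue" score exists. As a stand-in it uses SCP with fair dispersion normalization (SCP_f), one of the measures LocalMaxs was originally introduced with. The single normaliser for all n-gram lengths and the `min_freq` cut-off are simplifying assumptions.

```python
from collections import Counter, defaultdict


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(gram, counts, total):
    """SCP with fair dispersion: p(W)^2 divided by the average product of
    prefix/suffix probabilities over all binary split points of W."""
    def p(g):
        return counts[g] / total
    n = len(gram)
    avp = sum(p(gram[:i]) * p(gram[i:]) for i in range(1, n)) / (n - 1)
    return p(gram) ** 2 / avp


def local_maxs(tokens, max_n=4, min_freq=2):
    # Glue is also needed for (max_n + 1)-grams, because each candidate
    # is compared with the longer n-grams that contain it.
    counts = ngram_counts(tokens, max_n + 1)
    total = len(tokens)  # simplification: one normaliser for every n
    glue = {g: scp_f(g, counts, total) for g in counts if len(g) >= 2}

    # Map each n-gram to the (n+1)-grams that extend it by one token.
    supersets = defaultdict(list)
    for g in glue:
        if len(g) >= 3:
            supersets[g[:-1]].append(g)
            supersets[g[1:]].append(g)

    terms = []
    for g, value in glue.items():
        if len(g) > max_n or counts[g] < min_freq:
            continue
        # Must be strictly stronger than every containing (n+1)-gram ...
        if value <= max((glue[y] for y in supersets[g]), default=0.0):
            continue
        # ... and, for n > 2, at least as strong as its two (n-1)-grams.
        if len(g) > 2 and value < max(glue[g[:-1]], glue[g[1:]]):
            continue
        terms.append(g)
    return sorted(terms)


tokens = "new york is big . i love new york . new york is old".split()
print(local_maxs(tokens))
```

On this toy input, "new york" has maximal glue, while "york is" and "new york is" lose to a neighbouring n-gram and are rejected.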
  4. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.05
    0.048375808 = product of:
      0.07256371 = sum of:
        0.049361378 = weight(_text_:web in 2541) [ClassicSimilarity], result of:
          0.049361378 = score(doc=2541,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.3122631 = fieldWeight in 2541, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.023202332 = product of:
          0.046404663 = sum of:
            0.046404663 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.046404663 = score(doc=2541,freq=4.0), product of:
                0.16961981 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048437484 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system to increase recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology allows complex applications to be built in a modular fashion from components developed in earlier phases of the work, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and from specialized resources such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes and determines the similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aids, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, pp.22-29
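The abstract does not say how ChemSpell measures similarity between words. To illustrate the idea of a spelling "suggestion" over a vocabulary list, here is a minimal sketch that swaps in plain Levenshtein edit distance. The vocabulary and the `max_distance`/`limit` thresholds are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, max_distance=2, limit=3):
    """Return up to `limit` vocabulary entries within `max_distance` edits."""
    word = word.lower()
    if word in vocabulary:
        return [word]  # correctly spelled: nothing to suggest
    ranked = sorted((levenshtein(word, entry), entry) for entry in vocabulary)
    return [entry for dist, entry in ranked if dist <= max_distance][:limit]


VOCABULARY = {"benzene", "toluene", "xylene", "acetone"}  # hypothetical vocabulary list
print(suggest("benzine", VOCABULARY))
```

A production system for chemical names would need a faster candidate search than this linear scan over the whole vocabulary.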
  5. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.05
    0.045367938 = product of:
      0.068051904 = sum of:
        0.048364073 = weight(_text_:web in 4436) [ClassicSimilarity], result of:
          0.048364073 = score(doc=4436,freq=4.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.3059541 = fieldWeight in 4436, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
        0.01968783 = product of:
          0.03937566 = sum of:
            0.03937566 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.03937566 = score(doc=4436,freq=2.0), product of:
                0.16961981 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048437484 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to handle proper name searching. We consider several design issues for document translation, including which material is translated, what roles HTML tags play in translation, what the tradeoff is between speed performance and translation performance, and in what form the translated result is presented. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
  6. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.03
    0.034255266 = product of:
      0.051382896 = sum of:
        0.03989833 = weight(_text_:web in 1616) [ClassicSimilarity], result of:
          0.03989833 = score(doc=1616,freq=8.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.25239927 = fieldWeight in 1616, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
        0.011484568 = product of:
          0.022969136 = sum of:
            0.022969136 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.022969136 = score(doc=1616,freq=2.0), product of:
                0.16961981 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048437484 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The amount of information available in languages other than English on the World Wide Web is increasing significantly. According to a 1999 report from Computer Economics, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, over the next five years the number of Internet users is predicted to grow by only 60% among English speakers versus 150% among non-English speakers. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China increased from 8.9 million to 16.9 million between January and June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. In 2002, China had become the second-largest at-home Internet population in the world (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence shows the importance of cross-lingual research for meeting needs in the near future. In the past, digital library research has focused on structural and semantic interoperability. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
    However, research on crossing language boundaries, especially between European and Oriental languages, is still at an early stage. In this proposal, we focus on cross-lingual semantic interoperability by automatically generating a cross-lingual thesaurus from an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult a thesaurus to identify other relevant vocabulary. When searching across language boundaries, a cross-lingual thesaurus generated by co-occurrence analysis and a Hopfield network can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Because of Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus using a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language other than that of the input term. The direct translation of the input term can also be retrieved in most cases.
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
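The co-occurrence step of the thesaurus construction can be illustrated on aligned English/Chinese segment pairs. The minimal sketch below ranks Chinese terms for an English term by the Dice coefficient over aligned pairs. The Dice weighting and the toy legal segments are assumptions, and the Hopfield network spreading-activation stage described in the abstract is deliberately omitted.

```python
from collections import Counter


def cross_lingual_associates(aligned_pairs, term):
    """Rank target-language terms by Dice coefficient with `term`,
    counting co-occurrence within aligned (English, Chinese) segment pairs."""
    df_src = Counter()  # segments containing each English term
    df_tgt = Counter()  # segments containing each Chinese term
    joint = Counter()   # segments where `term` co-occurs with each Chinese term
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        df_src.update(en_set)
        df_tgt.update(zh_set)
        if term in en_set:
            joint.update(zh_set)
    scores = {zh: 2 * c / (df_src[term] + df_tgt[zh]) for zh, c in joint.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical aligned segments (term lists already extracted per side).
PAIRS = [
    (["court", "appeal"], ["法院", "上訴"]),
    (["court", "judge"], ["法院", "法官"]),
    (["appeal"], ["上訴"]),
]
print(cross_lingual_associates(PAIRS, "court"))
```

Here the direct translation 法院 ranks first, and related terms such as 法官 (judge) follow, which is the kind of output a thesaurus can add beyond a dictionary.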
  7. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.03
    0.032907587 = product of:
      0.098722756 = sum of:
        0.098722756 = weight(_text_:web in 2027) [ClassicSimilarity], result of:
          0.098722756 = score(doc=2027,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.6245262 = fieldWeight in 2027, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
      0.33333334 = coord(1/3)
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; Vol. 9088
    Source
    The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015 Portoroz, Slovenia, May 31 -- June 4, 2015. Proceedings. Eds.: F. Gandon et al.
  8. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.03
    0.025643855 = product of:
      0.076931566 = sum of:
        0.076931566 = product of:
          0.2307947 = sum of:
            0.2307947 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.2307947 = score(doc=862,freq=2.0), product of:
                0.41065353 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.048437484 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    https://arxiv.org/abs/2212.06721
  9. Granitzer, M.: Statistische Verfahren der Textanalyse (2006) 0.02
    0.02303531 = product of:
      0.06910593 = sum of:
        0.06910593 = weight(_text_:web in 5809) [ClassicSimilarity], result of:
          0.06910593 = score(doc=5809,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.43716836 = fieldWeight in 5809, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5809)
      0.33333334 = coord(1/3)
    
    Abstract
    This article provides an overview of statistical methods of text analysis in the context of the Semantic Web. It begins with a discussion of methods and common techniques for preprocessing texts, such as stemming or part-of-speech tagging. The representations introduced in this way serve as the basis for statistical feature analysis and for more advanced techniques such as information extraction and machine learning. These specific techniques are presented in overview, with the aspects most relevant to the Semantic Web discussed in detail. The article concludes with the application of the presented techniques to the creation and maintenance of ontologies, along with pointers to further literature.
    Source
    Semantic Web: Wege zur vernetzten Wissensgesellschaft. Eds.: T. Pellegrini and A. Blumauer
    Theme
    Semantic Web
  10. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.02
    0.021241754 = product of:
      0.06372526 = sum of:
        0.06372526 = weight(_text_:web in 604) [ClassicSimilarity], result of:
          0.06372526 = score(doc=604,freq=10.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.40312994 = fieldWeight in 604, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=604)
      0.33333334 = coord(1/3)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for searching for relevant documents. As Chinese is an ideographic, character-based language, words in Chinese texts are not delimited by white space. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus and is ideal for Chinese segmentation. Although most search engines have problems segmenting texts into proper words, they maintain huge databases of documents and of the frequencies of character sequences in those documents. These databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates word boundaries in text. Our algorithm is the first to use Romanized pinyin for segmentation. It is the first unified segmentation algorithm for Chinese as used in different geographical areas, and it is also domain-independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
    Footnote
    Contribution to a special section: "Mining Web resources for enhancing information retrieval"
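The central idea here, that frequencies of character sequences can drive segmentation, can be sketched with a standard maximum-probability segmentation by dynamic programming. This is a swapped-in textbook technique, not the authors' algorithm. The frequency table is hypothetical and stands in for counts mined from a search engine, and the pinyin step is not modelled.

```python
import math


def segment(text, freq, max_word_len=4):
    """Maximum-probability segmentation under a unigram model built from `freq`."""
    total = sum(freq.values())
    # best[i] = (log-probability of the best split of text[:i], start of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            count = freq.get(word)
            if count is None:
                if end - start > 1:
                    continue  # unknown multi-character strings are not treated as words
                count = 0.5   # small pseudo-count for unseen single characters
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back through the stored split points.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical counts standing in for search-engine statistics.
FREQ = {"研究": 60, "生命": 40, "研究生": 10, "命": 2, "研": 1, "究": 1, "生": 3}
print(segment("研究生命", FREQ))
```

With these counts the ambiguous string splits as 研究 / 生命 ("study / life") rather than 研究生 / 命, because the product of the two frequent words' probabilities is larger.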
  11. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.02
    0.019744553 = product of:
      0.059233658 = sum of:
        0.059233658 = weight(_text_:web in 3455) [ClassicSimilarity], result of:
          0.059233658 = score(doc=3455,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.37471575 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
      0.33333334 = coord(1/3)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of 0.20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system called NSIR.
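Total reciprocal document rank differs from mean reciprocal rank in that every document containing a correct answer contributes, not just the first. A minimal sketch, assuming TRDR is the sum of 1/rank over all answer-bearing documents in one ranked list (averaging over questions is left out). The document IDs are hypothetical.

```python
def total_reciprocal_document_rank(ranked_docs, relevant):
    """Sum of 1/rank over every ranked document that contains a correct answer."""
    return sum(1.0 / rank for rank, doc in enumerate(ranked_docs, start=1) if doc in relevant)


def reciprocal_rank(ranked_docs, relevant):
    """For comparison: only the first correct document counts (as in MRR)."""
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
answers = {"d2", "d8"}  # documents containing a correct answer
print(total_reciprocal_document_rank(ranking, answers))  # 1/2 + 1/8
print(reciprocal_rank(ranking, answers))                 # 1/2
```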
  12. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.02
    0.019744553 = product of:
      0.059233658 = sum of:
        0.059233658 = weight(_text_:web in 5896) [ClassicSimilarity], result of:
          0.059233658 = score(doc=5896,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.37471575 = fieldWeight in 5896, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5896)
      0.33333334 = coord(1/3)
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
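The article's own procedure combines several search techniques and is not reproduced here. The core filtering idea, that members of a hybrid word family share a letter sequence and are absent from a list of known words, can be sketched as follows. The sample text, the shared sequence "blog" and the known-word list are hypothetical.

```python
import re
from collections import Counter


def hybrid_candidates(text, shared_sequence, known_words):
    """Words containing `shared_sequence` that the reference word list lacks,
    ordered by frequency (most frequent first), then alphabetically."""
    tokens = re.findall(r"[a-z]+", text.lower())
    seq = shared_sequence.lower()
    hits = Counter(t for t in tokens
                   if seq in t and t != seq and t not in known_words)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


sample = "Bloggers and blogosphere watchers read the blog; a blogumentary is new."
known = {"blog", "blogger", "bloggers"}
print(hybrid_candidates(sample, "blog", known))
```

On real Web data the candidate list would also contain noise (typos, URLs), which is one reason a combination of search techniques is needed.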
  13. Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.02
    0.019744553 = product of:
      0.059233658 = sum of:
        0.059233658 = weight(_text_:web in 5964) [ClassicSimilarity], result of:
          0.059233658 = score(doc=5964,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.37471575 = fieldWeight in 5964, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5964)
      0.33333334 = coord(1/3)
    
    Abstract
    This article describes the experiments carried out by the University of Hildesheim in the first Web track of the CLEF initiative (WebCLEF) in 2005. Participation yielded experience with a multilingual Web corpus (EuroGOV) in preprocessing, topic and query development, language-independent indexing methods, and multilingual retrieval strategies. Because of the large size of the corpus and time constraints, multilingual indexes were built. The article describes the approach taken by the University of Hildesheim and the results of the officially submitted runs as well as further experiments. In the Multilingual Task, the best result in CLEF was achieved.
  14. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.02
    0.019744553 = product of:
      0.059233658 = sum of:
        0.059233658 = weight(_text_:web in 2342) [ClassicSimilarity], result of:
          0.059233658 = score(doc=2342,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.37471575 = fieldWeight in 2342, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate or non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
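Dictionary-based query translation, and why dictionary coverage matters, can be illustrated with a word-by-word translator. This is a minimal sketch, not the system used in the study. Words missing from the dictionary pass through untranslated, and all listed alternatives are kept. The toy Finnish-Swedish dictionary is hypothetical.

```python
def translate_query(query, bilingual_dict):
    """Word-by-word dictionary translation of a source-language query."""
    translated = []
    for word in query.lower().split():
        alternatives = bilingual_dict.get(word)
        if alternatives:
            translated.extend(alternatives)  # keep every listed translation
        else:
            translated.append(word)  # not covered: pass the word through unchanged
    return " ".join(translated)


FI_SV = {"kissa": ["katt"], "ruoka": ["mat", "föda"]}  # hypothetical toy dictionary
print(translate_query("kissa ruoka Nokia", FI_SV))
```

Proper names such as "Nokia" often survive untranslated without harm, but an uncovered ordinary word would reach the target-language search engine in the wrong language.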
  15. Dreehsen, B.: Der PC als Dolmetscher (1998) 0.02
    0.018999204 = product of:
      0.05699761 = sum of:
        0.05699761 = weight(_text_:web in 1474) [ClassicSimilarity], result of:
          0.05699761 = score(doc=1474,freq=2.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.36057037 = fieldWeight in 1474, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=1474)
      0.33333334 = coord(1/3)
    
    Abstract
    For English Web pages and foreign-language correspondence, translation software that converts text into German, and vice versa, with a mouse click is helpful. The new versions already convey the meaning of the content well. CHIP tested the performance of 5 programs.
  16. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.02
    0.018999204 = product of:
      0.05699761 = sum of:
        0.05699761 = weight(_text_:web in 4215) [ClassicSimilarity], result of:
          0.05699761 = score(doc=4215,freq=8.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.36057037 = fieldWeight in 4215, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.33333334 = coord(1/3)
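Entries 15 to 18 share the same score even though "web" occurs twice in doc 1474 and eight times in doc 4215. The two effects cancel: tf = √freq only doubles when freq quadruples, while fieldNorm (roughly 1/√(number of terms in the field) in ClassicSimilarity, stored lossily) halves from 0.078125 to 0.0390625. A small check using the printed values:

```python
import math

IDF = 3.2635105  # idf of "web", as printed in both explain trees


def field_weight(freq: float, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * IDF * field_norm


fw_doc1474 = field_weight(freq=2.0, field_norm=0.078125)   # entry 15
fw_doc4215 = field_weight(freq=8.0, field_norm=0.0390625)  # entry 16
print(round(fw_doc1474, 8), round(fw_doc4215, 8), math.isclose(fw_doc1474, fw_doc4215))
```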
    
    Abstract
    For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their native-language equivalents as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of query terms that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves the performance of both Mono-Lingual and Cross-Language Information Retrieval.
  17. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.02
    0.018999204 = product of:
      0.05699761 = sum of:
        0.05699761 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
          0.05699761 = score(doc=2861,freq=8.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.36057037 = fieldWeight in 2861, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.33333334 = coord(1/3)
    
    Abstract
    Today's conventional search engines rarely provide the content that is actually relevant to the user's search query, because the context and semantics of the user's request are not fully analyzed. Hence the need for semantic web search (SWS), an emerging approach to web search that combines Natural Language Processing and Artificial Intelligence. The objective of this work is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for the information retrieval process. It is more than a keyword search: it operates one layer above what Google or other search engines retrieve by analyzing keywords alone. The query is analyzed both syntactically and semantically, and the system retrieves web results that are more relevant to the user query through keyword expansion. The results obtained will be accurate enough to satisfy the user's request, and accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links, using a ranking algorithm that fetches more apt results for the user query.
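The abstract does not specify how SIEU performs keyword expansion. A minimal sketch of the general idea, assuming a hand-built mini-ontology that maps a concept to related terms, appends those terms to the user's keywords before the search. `ONTOLOGY` and `expand_query` are hypothetical names; SIEU's actual ontology and ranking algorithm are not reproduced here.

```python
# Hypothetical mini-ontology for a university domain: concept -> related terms.
ONTOLOGY: dict[str, list[str]] = {
    "professor": ["faculty", "lecturer"],
    "course": ["subject", "syllabus"],
}


def expand_query(query: str, ontology: dict[str, list[str]]) -> list[str]:
    """Append ontology-related terms to the original keywords, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in ontology.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


print(expand_query("course professor", ONTOLOGY))
# ['course', 'professor', 'subject', 'syllabus', 'faculty', 'lecturer']
```

The original keywords stay first, so a downstream ranker could still weight them above the expansion terms.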
  18. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.02
    0.018999204 = product of:
      0.05699761 = sum of:
        0.05699761 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
          0.05699761 = score(doc=3488,freq=8.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.36057037 = fieldWeight in 3488, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
      0.33333334 = coord(1/3)
    
    Abstract
    There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more convenient way of querying would be of great value. One direction could be to allow the user to use natural language. To make this task easier, we have proposed a method for translating natural language queries into SPARQL queries. It is based on sentence structure, utilizing dependencies between the words in user queries. Dependencies are used to map the query to the semantic web structure, which is then translated into a SPARQL query. According to our first experiments, we are able to answer a significant group of user queries.
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; 10151
  19. Kuo, J.-S.; Li, H.; Yang, Y.-K.: Active learning for constructing transliteration lexicons from the Web (2008) 0.02
    0.018808253 = product of:
      0.056424756 = sum of:
        0.056424756 = weight(_text_:web in 1345) [ClassicSimilarity], result of:
          0.056424756 = score(doc=1345,freq=4.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.35694647 = fieldWeight in 1345, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1345)
      0.33333334 = coord(1/3)
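Entry 19 ranks just below entry 18 because freq = 4 gives tf = 2 together with a fieldNorm of 0.0546875, which yields a slightly smaller fieldWeight (0.35694647 versus 0.36057037). The sketch below also derives idf instead of copying it: the formula 1 + ln(maxDocs / (docFreq + 1)) reproduces the printed 3.2635105. queryNorm is taken from the tree as given, and coord(1/3) is applied as printed.

```python
import math

MAX_DOCS = 44218
DOC_FREQ = 4597           # documents containing "web", per the explain trees
QUERY_NORM = 0.048437484  # copied from the trees, not derived


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf variant that reproduces the printed values: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def web_clause_score(freq: float, field_norm: float) -> float:
    """Score of the single matching "web" clause, scaled by coord(1/3)."""
    idf = classic_idf(DOC_FREQ, MAX_DOCS)
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight / 3


entry18 = web_clause_score(freq=8.0, field_norm=0.0390625)  # doc 3488
entry19 = web_clause_score(freq=4.0, field_norm=0.0546875)  # doc 1345
print(f"idf={classic_idf(DOC_FREQ, MAX_DOCS):.7f}")
print(f"entry 18: {entry18:.9f}, entry 19: {entry19:.9f}, 18 ranks higher: {entry18 > entry19}")
```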
    
    Abstract
    This article presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimal prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study unsupervised learning and active learning strategies that minimize human supervision in terms of data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.
  20. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.02
    0.018808253 = product of:
      0.056424756 = sum of:
        0.056424756 = weight(_text_:web in 4733) [ClassicSimilarity], result of:
          0.056424756 = score(doc=4733,freq=4.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.35694647 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
      0.33333334 = coord(1/3)
    
    Abstract
    Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium and discusses the remaining challenges that will define the research directions in this area in the near future.

Languages

  • e 74
  • d 25
  • m 2

Types

  • a 76
  • el 12
  • m 12
  • s 7
  • x 3
  • p 2
  • d 1

Classifications