Search (8 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  • theme_ss:"Internet"
  1. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 4436) [ClassicSimilarity], result of:
            0.007030784 = score(doc=4436,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 4436, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.037440285 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.037440285 = score(doc=4436,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
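
The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and variable names are illustrative and not part of the listing. queryNorm and fieldNorm are copied from the explanation rather than derived, and Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explanation tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 4436 (result 1).
MAX_DOCS = 44218
QUERY_NORM = 0.046056706
FIELD_NORM = 0.046875

score_a = term_score(6.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)   # term "a"
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)   # term "22"

# coord(1/2) halves the sum of the matching term weights.
total = 0.5 * (score_a + score_22)
print(score_a, score_22, total)
```

The same functions apply to the remaining results: only the term "a" matches there, and the two coord(1/2) factors in their trees multiply that single term weight by 0.25.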
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
    Type
    a
  2. Olsen, K.A.; Williams, J.G.: Spelling and grammar checking using the Web as a text repository (2004) 0.00
    0.0030255679 = product of:
      0.0060511357 = sum of:
        0.0060511357 = product of:
          0.012102271 = sum of:
            0.012102271 = weight(_text_:a in 2891) [ClassicSimilarity], result of:
              0.012102271 = score(doc=2891,freq=40.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22789092 = fieldWeight in 2891, product of:
                  6.3245554 = tf(freq=40.0), with freq of:
                    40.0 = termFreq=40.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2891)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Natural languages are both complex and dynamic. They are in part formalized through dictionaries and grammar. Dictionaries attempt to provide definitions and examples of various usages for all the words in a language. Grammar, on the other hand, is the system of rules that defines the structure of a language and is concerned with the correct use and application of the language in speaking or writing. The fact that these two mechanisms lag behind the language as currently used is not a serious problem for those living in a language culture and talking their native language. However, the correct choice of words, expressions, and word relationships is much more difficult when speaking or writing in a foreign language. The basics of the grammar of a language may have been learned in school decades ago, and even then there were always several choices for the correct expression for an idea, fact, opinion, or emotion. Although many different parts of speech and their relationships can make for difficult language decisions, prepositions tend to be problematic for nonnative speakers of English, and, in reality, prepositions are a major problem in most languages. Does a speaker or writer say "in the West Coast" or "on the West Coast," or perhaps "at the West Coast"? In Norwegian, we are "in" a city, but "at" a place. But the distinction between cities and places is vague. To be absolutely correct, one really has to learn the right preposition for every single place. A simplistic way of resolving these language issues is to ask a native speaker. But even native speakers may disagree about the right choice of words. If there is disagreement, then one will have to ask more than one native speaker, treat his/her response as a vote for a particular choice, and perhaps choose the majority choice as the best possible alternative. In real life, such a procedure may be impossible or impractical, but in the electronic world, as we shall see, this is quite easy to achieve. 
    Using the vast text repository of the Web, we may get a significant voting base for even the most detailed and distinct phrases. We shall start by introducing a set of examples to present our idea of using the text repository on the Web to aid in making the best word selection, especially for the use of prepositions. Then we will present a more general discussion of the possibilities and limitations of using the Web as an aid for correct writing.
    Type
    a
  3. Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 151) [ClassicSimilarity], result of:
              0.010739701 = score(doc=151,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 151, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=151)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the crossroads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require solid cooperation with the NLP community.
    Type
    a
  4. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 2335) [ClassicSimilarity], result of:
              0.010696997 = score(doc=2335,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 2335, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2335)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems generally treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of texts. These blocks include plain texts, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent and the sequence of these blocks captures changing discourse. Previous work shows that exploiting the structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves comparable performance with a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
    Type
    a
  5. Jaaranen, K.; Lehtola, A.; Tenni, J.; Bounsaythip, C.: Webtran tools for in-company language support (2000) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 5553) [ClassicSimilarity], result of:
              0.009076704 = score(doc=5553,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 5553, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5553)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Webtran tools for authoring and translating domain specific texts can make the multilingual text production in a company more efficient and less expensive. The tools have been in production use since spring 2000 for checking and translating product article texts of a specific domain, namely an in-company language in sales catalogues of a mail-order company. Webtran tools have been developed by VTT Information Technology. Use experiences have shown that an automatic translation process is faster than phrase-lexicon assisted manual translation, if an in-company language model is created to control and support the language used within the company.
    Type
    a
  6. Wright, S.E.: Leveraging terminology resources across application boundaries : accessing resources in future integrated environments (2000) 0.00
    0.0022374375 = product of:
      0.004474875 = sum of:
        0.004474875 = product of:
          0.00894975 = sum of:
            0.00894975 = weight(_text_:a in 5528) [ClassicSimilarity], result of:
              0.00894975 = score(doc=5528,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1685276 = fieldWeight in 5528, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5528)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The title for this conference, stated in English, is Language Technology for a Dynamic Economy - in the Media Age. The question arises as to what the media are that we are dealing with and to what extent we are moving away from the reality of different media to a world in which all sub-categories flow together into a unified stream of information that is constantly rescaled to appear in different hardware configurations. A few years ago, people who were interested in sharing data or getting different electronic "boxes" to talk to each other were focused on two major aspects: 1) developing data conversion technology, and 2) convincing potential users that sharing information was an even remotely interesting option. Although some content "owners" are still reticent about releasing their data, it has become dramatically apparent in the Web environment that a broad range of users does indeed want this technology. Even as researchers struggle with the remaining technical, legal, and ethical impediments that stand in the way of unlimited information access to existing multi-platform resources, the future view of the world will no longer be as obsessed with conversion capability as it will be with creating content, with an eye to morphing technologies that will enable the delivery of that content from an open-standards-based format such as XML (eXtensible Markup Language), MPEG (Moving Picture Experts Group), or WAP (Wireless Application Protocol) to a rich variety of display options.
    Type
    a
  7. Allen, E.E.: Searching, naturally (1998) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 2602) [ClassicSimilarity], result of:
              0.008118451 = score(doc=2602,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 2602, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2602)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  8. Nait-Baha, L.; Jackiewicz, A.; Djioua, B.; Laublet, P.: Query reformulation for information retrieval on the Web using the point of view methodology : preliminary results (2001) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 249) [ClassicSimilarity], result of:
              0.008118451 = score(doc=249,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 249, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=249)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The work we are presenting is devoted to the information collected on the WWW. By the term collected we mean the whole process of retrieving, extracting and presenting results to the user. This research is part of the RAP (Research, Analyze, Propose) project in which we propose to combine two methods: (i) query reformulation using linguistic markers according to a given point of view; and (ii) text semantic analysis by means of contextual exploration results (Descles, 1991). The general project architecture describing the interactions between the users, the RAP system and the WWW search engines is presented in Nait-Baha et al. (1998). We will focus this paper on showing how we use linguistic markers to reformulate the queries according to a given point of view.
    Type
    a