Search (38 results, page 1 of 2)

  • × theme_ss:"Computerlinguistik"
  • × year_i:[2000 TO 2010}
  • × type_ss:"a"
  1. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.11
    0.111534975 = product of:
      0.16730246 = sum of:
        0.04744636 = weight(_text_:search in 2541) [ClassicSimilarity], result of:
          0.04744636 = score(doc=2541,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.27153727 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.119856104 = sum of:
          0.07169304 = weight(_text_:engines in 2541) [ClassicSimilarity], result of:
            0.07169304 = score(doc=2541,freq=2.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.2806784 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.048163064 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.048163064 = score(doc=2541,freq=4.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.6666667 = coord(2/3)
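
    The explain tree above is standard Lucene ClassicSimilarity (TF-IDF) output, and its arithmetic can be checked directly. The sketch below recomputes the score of result 1 from the constants shown in the tree; the formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), clause score = queryWeight * fieldWeight, coord = fraction of matching query clauses) are ClassicSimilarity's documented building blocks, and every numeric input is copied from the output above.

```python
import math

# Lucene ClassicSimilarity building blocks, as they appear in the explain tree:
#   tf(freq)     = sqrt(freq)
#   idf(df, N)   = 1 + ln(N / (df + 1))
#   queryWeight  = idf * queryNorm
#   fieldWeight  = tf * idf * fieldNorm
#   clause score = queryWeight * fieldWeight

def idf(doc_freq: int, max_docs: int) -> float:
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def clause_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf_val = idf(doc_freq, max_docs)      # e.g. idf(3718, 44218) ~ 3.475677
    query_weight = idf_val * query_norm    # ~ 0.1747324 for "search"
    field_weight = math.sqrt(freq) * idf_val * field_norm
    return query_weight * field_weight

QUERY_NORM, FIELD_NORM, MAX_DOCS = 0.05027291, 0.0390625, 44218

search  = clause_score(4.0, 3718, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.04744636
engines = clause_score(2.0,  746, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.07169304
t22     = clause_score(4.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.04816306

# The tree nests "engines" and "22" in an inner sum before the outer product;
# arithmetically that is just the flat sum, scaled by coord(2/3).
total = (search + engines + t22) * (2.0 / 3.0)
print(total)  # ~0.111534975
```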
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications, such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list; 2) the word attributes that define part of speech and morphological relationships between words in the list; and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
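
    The AZdict/ChemSpell abstract above does not spell out its similarity algorithm, so the following is only a minimal sketch of the general technique a dictionary-backed spelling aid relies on: ranking vocabulary terms by string similarity to a (possibly misspelled) query term. The mini-vocabulary, the cutoff, and the use of Python's difflib are illustrative assumptions, not the paper's method.

```python
import difflib

# Hypothetical mini-vocabulary; the real AZdict list was built from NLM's
# TOXNET/UMLS resources, which are not reproduced here.
VOCABULARY = ["toxicology", "benzene", "acetaminophen", "arsenic", "dioxin"]

def suggest(word: str, n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Rank vocabulary terms by string similarity to the query term;
    difflib's ratio stands in for whatever measure ChemSpell applies."""
    return difflib.get_close_matches(word.lower(), VOCABULARY, n=n, cutoff=cutoff)

print(suggest("toxocology"))  # ['toxicology']
print(suggest("benzine"))     # ['benzene']
```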
  2. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.10
    0.09615815 = product of:
      0.14423722 = sum of:
        0.06973162 = weight(_text_:search in 3455) [ClassicSimilarity], result of:
          0.06973162 = score(doc=3455,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.39907667 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
        0.074505605 = product of:
          0.14901121 = sum of:
            0.14901121 = weight(_text_:engines in 3455) [ClassicSimilarity], result of:
              0.14901121 = score(doc=3455,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.58337915 = fieldWeight in 3455, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3455)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC-8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
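
    Total reciprocal document rank, the metric quoted in the abstract, is commonly read as the sum of reciprocal ranks over every position at which a correct document appears for one question, averaged over the question set. A minimal sketch under that reading (the example ranks are invented):

```python
def total_reciprocal_document_rank(relevant_ranks: list[int]) -> float:
    """Sum of 1/rank over each rank holding a correct document for one
    question; 0.0 if nothing relevant is returned."""
    return sum(1.0 / r for r in relevant_ranks)

# One hypothetical question whose correct documents surface at ranks 2 and 5:
print(total_reciprocal_document_rank([2, 5]))  # 0.7

# Averaging over a (made-up) question set yields a system-level figure of
# the kind the abstract reports (.20 on TREC-8 for PPR):
questions = [[5], [4, 20], []]
print(sum(total_reciprocal_document_rank(q) for q in questions) / len(questions))
```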
  3. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.07
    0.07253622 = product of:
      0.10880433 = sum of:
        0.058109686 = weight(_text_:search in 4257) [ClassicSimilarity], result of:
          0.058109686 = score(doc=4257,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 4257, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4257)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 4257) [ClassicSimilarity], result of:
              0.10138928 = score(doc=4257,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 4257, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4257)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that there are often several meanings for a word or phrase and, likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding by using the syntactic structure and patterns of language use that are latent in documents to make IR more effective.
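
    The recall-versus-precision trade-off the abstract describes reduces to two standard set ratios; a small sketch with hypothetical document IDs:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = fraction of retrieved items that are relevant;
    recall = fraction of relevant items that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A recall-oriented engine returns a lot (hypothetical documents d1..d8),
# burying the two items the user actually wants:
retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
relevant = {"d2", "d7"}
print(precision_recall(retrieved, relevant))  # (0.25, 1.0)
```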
  4. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.07
    0.07253622 = product of:
      0.10880433 = sum of:
        0.058109686 = weight(_text_:search in 604) [ClassicSimilarity], result of:
          0.058109686 = score(doc=604,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 604, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=604)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 604) [ClassicSimilarity], result of:
              0.10138928 = score(doc=604,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 604, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus, which makes it ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates the boundaries of words in the text. Our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
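
    The abstract does not detail the algorithm or how pinyin enters it, but the core idea of scoring candidate words by Web-derived sequence statistics can be illustrated with a simple dynamic program over an invented frequency table (a sketch only, not the paper's method):

```python
import math

# Hypothetical sequence frequencies, standing in for the hit counts the
# paper mines from search-engine databases; TOTAL approximates corpus size.
FREQ = {"中": 500, "文": 400, "中文": 900, "分": 300, "词": 250, "分词": 800}
TOTAL = 1_000_000

def segment(text: str) -> list[str]:
    """Pick the segmentation with the highest total log-probability,
    a simplified stand-in for the paper's Web-statistics criterion."""
    n = len(text)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n       # (score, backpointer) per prefix
    for i in range(1, n + 1):
        for j in range(max(0, i - 4), i):           # candidate words up to 4 chars
            w = text[j:i]
            if w in FREQ:
                s = best[j][0] + math.log(FREQ[w] / TOTAL)
                if s > best[i][0]:
                    best[i] = (s, j)
    words, i = [], n
    while i > 0:                                     # recover words right to left
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]

print(segment("中文分词"))  # ['中文', '分词']
```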
  5. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.06685372 = product of:
      0.10028057 = sum of:
        0.07984671 = product of:
          0.23954013 = sum of:
            0.23954013 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.23954013 = score(doc=562,freq=2.0), product of:
                0.4262143 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05027291 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.040867712 = score(doc=562,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
  6. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.07
    0.066634305 = product of:
      0.09995145 = sum of:
        0.056935627 = weight(_text_:search in 2941) [ClassicSimilarity], result of:
          0.056935627 = score(doc=2941,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 2941) [ClassicSimilarity], result of:
              0.08603165 = score(doc=2941,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 2941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2941)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this paper we present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective in achieving high similarity scores for all word-form variations and reduces ambiguity, i.e., obtains higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited to conflation in Arabic, since Arabic is a highly inflectional language. We therefore also present an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach so that additional results for relevant subwords can be found automatically.
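
    As a point of reference for the "pure n-gram model" the paper enhances, here is a minimal sketch of bigram-based string similarity using the Dice coefficient; the example words are ours, and the paper's refinement (taking the order and location of grams into account) is deliberately not implemented:

```python
def ngrams(word: str, n: int = 2) -> set[str]:
    """Character n-grams, padded so word boundaries contribute grams."""
    padded = f" {word} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram sets - the kind of
    string-similarity measure the pure n-gram model is built on."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))

# Inflectional variants score high, unrelated words score low:
print(round(dice("retrieval", "retrieving"), 2))  # 0.67
print(round(dice("retrieval", "conflation"), 2))  # 0.0
```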
  7. Nait-Baha, L.; Jackiewicz, A.; Djioua, B.; Laublet, P.: Query reformulation for information retrieval on the Web using the point of view methodology : preliminary results (2001) 0.06
    0.05551693 = product of:
      0.08327539 = sum of:
        0.04025957 = weight(_text_:search in 249) [ClassicSimilarity], result of:
          0.04025957 = score(doc=249,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 249, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=249)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 249) [ClassicSimilarity], result of:
              0.08603165 = score(doc=249,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 249, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=249)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The work we present here is devoted to information collected on the WWW. By the term collected we mean the whole process of retrieving, extracting, and presenting results to the user. This research is part of the RAP (Research, Analyze, Propose) project, in which we propose to combine two methods: (i) query reformulation using linguistic markers according to a given point of view; and (ii) text semantic analysis by means of contextual exploration results (Descles, 1991). The general project architecture describing the interactions between the users, the RAP system and the WWW search engines is presented in Nait-Baha et al. (1998). In this paper we focus on showing how we use linguistic markers to reformulate queries according to a given point of view.
  8. Notess, G.R.: Up and coming search technologies (2000) 0.03
    0.031313002 = product of:
      0.093939 = sum of:
        0.093939 = weight(_text_:search in 5467) [ClassicSimilarity], result of:
          0.093939 = score(doc=5467,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5376164 = fieldWeight in 5467, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.109375 = fieldNorm(doc=5467)
      0.33333334 = coord(1/3)
    
  9. Hull, D.; Ait-Mokhtar, S.; Chuat, M.; Eisele, A.; Gaussier, E.; Grefenstette, G.; Isabelle, P.; Samulesson, C.; Segand, F.: Language technologies and patent search and classification (2001) 0.03
    0.026839714 = product of:
      0.08051914 = sum of:
        0.08051914 = weight(_text_:search in 6318) [ClassicSimilarity], result of:
          0.08051914 = score(doc=6318,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.460814 = fieldWeight in 6318, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.09375 = fieldNorm(doc=6318)
      0.33333334 = coord(1/3)
    
  10. Ding, Y.; Chowdhury, G.C.; Foo, S.: Incorporating the results of co-word analyses to increase search variety for information retrieval (2000) 0.03
    0.026839714 = product of:
      0.08051914 = sum of:
        0.08051914 = weight(_text_:search in 6328) [ClassicSimilarity], result of:
          0.08051914 = score(doc=6328,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.460814 = fieldWeight in 6328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.09375 = fieldNorm(doc=6328)
      0.33333334 = coord(1/3)
    
  11. Navarretta, C.; Pedersen, B.S.; Hansen, D.H.: Language technology in knowledge-organization systems (2006) 0.02
    0.023243874 = product of:
      0.06973162 = sum of:
        0.06973162 = weight(_text_:search in 5706) [ClassicSimilarity], result of:
          0.06973162 = score(doc=5706,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.39907667 = fieldWeight in 5706, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=5706)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper describes the language technology methods developed in the Danish research project VID to extract, from Danish text material, information relevant for populating knowledge organization systems (KOS) within specific corporate domains. The results achieved by applying these methods to a prototype search engine tuned to the patent and trademark domain indicate that the use of human language technology can support the construction of a linguistically based KOS, and that linguistic information in search improves recall substantially without harming precision (near 90%). Finally, we describe two research experiments in which (1) linguistic analysis of Danish compounds is exploited to improve search strategies on these, and (2) linguistic knowledge is used to model corporate knowledge into a language-based ontology.
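
    As an illustration of why compound analysis matters for Danish search, the sketch below greedily splits compounds against a tiny invented lexicon, so that a compound document term can also match queries for its constituents; VID's actual analysis is linguistic, not this naive lookup:

```python
# Hypothetical mini-lexicon; VID's real resources are Danish corporate
# vocabularies not reproduced here.
LEXICON = {"patent", "ansøgning", "varemærke", "registrering"}

def split_compound(word: str) -> list[str]:
    """Naive decompounding: cover the word with known lexicon entries,
    trying the longest head first, so 'patentansøgning' also matches
    queries for 'patent' or 'ansøgning'."""
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in LEXICON:
            if tail in LEXICON:
                return [head, tail]
            rest = split_compound(tail)
            if rest:
                return [head] + rest
    return []

print(split_compound("patentansøgning"))        # ['patent', 'ansøgning']
print(split_compound("varemærkeregistrering"))  # ['varemærke', 'registrering']
```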
  12. Feldman, S.: Find what I mean, not what I say : meaning-based search tools (2000) 0.02
    0.022366427 = product of:
      0.06709928 = sum of:
        0.06709928 = weight(_text_:search in 4799) [ClassicSimilarity], result of:
          0.06709928 = score(doc=4799,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3840117 = fieldWeight in 4799, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.078125 = fieldNorm(doc=4799)
      0.33333334 = coord(1/3)
    
  13. Diaz, I.; Morato, J.; Lioréns, J.: ¬An algorithm for term conflation based on tree structures (2002) 0.02
    0.017893143 = product of:
      0.053679425 = sum of:
        0.053679425 = weight(_text_:search in 246) [ClassicSimilarity], result of:
          0.053679425 = score(doc=246,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.30720934 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=246)
      0.33333334 = coord(1/3)
    
    Abstract
    This work presents a new stemming algorithm that stores its stemming information in tree structures. This storage enhances the performance of the algorithm by reducing the search space and the overall complexity. The final result of the stemming algorithm is a normalized concept, where normalization is understood as the automatic extraction of the generic form (or lexeme) for a selected term.
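
    The abstract does not specify the tree layout, but a trie is one natural realization of "stemming information in tree structures": each lookup follows a single path through the tree, which is the search-space reduction the authors point to. A minimal sketch with invented word forms:

```python
class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.lexeme: str | None = None  # normalized concept, if a form ends here

class ConflationTrie:
    """Tree-structured storage of term -> lexeme mappings; lookup cost is
    bounded by the term's length, not the vocabulary's size."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, form: str, lexeme: str) -> None:
        node = self.root
        for ch in form:
            node = node.children.setdefault(ch, TrieNode())
        node.lexeme = lexeme

    def conflate(self, term: str) -> str | None:
        node = self.root
        for ch in term:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.lexeme

trie = ConflationTrie()
for form in ("connect", "connected", "connecting", "connection"):
    trie.add(form, "connect")
print(trie.conflate("connection"))  # connect
```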
  14. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.01
    0.013622571 = product of:
      0.040867712 = sum of:
        0.040867712 = product of:
          0.081735425 = sum of:
            0.081735425 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.081735425 = score(doc=5429,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    c't. 2000, H.22, S.230-231
  15. Bakar, Z.A.; Sembok, T.M.T.; Yusoff, M.: ¬An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts (2000) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 4804) [ClassicSimilarity], result of:
          0.04025957 = score(doc=4804,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 4804, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=4804)
      0.33333334 = coord(1/3)
    
    Abstract
    This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving, from a Malay dictionary, words similar to a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonic, and Hartlib. Two dynamic-programming methods, measuring longest common subsequence and edit-cost distance, are also used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved-and-relevant (R&R) mean measures are calculated from a weighted combination of recall and precision values. Results from these experiments are then compared with those of digram, a string-similarity method. The best R&R mean measures are given by using digram; edit-cost distances produce the best E results, and both dynamic-programming methods rank second on the R&R mean measures.
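
    Both dynamic-programming measures named in the abstract are classical; a compact sketch of each (the Malay example words are ours, not from the paper's test set):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit-cost distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, the other DP measure."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Morphological variants stay close under both measures:
print(edit_distance("makanan", "makan"))  # 2
print(lcs_length("makanan", "makan"))     # 5
```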
  16. Peis, E.; Herrera-Viedma, E.; Herrera, J.C.: On the evaluation of XML documents using Fuzzy linguistic techniques (2003) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 2778) [ClassicSimilarity], result of:
          0.04025957 = score(doc=2778,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 2778, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2778)
      0.33333334 = coord(1/3)
    
    Abstract
    Recommender systems evaluate and filter the great amount of information available on the Web to assist people in their search processes. A fuzzy evaluation method for XML documents, based on computing with words, is presented. Given an XML document type (e.g. scientific article), we consider that its elements are not equally informative. This is indicated by the use of a DTD and by assigning linguistic importance attributes to the more meaningful elements of the DTD designed. The evaluation method then generates linguistic recommendations from linguistic evaluation judgements provided by different recommenders on meaningful elements of the DTD.
  17. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 5896) [ClassicSimilarity], result of:
          0.04025957 = score(doc=5896,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 5896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=5896)
      0.33333334 = coord(1/3)
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
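
    A minimal sketch of the candidate-identification step as the abstract describes it - collecting unseen words that embed a common letter sequence; the seed, the known-word list, and the sample text are all invented for illustration:

```python
# Hypothetical seed and word lists; the paper derives candidates from Web
# text and checks them against known-word resources.
SEED = "blog"                           # common letter sequence of the family
KNOWN = {"blog", "blogger", "blogging"}

def hybrid_candidates(tokens: list[str]) -> set[str]:
    """Collect unseen words that embed the seed sequence - candidate
    emergent hybrid words for further inspection."""
    return {t.lower() for t in tokens
            if SEED in t.lower() and t.lower() not in KNOWN}

text = "Every warblog and milblog links back to some A-list blogger".split()
print(sorted(hybrid_candidates(text)))  # ['milblog', 'warblog']
```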
  18. Cimiano, P.; Völker, J.; Studer, R.: Ontologies on demand? : a description of the state-of-the-art, applications, challenges and trends for ontology learning from text (2006) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 6014) [ClassicSimilarity], result of:
          0.04025957 = score(doc=6014,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 6014, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=6014)
      0.33333334 = coord(1/3)
    
    Abstract
    Ontologies are nowadays used for many applications requiring data, services and resources in general to be interoperable and machine understandable. Such applications are for example web service discovery and composition, information integration across databases, intelligent search, etc. The general idea is that data and services are semantically described with respect to ontologies, which are formal specifications of a domain of interest, and can thus be shared and reused in a way such that the shared meaning specified by the ontology remains formally the same across different parties and applications. As the cost of creating ontologies is relatively high, different proposals have emerged for learning ontologies from structured and unstructured resources. In this article we examine the maturity of techniques for ontology learning from textual resources, addressing the question whether the state-of-the-art is mature enough to produce ontologies 'on demand'.
  19. Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 2030) [ClassicSimilarity], result of:
          0.04025957 = score(doc=2030,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 2030, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2030)
      0.33333334 = coord(1/3)
    
    Abstract
    Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
  20. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 2342) [ClassicSimilarity], result of:
          0.04025957 = score(doc=2342,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 2342, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries; thus, participants were asked to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
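
    A hedged sketch of dictionary-based query translation as the abstract describes it: each source term is replaced by the disjunction of its dictionary translations, and out-of-dictionary terms pass through untranslated - which is where dictionary coverage bites. The dictionary entries below are invented:

```python
# Toy bilingual dictionary (invented entries); real CLIR systems rely on
# broad-coverage dictionaries, and coverage drives result quality.
EN_DE = {"language": ["Sprache"], "technology": ["Technologie", "Technik"]}

def translate_query(source_terms: list[str]) -> str:
    """Dictionary-based query translation: replace each source term by the
    OR of its target-language alternatives; unknown terms pass through."""
    parts = []
    for term in source_terms:
        alts = EN_DE.get(term.lower(), [term])
        parts.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else alts[0])
    return " ".join(parts)

print(translate_query(["language", "technology"]))
# Sprache (Technologie OR Technik)
```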

Languages

  • e 32
  • d 6