Search (118 results, page 1 of 6)

  • × theme_ss:"Computerlinguistik"
  1. Bedathur, S.; Narang, A.: Mind your language : effects of spoken query formulation on retrieval effectiveness (2013) 0.13
    0.12953952 = product of:
      0.19430926 = sum of:
        0.093939 = weight(_text_:search in 1150) [ClassicSimilarity], result of:
          0.093939 = score(doc=1150,freq=8.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5376164 = fieldWeight in 1150, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1150)
        0.10037026 = product of:
          0.20074052 = sum of:
            0.20074052 = weight(_text_:engines in 1150) [ClassicSimilarity], result of:
              0.20074052 = score(doc=1150,freq=8.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.7858995 = fieldWeight in 1150, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1150)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Voice search is becoming a popular mode for interacting with search engines. As a result, research has gone into building better voice transcription engines, interfaces, and search engines that better handle inherent verbosity of queries. However, when one considers its use by non- native speakers of English, another aspect that becomes important is the formulation of the query by users. In this paper, we present the results of a preliminary study that we conducted with non-native English speakers who formulate queries for given retrieval tasks. Our results show that the current search engines are sensitive in their rankings to the query formulation, and thus highlights the need for developing more robust ranking methods.
  2. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.11
    0.111534975 = product of:
      0.16730246 = sum of:
        0.04744636 = weight(_text_:search in 2541) [ClassicSimilarity], result of:
          0.04744636 = score(doc=2541,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.27153727 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.119856104 = sum of:
          0.07169304 = weight(_text_:engines in 2541) [ClassicSimilarity], result of:
            0.07169304 = score(doc=2541,freq=2.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.2806784 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.048163064 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.048163064 = score(doc=2541,freq=4.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.6666667 = coord(2/3)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET . Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS) . The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  3. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.10
    0.09615815 = product of:
      0.14423722 = sum of:
        0.06973162 = weight(_text_:search in 3455) [ClassicSimilarity], result of:
          0.06973162 = score(doc=3455,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.39907667 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
        0.074505605 = product of:
          0.14901121 = sum of:
            0.14901121 = weight(_text_:engines in 3455) [ClassicSimilarity], result of:
              0.14901121 = score(doc=3455,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.58337915 = fieldWeight in 3455, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3455)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 an the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
  4. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.09
    0.09297244 = product of:
      0.13945866 = sum of:
        0.08876401 = weight(_text_:search in 2861) [ClassicSimilarity], result of:
          0.08876401 = score(doc=2861,freq=14.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5079997 = fieldWeight in 2861, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 2861) [ClassicSimilarity], result of:
              0.10138928 = score(doc=2861,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 2861, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2861)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Today's conventional search engines hardly do provide the essential content relevant to the user's search query. This is because the context and semantics of the request made by the user is not analyzed to the full extent. So here the need for a semantic web search arises. SWS is upcoming in the area of web search which combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine- SIEU(Semantic Information Extraction in University Domain) confined to the university domain. SIEU uses ontology as a knowledge base for the information retrieval process. It is not just a mere keyword search. It is one layer above what Google or any other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves the web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to the developers and researchers who work on web. The Google results are re-ranked and optimized for providing the relevant links. For ranking an algorithm has been applied which fetches more apt results for the user query.
  5. Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.09
    0.08769246 = product of:
      0.13153869 = sum of:
        0.08135357 = weight(_text_:search in 2547) [ClassicSimilarity], result of:
          0.08135357 = score(doc=2547,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.46558946 = fieldWeight in 2547, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2547)
        0.05018513 = product of:
          0.10037026 = sum of:
            0.10037026 = weight(_text_:engines in 2547) [ClassicSimilarity], result of:
              0.10037026 = score(doc=2547,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39294976 = fieldWeight in 2547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2547)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The challenge of effectively bringing specific, relevant information from the global sea of data to our fingertips, has become an increasingly difficult one. Discusses how the online information industry, founded on Boolean search systems, may be evolving to take advantage of other methods, such as 'term weighting', 'relevance ranking' and 'query by example'
    Content
    (1) The Boolean world; (2) The Non-Boolean picture; (3) The commercial search engines: Personal Librarian, CLARIT, ConQuest, DR-LINK, InQuizit, InTEXT, TOPIC, WIN, TARGET, FREESTYLE, InfoSeek; (4) Wiedergabe von 8 Aufsätzen aus 'Monitor'
  6. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.08
    0.08110181 = product of:
      0.121652715 = sum of:
        0.0929755 = weight(_text_:search in 1264) [ClassicSimilarity], result of:
          0.0929755 = score(doc=1264,freq=24.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5321022 = fieldWeight in 1264, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
        0.028677218 = product of:
          0.057354435 = sum of:
            0.057354435 = weight(_text_:engines in 1264) [ClassicSimilarity], result of:
              0.057354435 = score(doc=1264,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.22454272 = fieldWeight in 1264, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1264)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
  7. Addison, E.R.; Wilson, H.D.; Feder, J.: ¬The impact of plain English searching on end users (1993) 0.07
    0.074022576 = product of:
      0.11103386 = sum of:
        0.053679425 = weight(_text_:search in 5354) [ClassicSimilarity], result of:
          0.053679425 = score(doc=5354,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.30720934 = fieldWeight in 5354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=5354)
        0.057354435 = product of:
          0.11470887 = sum of:
            0.11470887 = weight(_text_:engines in 5354) [ClassicSimilarity], result of:
              0.11470887 = score(doc=5354,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.44908544 = fieldWeight in 5354, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5354)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Commercial software products are available with plain English searching capabilities as engines for online and CD-ROM information services, and for internal text information management. With plain English interfaces, end users do not need to master the keyword and connector approach of the Boolean search query language. Describes plain English searching and its impact on the process of full text retrieval. Explores the issues of ease of use, reliability and implications for the total research process
  8. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.07
    0.07253622 = product of:
      0.10880433 = sum of:
        0.058109686 = weight(_text_:search in 4257) [ClassicSimilarity], result of:
          0.058109686 = score(doc=4257,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 4257, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4257)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 4257) [ClassicSimilarity], result of:
              0.10138928 = score(doc=4257,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 4257, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4257)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that often there are several meanings for a word or phrase, and likewise, many ways to express a conceptmake this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding using the syntactic structure and patterns of language use that is latent in documents to make IR more effective.
  9. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.07
    0.07253622 = product of:
      0.10880433 = sum of:
        0.058109686 = weight(_text_:search in 604) [ClassicSimilarity], result of:
          0.058109686 = score(doc=604,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 604, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=604)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 604) [ClassicSimilarity], result of:
              0.10138928 = score(doc=604,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 604, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin to segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
  10. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.06685372 = product of:
      0.10028057 = sum of:
        0.07984671 = product of:
          0.23954013 = sum of:
            0.23954013 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.23954013 = score(doc=562,freq=2.0), product of:
                0.4262143 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05027291 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.040867712 = score(doc=562,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  11. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.07
    0.066634305 = product of:
      0.09995145 = sum of:
        0.056935627 = weight(_text_:search in 2941) [ClassicSimilarity], result of:
          0.056935627 = score(doc=2941,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 2941, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2941)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 2941) [ClassicSimilarity], result of:
              0.08603165 = score(doc=2941,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 2941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2941)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
  12. Nait-Baha, L.; Jackiewicz, A.; Djioua, B.; Laublet, P.: Query reformulation for information retrieval on the Web using the point of view methodology : preliminary results (2001) 0.06
    0.05551693 = product of:
      0.08327539 = sum of:
        0.04025957 = weight(_text_:search in 249) [ClassicSimilarity], result of:
          0.04025957 = score(doc=249,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 249, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=249)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 249) [ClassicSimilarity], result of:
              0.08603165 = score(doc=249,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 249, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=249)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The work we are presenting is devoted to the information collected on the WWW. By the term collected we mean the whole process of retrieving, extracting and presenting results to the user. This research is part of the RAP (Research, Analyze, Propose) project in which we propose to combine two methods: (i) query reformulation using linguistic markers according to a given point of view; and (ii) text semantic analysis by means of contextual exploration results (Descles, 1991). The general project architecture describing the interactions between the users, the RAP system and the WWW search engines is presented in Nait-Baha et al. (1998). We will focus this paper on showing how we use linguistic markers to reformulate the queries according to a given point of view
  13. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.05
    0.047206 = product of:
      0.070809 = sum of:
        0.0469695 = weight(_text_:search in 1361) [ClassicSimilarity], result of:
          0.0469695 = score(doc=1361,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.2688082 = fieldWeight in 1361, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1361)
        0.0238395 = product of:
          0.047679 = sum of:
            0.047679 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
              0.047679 = score(doc=1361,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2708308 = fieldWeight in 1361, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1361)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real world conditions. It is part of a text processing project at Siemens, called TINA (Text-Inhalts-Analyse). Software from TINA is actually being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics)
    Date
    6. 1.1999 10:22:07
  14. Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen : Beiträge zur GLDV Tagung 2005 in Bonn (2005) 0.04
    0.041178323 = product of:
      0.06176748 = sum of:
        0.04025957 = weight(_text_:search in 3578) [ClassicSimilarity], result of:
          0.04025957 = score(doc=3578,freq=8.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 3578, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3578)
        0.021507913 = product of:
          0.043015826 = sum of:
            0.043015826 = weight(_text_:engines in 3578) [ClassicSimilarity], result of:
              0.043015826 = score(doc=3578,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.16840704 = fieldWeight in 3578, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=3578)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    INHALT: Chris Biemann/Rainer Osswald: Automatische Erweiterung eines semantikbasierten Lexikons durch Bootstrapping auf großen Korpora - Ernesto William De Luca/Andreas Nürnberger: Supporting Mobile Web Search by Ontology-based Categorization - Rüdiger Gleim: HyGraph - Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertextstrukturen - Felicitas Haas/Bernhard Schröder: Freges Grundgesetze der Arithmetik: Dokumentbaum und Formelwald - Ulrich Held/ Andre Blessing/Bettina Säuberlich/Jürgen Sienel/Horst Rößler/Dieter Kopp: A personalized multimodal news service -Jürgen Hermes/Christoph Benden: Fusion von Annotation und Präprozessierung als Vorschlag zur Behebung des Rohtextproblems - Sonja Hüwel/Britta Wrede/Gerhard Sagerer: Semantisches Parsing mit Frames für robuste multimodale Mensch-Maschine-Kommunikation - Brigitte Krenn/Stefan Evert: Separating the wheat from the chaff- Corpus-driven evaluation of statistical association measures for collocation extraction - Jörn Kreutel: An application-centered Perspective an Multimodal Dialogue Systems - Jonas Kuhn: An Architecture for Prallel Corpusbased Grammar Learning - Thomas Mandl/Rene Schneider/Pia Schnetzler/Christa Womser-Hacker: Evaluierung von Systemen für die Eigennamenerkennung im crosslingualen Information Retrieval - Alexander Mehler/Matthias Dehmer/Rüdiger Gleim: Zur Automatischen Klassifikation von Webgenres - Charlotte Merz/Martin Volk: Requirements for a Parallel Treebank Search Tool - Sally YK. Mok: Multilingual Text Retrieval an the Web: The Case of a Cantonese-Dagaare-English Trilingual e-Lexicon -
    Karel Pala: The Balkanet Experience - Peter M. Kruse/Andre Nauloks/Dietmar Rösner/Manuela Kunze: Clever Search: A WordNet Based Wrapper for Internet Search Engines - Rosmary Stegmann/Wolfgang Woerndl: Using GermaNet to Generate Individual Customer Profiles - Ingo Glöckner/Sven Hartrumpf/Rainer Osswald: From GermaNet Glosses to Formal Meaning Postulates -Aljoscha Burchardt/ Katrin Erk/Anette Frank: A WordNet Detour to FrameNet - Daniel Naber: OpenThesaurus: ein offenes deutsches Wortnetz - Anke Holler/Wolfgang Grund/Heinrich Petith: Maschinelle Generierung assoziativer Termnetze für die Dokumentensuche - Stefan Bordag/Hans Friedrich Witschel/Thomas Wittig: Evaluation of Lexical Acquisition Algorithms - Iryna Gurevych/Hendrik Niederlich: Computing Semantic Relatedness of GermaNet Concepts - Roland Hausser: Turn-taking als kognitive Grundmechanik der Datenbanksemantik - Rodolfo Delmonte: Parsing Overlaps - Melanie Twiggs: Behandlung des Passivs im Rahmen der Datenbanksemantik- Sandra Hohmann: Intention und Interaktion - Anmerkungen zur Relevanz der Benutzerabsicht - Doris Helfenbein: Verwendung von Pronomina im Sprecher- und Hörmodus - Bayan Abu Shawar/Eric Atwell: Modelling turn-taking in a corpus-trained chatbot - Barbara März: Die Koordination in der Datenbanksemantik - Jens Edlund/Mattias Heldner/Joakim Gustafsson: Utterance segmentation and turn-taking in spoken dialogue systems - Ekaterina Buyko: Numerische Repräsentation von Textkorpora für Wissensextraktion - Bernhard Fisseni: ProofML - eine Annotationssprache für natürlichsprachliche mathematische Beweise - Iryna Schenk: Auflösung der Pronomen mit Nicht-NP-Antezedenten in spontansprachlichen Dialogen - Stephan Schwiebert: Entwurf eines agentengestützten Systems zur Paradigmenbildung - Ingmar Steiner: On the analysis of speech rhythm through acoustic parameters - Hans Friedrich Witschel: Text, Wörter, Morpheme - Möglichkeiten einer automatischen Terminologie-Extraktion.
  15. Prasad, A.R.D.; Kar, B.B.: Parsing Boolean search expression using definite clause grammars (1994) 0.04
    0.035786286 = product of:
      0.10735885 = sum of:
        0.10735885 = weight(_text_:search in 8188) [ClassicSimilarity], result of:
          0.10735885 = score(doc=8188,freq=8.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.6144187 = fieldWeight in 8188, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=8188)
      0.33333334 = coord(1/3)
    
    Abstract
    Briefly discusses the role of search languages in information retrieval and broadly groups the search languages into 4 categories. Explains the idea of definite clause grammars and demonstrates how parsers for Boolean logic-based search languages can easily be developed. Presents a partial Prolog code of the parser that was used in an object-oriented bibliographic database management system
  16. Zaitseva, E.M.: Developing linguistic tools of thematic search in library information systems (2023) 0.03
    0.033549644 = product of:
      0.100648925 = sum of:
        0.100648925 = weight(_text_:search in 1187) [ClassicSimilarity], result of:
          0.100648925 = score(doc=1187,freq=18.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5760175 = fieldWeight in 1187, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1187)
      0.33333334 = coord(1/3)
    
    Abstract
    Within the R&D program "Information support of research by scientists and specialists on the basis of RNPLS&T Open Archive - the system of scientific knowledge aggregation", the RNPLS&T analyzes the use of linguistic tools of thematic search in the modern library information systems and the prospects for their development. The author defines the key common characteristics of e-catalogs of the largest Russian libraries revealed at the first stage of the analysis. Based on the specified common characteristics and detailed comparison analysis, the author outlines and substantiates the vectors for enhancing search inter faces of e-catalogs. The focus is made on linguistic tools of thematic search in library information systems; the key vectors are suggested: use of thematic search at different search levels with the clear-cut level differentiation; use of combined functionality within thematic search system; implementation of classification search in all e-catalogs; hierarchical representation of classifications; use of the matching systems for classification information retrieval languages, and in the long term classification and verbal information retrieval languages, and various verbal information retrieval languages. The author formulates practical recommendations to improve thematic search in library information systems.
  17. Krueger, S.: Getting more out of NEXIS (1996) 0.03
    0.031630907 = product of:
      0.09489272 = sum of:
        0.09489272 = weight(_text_:search in 4512) [ClassicSimilarity], result of:
          0.09489272 = score(doc=4512,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.54307455 = fieldWeight in 4512, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.078125 = fieldNorm(doc=4512)
      0.33333334 = coord(1/3)
    
    Abstract
    The MORE search command on the LEXIS/NEXIS online databses analyzes the words in a retrieved document, selects and creates a FREESTYLE search, and retrieves th 25 most relevant documents. Shows how MORE works and gives advice about when and when not to use it
  18. Griffith, C.: FREESTYLE: LEXIS-NEXIS goes natural (1994) 0.03
    0.031630907 = product of:
      0.09489272 = sum of:
        0.09489272 = weight(_text_:search in 2512) [ClassicSimilarity], result of:
          0.09489272 = score(doc=2512,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.54307455 = fieldWeight in 2512, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.078125 = fieldNorm(doc=2512)
      0.33333334 = coord(1/3)
    
    Abstract
    Describes FREESTYLE, the associative language search engine, developed by Mead Data Central for its LEXIS/NEXIS online service. The special feature of the associative language in FREESTYLE allows users to enter search descriptions in plain English
  19. Notess, G.R.: Up and coming search technologies (2000) 0.03
    0.031313002 = product of:
      0.093939 = sum of:
        0.093939 = weight(_text_:search in 5467) [ClassicSimilarity], result of:
          0.093939 = score(doc=5467,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5376164 = fieldWeight in 5467, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.109375 = fieldNorm(doc=5467)
      0.33333334 = coord(1/3)
    
  20. Hsinchun, C.: Knowledge-based document retrieval framework and design (1992) 0.03
    0.030991834 = product of:
      0.0929755 = sum of:
        0.0929755 = weight(_text_:search in 6686) [ClassicSimilarity], result of:
          0.0929755 = score(doc=6686,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5321022 = fieldWeight in 6686, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=6686)
      0.33333334 = coord(1/3)
    
    Abstract
    Presents research on the design of knowledge-based document retrieval systems in which a semantic network was adopted to represent subject knowledge and classification scheme knowledge and experts' search strategies and user modelling capability were modelled as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Describes a system, the design of which was based on the blackboard architecture, which was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement

Years

Languages

  • e 98
  • d 20
  • el 1
  • m 1
  • More… Less…

Types

  • a 97
  • el 14
  • m 7
  • s 5
  • p 3
  • x 3
  • d 1
  • r 1
  • More… Less…