Search (63 results, page 1 of 4)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.101847395 = sum of:
      0.08109427 = product of:
        0.24328281 = sum of:
          0.24328281 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24328281 = score(doc=562,freq=2.0), product of:
              0.43287367 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051058397 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.020753123 = product of:
        0.041506246 = sum of:
          0.041506246 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.041506246 = score(doc=562,freq=2.0), product of:
              0.17879781 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051058397 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
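
The indented trees under each hit are Lucene ClassicSimilarity "explain" output. As a reading aid, here is a minimal sketch, assuming Lucene's standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))), that reproduces the first summand of entry 1's score from the values shown:

```python
# Sketch of Lucene ClassicSimilarity scoring. The input values are copied
# from the explain tree for doc 562 above; only the combining formula is
# assumed (it matches the standard ClassicSimilarity definition).
import math

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    tf = math.sqrt(freq)                               # 1.4142135 for freq=2.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 8.478011 for docFreq=24
    query_weight = idf * query_norm                    # 0.43287367 (queryWeight)
    field_weight = tf * idf * field_norm               # 0.56201804 (fieldWeight)
    return query_weight * field_weight * coord         # term weight x coord

print(classic_score(freq=2.0, doc_freq=24, max_docs=44218,
                    query_norm=0.051058397, field_norm=0.046875,
                    coord=1.0 / 3.0))
# ~0.08109427, the first summand of entry 1's total 0.101847395
```
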
  2. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.06
    0.060864396 = product of:
      0.12172879 = sum of:
        0.12172879 = sum of:
          0.072813205 = weight(_text_:engines in 2541) [ClassicSimilarity], result of:
            0.072813205 = score(doc=2541,freq=2.0), product of:
              0.25941864 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.051058397 = queryNorm
              0.2806784 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.048915584 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.048915584 = score(doc=2541,freq=4.0), product of:
              0.17879781 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051058397 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or Web-based interfaces, the dictionaries and other computer components must have fast response times and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list; 2) the word attributes that define part of speech and morphological relationships between words in the list; and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
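
The AZdict/ChemSpell implementation itself is not part of this record. As a hedged illustration of the "determines similarity between words" step described in the abstract, a minimal edit-distance suggester over a toy vocabulary (the vocabulary, threshold, and names below are assumptions, not NLM's code):

```python
# Toy spelling-suggestion step in the spirit of ChemSpell: rank vocabulary
# words by Levenshtein edit distance to the query and keep the close ones.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def suggest(query: str, vocabulary: list[str], max_dist: int = 2) -> list[str]:
    scored = sorted((edit_distance(query.lower(), w.lower()), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_dist]

print(suggest("benzine", ["benzene", "benzidine", "toluene"]))
# ['benzene', 'benzidine'] - ranked by closeness; 'toluene' is filtered out
```
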
  3. Bedathur, S.; Narang, A.: Mind your language : effects of spoken query formulation on retrieval effectiveness (2013) 0.05
    0.050969247 = product of:
      0.10193849 = sum of:
        0.10193849 = product of:
          0.20387699 = sum of:
            0.20387699 = weight(_text_:engines in 1150) [ClassicSimilarity], result of:
              0.20387699 = score(doc=1150,freq=8.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.7858995 = fieldWeight in 1150, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1150)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Voice search is becoming a popular mode for interacting with search engines. As a result, research has gone into building better voice transcription engines, interfaces, and search engines that better handle the inherent verbosity of queries. However, when one considers its use by non-native speakers of English, another aspect that becomes important is the formulation of the query by users. In this paper, we present the results of a preliminary study that we conducted with non-native English speakers who formulate queries for given retrieval tasks. Our results show that current search engines are sensitive in their rankings to the query formulation, which highlights the need for developing more robust ranking methods.
  4. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.040547136 = product of:
      0.08109427 = sum of:
        0.08109427 = product of:
          0.24328281 = sum of:
            0.24328281 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24328281 = score(doc=862,freq=2.0), product of:
                0.43287367 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051058397 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.04
    0.037834857 = product of:
      0.07566971 = sum of:
        0.07566971 = product of:
          0.15133943 = sum of:
            0.15133943 = weight(_text_:engines in 3455) [ClassicSimilarity], result of:
              0.15133943 = score(doc=3455,freq=6.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.58337915 = fieldWeight in 3455, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3455)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
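
The evaluation metric named in the abstract, total reciprocal document rank, is not defined in this record. A sketch under the common reading that it sums 1/rank over every relevant document in a question's ranked list and averages across questions (the paper may differ in detail):

```python
# Hedged sketch of "total reciprocal document rank" (TRDR). Each inner list
# marks which ranked documents are relevant for one question.
def trdr(ranked_relevance: list[list[bool]]) -> float:
    per_question = [sum(1.0 / rank for rank, rel in enumerate(flags, 1) if rel)
                    for flags in ranked_relevance]
    return sum(per_question) / len(per_question)

# Question 1: relevant docs at ranks 2 and 5; question 2: no relevant docs.
print(trdr([[False, True, False, False, True],
            [False, False, False]]))  # (0.7 + 0.0) / 2 = 0.35
```
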
  6. Addison, E.R.; Wilson, H.D.; Feder, J.: ¬The impact of plain English searching on end users (1993) 0.03
    0.029125283 = product of:
      0.058250565 = sum of:
        0.058250565 = product of:
          0.11650113 = sum of:
            0.11650113 = weight(_text_:engines in 5354) [ClassicSimilarity], result of:
              0.11650113 = score(doc=5354,freq=2.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.44908544 = fieldWeight in 5354, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5354)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Commercial software products with plain English searching capabilities are available as engines for online and CD-ROM information services, and for internal text information management. With plain English interfaces, end users do not need to master the keyword and connector approach of the Boolean search query language. Describes plain English searching and its impact on the process of full text retrieval, and explores the issues of ease of use, reliability, and the implications for the total research process.
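
As a contrast to the plain-English interfaces discussed above, a toy sketch of the "keyword and connector" matching that Boolean query languages make the end user spell out (documents and query are illustrative):

```python
# A Boolean AND query the user must formulate explicitly, e.g.
# "text AND retrieval"; a plain-English engine would instead map a request
# like "find articles about full text retrieval" to these keywords itself.
def matches_all(doc: str, *keywords: str) -> bool:
    words = set(doc.lower().split())
    return all(k.lower() in words for k in keywords)

docs = ["full text retrieval on CD-ROM", "plain English search interfaces"]
print([d for d in docs if matches_all(d, "text", "retrieval")])
# ['full text retrieval on CD-ROM']
```
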
  7. Warner, A.J.: Natural language processing (1987) 0.03
    0.027670832 = product of:
      0.055341665 = sum of:
        0.055341665 = product of:
          0.11068333 = sum of:
            0.11068333 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.11068333 = score(doc=337,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  8. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.03
    0.025743358 = product of:
      0.051486716 = sum of:
        0.051486716 = product of:
          0.10297343 = sum of:
            0.10297343 = weight(_text_:engines in 4257) [ClassicSimilarity], result of:
              0.10297343 = score(doc=4257,freq=4.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.39693922 = fieldWeight in 4257, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4257)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that there are often several meanings for a word or phrase and, likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding using the syntactic structure and patterns of language use that are latent in documents to make IR more effective.
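
The recall-versus-precision trade-off that the abstract turns on has a standard formal reading, sketched here with an illustrative retrieved/relevant split:

```python
# Precision = |retrieved AND relevant| / |retrieved|;
# recall = |retrieved AND relevant| / |relevant|. The document sets are toys.
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}  # wide net
relevant = {"d1", "d2", "d9"}
print(precision_recall(retrieved, relevant))
# (0.25, 0.666...): most relevant items found, but the user wades through noise
```
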
  9. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.03
    0.025743358 = product of:
      0.051486716 = sum of:
        0.051486716 = product of:
          0.10297343 = sum of:
            0.10297343 = weight(_text_:engines in 604) [ClassicSimilarity], result of:
              0.10297343 = score(doc=604,freq=4.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.39693922 = fieldWeight in 604, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus, which makes it ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and of the frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. Furthermore, the Romanized pinyin of the Chinese language indicates boundaries of words in the text, and our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
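
The paper's exact use of search-engine frequency databases is only summarized above. A common baseline in this family, sketched here, picks the segmentation that maximizes the product of relative segment frequencies via dynamic programming; the frequency table is a toy stand-in for the Web-derived counts:

```python
# Hedged sketch of frequency-driven Chinese segmentation: choose the split
# of the text whose segments have the highest joint relative frequency.
import math

FREQ = {"中国": 900, "人民": 600, "中国人": 300, "民": 50, "人": 200}  # toy counts
TOTAL = sum(FREQ.values())

def segment(text: str) -> list[str]:
    # best[i] = (log probability, segmentation) for the prefix text[:i]
    best: list[tuple[float, list[str]]] = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 4), i):        # cap segment length at 4 chars
            piece = text[j:i]
            if piece in FREQ and best[j][0] > -math.inf:
                score = best[j][0] + math.log(FREQ[piece] / TOTAL)
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [piece])
    return best[len(text)][1]

print(segment("中国人民"))  # ['中国', '人民'] beats ['中国人', '民'] on this table
```
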
  10. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.03
    0.025743358 = product of:
      0.051486716 = sum of:
        0.051486716 = product of:
          0.10297343 = sum of:
            0.10297343 = weight(_text_:engines in 2861) [ClassicSimilarity], result of:
              0.10297343 = score(doc=2861,freq=4.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.39693922 = fieldWeight in 2861, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2861)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Today's conventional search engines hardly provide the essential content relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for semantic Web search arises. Semantic Web search (SWS) is an emerging area of Web search that combines natural language processing and artificial intelligence. The objective of the work described here is to design, develop, and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search: it works one layer above what Google or any other search engine retrieves by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves Web results more relevant to the user query through keyword expansion. The results obtained are accurate enough to satisfy the request made by the user, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the Web. The Google results are re-ranked and optimized to provide the relevant links. For ranking, an algorithm is applied that fetches more apt results for the user query.
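
SIEU's university ontology is not part of this record; the keyword-expansion step the abstract describes can be illustrated with a toy term-to-concept map (the ontology contents and names are assumptions):

```python
# Toy ontology-driven query expansion: each query term pulls in its
# ontology neighbours before the expanded term set is sent to retrieval.
ONTOLOGY = {
    "lecturer": {"faculty", "professor", "instructor"},
    "course": {"module", "class", "subject"},
}

def expand_query(query: str) -> set[str]:
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= ONTOLOGY.get(term, set())
    return terms

print(expand_query("lecturer course list"))
# original terms plus their ontology neighbours, e.g. 'faculty', 'module', ...
```
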
  11. Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.03
    0.025484623 = product of:
      0.050969247 = sum of:
        0.050969247 = product of:
          0.10193849 = sum of:
            0.10193849 = weight(_text_:engines in 2547) [ClassicSimilarity], result of:
              0.10193849 = score(doc=2547,freq=2.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.39294976 = fieldWeight in 2547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2547)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    (1) The Boolean world; (2) The Non-Boolean picture; (3) The commercial search engines: Personal Librarian, CLARIT, ConQuest, DR-LINK, InQuizit, InTEXT, TOPIC, WIN, TARGET, FREESTYLE, InfoSeek; (4) Reprints of eight articles from 'Monitor'
  12. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
              0.096847914 = score(doc=3164,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 3164, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3164)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  13. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.096847914 = score(doc=4506,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8.10.2000 11:52:22
  14. Somers, H.: Example-based machine translation : Review article (1999) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
              0.096847914 = score(doc=6672,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 6672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6672)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  15. New tools for human translators (1997) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
              0.096847914 = score(doc=1179,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 1179, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1179)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  16. Baayen, R.H.; Lieber, R.: Word frequency distributions and lexical semantics (1997) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
              0.096847914 = score(doc=3117,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 3117, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3117)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28. 2.1999 10:48:22
  17. ¬Der Student aus dem Computer (2023) 0.02
    0.024211979 = product of:
      0.048423957 = sum of:
        0.048423957 = product of:
          0.096847914 = sum of:
            0.096847914 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.096847914 = score(doc=1079,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  18. Nait-Baha, L.; Jackiewicz, A.; Djioua, B.; Laublet, P.: Query reformulation for information retrieval on the Web using the point of view methodology : preliminary results (2001) 0.02
    0.021843962 = product of:
      0.043687925 = sum of:
        0.043687925 = product of:
          0.08737585 = sum of:
            0.08737585 = weight(_text_:engines in 249) [ClassicSimilarity], result of:
              0.08737585 = score(doc=249,freq=2.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.33681408 = fieldWeight in 249, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=249)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The work we are presenting is devoted to information collected on the WWW. By the term collected we mean the whole process of retrieving, extracting, and presenting results to the user. This research is part of the RAP (Research, Analyze, Propose) project, in which we propose to combine two methods: (i) query reformulation using linguistic markers according to a given point of view; and (ii) text semantic analysis by means of contextual exploration results (Descles, 1991). The general project architecture describing the interactions between the users, the RAP system, and the WWW search engines is presented in Nait-Baha et al. (1998). In this paper we focus on showing how we use linguistic markers to reformulate queries according to a given point of view.
  19. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.02
    0.021843962 = product of:
      0.043687925 = sum of:
        0.043687925 = product of:
          0.08737585 = sum of:
            0.08737585 = weight(_text_:engines in 2941) [ClassicSimilarity], result of:
              0.08737585 = score(doc=2941,freq=2.0), product of:
                0.25941864 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.051058397 = queryNorm
                0.33681408 = fieldWeight in 2941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2941)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
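
As a hedged sketch of the pure n-gram baseline the abstract contrasts with (the paper's enhancement additionally restricts matches to specific n-gram positions), Dice similarity over character bigrams groups related word forms:

```python
# Pure character-bigram conflation baseline: two word forms belong to the
# same conflation class when their bigram sets overlap strongly (Dice).
def bigrams(word: str) -> set[str]:
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(a: str, b: str) -> float:
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

print(round(dice("retrieval", "retrieve"), 2))   # 0.8  - likely same class
print(round(dice("retrieval", "relevance"), 2))  # 0.38 - different class
```
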
  20. Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02
    0.020753123 = product of:
      0.041506246 = sum of:
        0.041506246 = product of:
          0.08301249 = sum of:
            0.08301249 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
              0.08301249 = score(doc=4483,freq=2.0), product of:
                0.17879781 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051058397 = queryNorm
                0.46428138 = fieldWeight in 4483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    15. 3.2000 10:22:37

Languages

  • e 46
  • d 17
  • el 1
  • m 1

Types

  • a 49
  • el 7
  • m 7
  • s 5
  • p 2
  • x 2
  • d 1