Search (110 results, page 2 of 6)

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.02
```
0.01615954 = product of:
  0.03231908 = sum of:
    0.03231908 = product of:
      0.06463816 = sum of:
        0.06463816 = weight(_text_:retrieval in 4215) [ClassicSimilarity], result of:
          0.06463816 = score(doc=4215,freq=12.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40932083 = fieldWeight in 4215, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalents in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.
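
The score breakdown shown for this entry (doc 4215) can be reproduced by hand. The sketch below assumes the standard TF-IDF formulas documented for Lucene's ClassicSimilarity, namely tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm, fieldNorm, and the two coord factors directly from the listing. The function name `classic_similarity_score` is hypothetical and is not part of any library.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, field_norm,
                             query_norm, coords=(0.5, 0.5)):
    """Recompute a single-term ClassicSimilarity score from its explain values."""
    tf = math.sqrt(freq)                              # 3.4641016 for freq=12
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # about 3.0249 here
    query_weight = idf * query_norm                   # queryWeight in the listing
    field_weight = tf * idf * field_norm              # fieldWeight in the listing
    score = query_weight * field_weight
    for coord in coords:                              # two nested coord(1/2) factors
        score *= coord
    return score


# Values copied from the explain output for doc 4215.
score = classic_similarity_score(
    freq=12.0,
    doc_freq=5836,
    max_docs=44218,
    field_norm=0.0390625,
    query_norm=0.052204985,
)
print(f"{score:.6f}")
```

The result should agree with the listed 0.01615954 up to single-precision rounding, since Lucene computes these factors as 32-bit floats while the sketch uses Python's 64-bit floats.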

Ding, Y.; Chowdhury, G.C.; Foo, S.: Incorporating the results of co-word analyses to increase search variety for information retrieval (2000) 0.02

0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 6328) [ClassicSimilarity], result of:
          0.0633322 = score(doc=6328,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 6328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=6328)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Vilar, P.; Dimec, J.: Krnjenje kot osnova nekaterih nekonvencionalnih metod poizvedovanja (2000) 0.02

0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 6331) [ClassicSimilarity], result of:
          0.0633322 = score(doc=6331,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 6331, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=6331)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Footnote: Translation of the title: Stemming as a basis for some non-conventional methods of information retrieval

Ruchimskaya, E.M.: Yavlenie variativnosti estestevennogo yazyka i sposoby ee ustraneniya v verbal'nykh IPYA (2000) 0.02

0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 6472) [ClassicSimilarity], result of:
          0.0633322 = score(doc=6472,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 6472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=6472)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Footnote: Translation of the title: Natural language variations and their handling in information retrieval languages

Figuerola, C.G.; Gomez, R.; Lopez de San Roman, E.: Stemming and n-grams in Spanish : an evaluation of their impact in information retrieval (2000) 0.02

0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 6501) [ClassicSimilarity], result of:
          0.0633322 = score(doc=6501,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 6501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=6501)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Chieu, H.L.; Lee, Y.K.: Query based event extraction along a timeline (2004) 0.02

0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 4108) [ClassicSimilarity], result of:
          0.0633322 = score(doc=4108,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 4108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=4108)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.02
```
0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 2342) [ClassicSimilarity], result of:
          0.0633322 = score(doc=2342,freq=8.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 2342, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query-translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
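
This entry (doc 2342, freq=8.0) receives the same score, 0.01583305, as the earlier entries with freq=2.0, because the larger term frequency is offset by a smaller fieldNorm. A minimal sketch of that arithmetic follows, assuming the fieldWeight product tf × idf × fieldNorm shown in the explain output with tf = sqrt(freq). The helper name `field_weight` is hypothetical.

```python
import math


def field_weight(freq, idf, field_norm):
    """fieldWeight as shown in the explain output: sqrt(freq) * idf * fieldNorm."""
    return math.sqrt(freq) * idf * field_norm


IDF = 3.024915  # idf(docFreq=5836, maxDocs=44218), copied from the listing

# freq=2 with fieldNorm 0.09375 (for example doc 6328)
short_doc = field_weight(2.0, IDF, 0.09375)
# freq=8 with fieldNorm 0.046875 (doc 2342)
long_doc = field_weight(8.0, IDF, 0.046875)

print(short_doc, long_doc)
```

Since sqrt(8) is twice sqrt(2) and 0.046875 is half of 0.09375, the two products coincide, which is why these documents tie in the ranking.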
Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.02
```
0.01583305 = product of:
  0.0316661 = sum of:
    0.0316661 = product of:
      0.0633322 = sum of:
        0.0633322 = weight(_text_:retrieval in 4224) [ClassicSimilarity], result of:
          0.0633322 = score(doc=4224,freq=8.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40105087 = fieldWeight in 4224, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4224)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
Schneider, R.: Question answering : das Retrieval der Zukunft? (2007) 0.01
```
0.014927543 = product of:
  0.029855086 = sum of:
    0.029855086 = product of:
      0.05971017 = sum of:
        0.05971017 = weight(_text_:retrieval in 5953) [ClassicSimilarity], result of:
          0.05971017 = score(doc=5953,freq=4.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.37811437 = fieldWeight in 5953, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=5953)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The article examines whether and to what extent information and search systems can benefit from the technology of natural-language question-answering systems. After a general introduction to the aims and historical development of this branch of natural language processing, it explains how the field differs from conventional retrieval and extraction methods and outlines the particular structure of question-answering systems as well as individual evaluation initiatives. In addition, concrete fields of application in librarianship are presented.

Benoit, G.: Data discretization for novel relationship discovery in information retrieval (2002) 0.01

0.014927543 = product of:
  0.029855086 = sum of:
    0.029855086 = product of:
      0.05971017 = sum of:
        0.05971017 = weight(_text_:retrieval in 5197) [ClassicSimilarity], result of:
          0.05971017 = score(doc=5197,freq=4.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.37811437 = fieldWeight in 5197, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=5197)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: A sample of 600 Dialog and Swiss-Prot full text records in genetics and molecular biology were parsed and term frequencies calculated to provide data for a test of Benoit's visualization model for retrieval. A retrieved set is displayed graphically allowing for manipulation of document and concept relationships in real time, which hopefully will reveal unanticipated relationships.

Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.01
```
0.014751574 = product of:
  0.029503148 = sum of:
    0.029503148 = product of:
      0.059006296 = sum of:
        0.059006296 = weight(_text_:retrieval in 5863) [ClassicSimilarity], result of:
          0.059006296 = score(doc=5863,freq=10.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.37365708 = fieldWeight in 5863, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5863)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Retrieval tests are the most widely recognized method for justifying new content-indexing procedures against traditional ones. As part of a diploma thesis, two fundamentally different systems for automatic content indexing were tested and evaluated on the press database of the publishing house Gruner + Jahr (G+J). The study compared natural-language retrieval with Boolean retrieval. The two systems are Autonomy, from Autonomy Inc., and DocCat, which IBM adapted to the database structure of the G+J press database. The former is a probabilistic system based on natural-language retrieval. DocCat, by contrast, is based on Boolean retrieval and is a learning system that indexes on the basis of an intellectually created training set. Methodologically, the evaluation starts from the real application context of text documentation at G+J. The tests are assessed from both statistical and qualitative points of view. One result of the tests is that DocCat shows some deficiencies compared with intellectual content indexing that still need to be remedied, while Autonomy's natural-language retrieval cannot be used as it stands in this setting and for the specific requirements of G+J text documentation.
Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.01
```
0.014751574 = product of:
  0.029503148 = sum of:
    0.029503148 = product of:
      0.059006296 = sum of:
        0.059006296 = weight(_text_:retrieval in 119) [ClassicSimilarity], result of:
          0.059006296 = score(doc=119,freq=10.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.37365708 = fieldWeight in 119, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=119)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The acquired parsed terms can then be applied for precise retrieval and assembly of information. The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or "linguistic data tuning" of, heterogeneous texts. FASTR can be used to do automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.

RSWK

Automatische Indexierung / Computerlinguistik / Information Retrieval

Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 1852) [ClassicSimilarity], result of:
          0.0548473 = score(doc=1852,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 1852, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1852)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Sidhom and Hassoun discuss the crucial role of NLP tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the Cascaded "Augmented Transition Network (ATN)". They used this formalism in order to analyse French text descriptions of Multimedia documents. An implementation of an ATN parsing automaton is briefly described. The Platform in its logical operation is considered as an investigative tool towards the knowledge organization (based on an NP recognition model) and management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.
Galvez, C.; Moya-Anegón, F. de: An evaluation of conflation accuracy using finite-state transducers (2006) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 5599) [ClassicSimilarity], result of:
          0.0548473 = score(doc=5599,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 5599, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5599)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 5964) [ClassicSimilarity], result of:
          0.0548473 = score(doc=5964,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 5964, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5964)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This article describes the experiments of the University of Hildesheim in the first Web Track of the CLEF initiative (WebCLEF) in 2005. Participation yielded experience with a multilingual web corpus (EuroGOV) in preprocessing, topic and query development, language-independent indexing methods, and multilingual retrieval strategies. Because of the large size of the corpus and the time constraints, multilingual indexes were built. The article describes the procedure followed in the University of Hildesheim's participation and the results of the officially submitted runs as well as further experiments. For the Multilingual Task, the best result in CLEF was achieved.

Source

Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Eds.: T. Mandl and C. Womser-Hacker
Warner, J.: Linguistics and information theory : analytic advantages (2007) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 77) [ClassicSimilarity], result of:
          0.0548473 = score(doc=77,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 77, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=77)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The analytic advantages of central concepts from linguistics and information theory, and the analogies demonstrated between them, for understanding patterns of retrieval from full-text indexes to documents are developed. The interaction between the syntagm and the paradigm in computational operations on written language in indexing, searching, and retrieval is used to account for transformations of the signified or meaning between documents and their representation and between queries and documents retrieved. Characteristics of the message, and messages for selection for written language, are brought to explain the relative frequency of occurrence of words and multiple word sequences in documents. The examples given in the companion article are revisited and a fuller example introduced. The signified of the sequence stood for, the term classically used in the definitions of the sign, as something standing for something else, can itself change rapidly according to its syntagm. A greater than ordinary discourse understanding of patterns in retrieval is obtained.
Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 2030) [ClassicSimilarity], result of:
          0.0548473 = score(doc=2030,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 2030, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2030)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.01
```
0.013711825 = product of:
  0.02742365 = sum of:
    0.02742365 = product of:
      0.0548473 = sum of:
        0.0548473 = weight(_text_:retrieval in 2950) [ClassicSimilarity], result of:
          0.0548473 = score(doc=2950,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.34732026 = fieldWeight in 2950, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2950)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

It is important in information retrieval (IR), information extraction, or classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming or various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly by around 7% while performance differences are not significant when comparing various algorithmic stemmers or algorithmic stemmers and morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stop word list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model.

Pimenov, E.N.: Normativnost' i nekotorye problem razrabotki tezauruzov i drugikh lingvistiicheskikh sredstv IPS (2000) 0.01

0.013194209 = product of:
  0.026388418 = sum of:
    0.026388418 = product of:
      0.052776836 = sum of:
        0.052776836 = weight(_text_:retrieval in 3281) [ClassicSimilarity], result of:
          0.052776836 = score(doc=3281,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.33420905 = fieldWeight in 3281, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=3281)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Footnote: Translation of the title: Standardisation and some other issues connected with the development of thesauri and other linguistic information retrieval tools

Xu, J.; Weischedel, R.; Licuanan, A.: Evaluation of an extraction-based approach to answering definitional questions (2004) 0.01

0.013194209 = product of:
  0.026388418 = sum of:
    0.026388418 = product of:
      0.052776836 = sum of:
        0.052776836 = weight(_text_:retrieval in 4107) [ClassicSimilarity], result of:
          0.052776836 = score(doc=4107,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.33420905 = fieldWeight in 4107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4107)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
