Search (13 results, page 1 of 1)

Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.00

0.004896801 = product of:
  0.034277607 = sum of:
    0.034277607 = product of:
      0.085694015 = sum of:
        0.0380555 = weight(_text_:retrieval in 2030) [ClassicSimilarity], result of:
          0.0380555 = score(doc=2030,freq=6.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.34732026 = fieldWeight in 2030, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2030)
        0.047638513 = weight(_text_:system in 2030) [ClassicSimilarity], result of:
          0.047638513 = score(doc=2030,freq=8.0), product of:
            0.11408355 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03622214 = queryNorm
            0.41757566 = fieldWeight in 2030, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=2030)
      0.4 = coord(2/5)
  0.14285715 = coord(1/7)
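
The score tree above can be checked by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not part of the search system. All constants are copied from the explanation for doc 2030.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the tree above
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight

# Values read off the explanation for doc 2030
MAX_DOCS = 44218
QUERY_NORM = 0.03622214
FIELD_NORM = 0.046875

retrieval = term_score(6.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)
system = term_score(8.0, 5152, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Two of five clauses matched in the inner query, one of seven in the outer one
total = (retrieval + system) * (2 / 5) * (1 / 7)

print(f"retrieval={retrieval:.7f} system={system:.7f} total={total:.9f}")
```

Small differences in the last digits are expected, because the listing shows single-precision values while this sketch computes in double precision.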

Abstract: Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.

Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 0.00
```
0.0039593396 = product of:
  0.027715376 = sum of:
    0.027715376 = product of:
      0.06928844 = sum of:
        0.04963856 = weight(_text_:retrieval in 1164) [ClassicSimilarity], result of:
          0.04963856 = score(doc=1164,freq=30.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.45303512 = fieldWeight in 1164, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1164)
        0.019649884 = weight(_text_:system in 1164) [ClassicSimilarity], result of:
          0.019649884 = score(doc=1164,freq=4.0), product of:
            0.11408355 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03622214 = queryNorm
            0.17224117 = fieldWeight in 1164, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1164)
      0.4 = coord(2/5)
  0.14285715 = coord(1/7)
```
Abstract

The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application. With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively. The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language.
Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well-studied monolingual counterpart. At the SIGIR 96 workshop on "Cross-Linguistic Information Retrieval" the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.
I will not attempt to draw a sharp distinction between retrieval and filtering in this survey. Although my own work on adaptive cross-language text filtering has led me to make this distinction fairly carefully in other presentations (cf. Oard 1997b), such an approach does little to help understand the fundamental techniques which have been applied or the results that have been obtained in this case. Since it is still common to view filtering (detection of useful documents in dynamic document streams) as a kind of retrieval, I will simply adopt that perspective here.

Theme

Semantic context in indexing and retrieval (Semantisches Umfeld in Indexierung u. Retrieval)

Oard, D.W.; Dorr, B.J.: Evaluating cross-language text filtering effectiveness (1998) 0.00

0.0020714786 = product of:
  0.01450035 = sum of:
    0.01450035 = product of:
      0.07250175 = sum of:
        0.07250175 = weight(_text_:retrieval in 6214) [ClassicSimilarity], result of:
          0.07250175 = score(doc=6214,freq=4.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.6617001 = fieldWeight in 6214, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=6214)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)

Series: The Kluwer International series on information retrieval
Source: Cross-language information retrieval. Ed.: G. Grefenstette

Oard, D.W.; Webber, W.: Information retrieval for e-discovery (2013) 0.00
```
0.0017755532 = product of:
  0.012428872 = sum of:
    0.012428872 = product of:
      0.06214436 = sum of:
        0.06214436 = weight(_text_:retrieval in 211) [ClassicSimilarity], result of:
          0.06214436 = score(doc=211,freq=16.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.5671716 = fieldWeight in 211, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=211)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

E-discovery refers generally to the process by which one party (for example, the plaintiff) is entitled to discover evidence in the form of electronically stored information that is held by another party (for example, the defendant), and that is relevant to some matter that is the subject of civil litigation (that is, what is commonly called a "lawsuit"). Information Retrieval for E-Discovery describes the emergence of the field, identifies the information retrieval issues that arise, reviews the work to date on this topic, and summarizes major open issues. Information Retrieval for E-Discovery is an ideal primer for anyone with an interest in e-discovery; be it researchers who first practiced law but now study information retrieval, or those who studied information retrieval but now practice law.

Content

Table of contents:
1. Introduction
2. The E-Discovery Process
3. Information Retrieval for E-Discovery
4. Evaluating E-Discovery
5. Experimental Evaluation
6. Looking to the Future
7. Conclusion
A. Interpreting Legal Citations
Acknowledgments
Notations and Acronyms
References

Series

Foundations and trends® in information retrieval; 7,2/3

Wang, J.; Oard, D.W.: Matching meaning for cross-language information retrieval (2012) 0.00
```
0.0014647568 = product of:
  0.010253297 = sum of:
    0.010253297 = product of:
      0.051266484 = sum of:
        0.051266484 = weight(_text_:retrieval in 7430) [ClassicSimilarity], result of:
          0.051266484 = score(doc=7430,freq=8.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.46789268 = fieldWeight in 7430, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7430)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using bidirectional evidence are shown to yield a balance between retrieval effectiveness and query-time (or indexing-time) efficiency that seems well suited to large-scale applications. Evaluations with six test collections show consistent improvements over strong baselines.

Oard, D.W.; Resnik, P.: Support for interactive document selection in cross-language information retrieval (1999) 0.00

0.0014647568 = product of:
  0.010253297 = sum of:
    0.010253297 = product of:
      0.051266484 = sum of:
        0.051266484 = weight(_text_:retrieval in 5938) [ClassicSimilarity], result of:
          0.051266484 = score(doc=5938,freq=2.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.46789268 = fieldWeight in 5938, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=5938)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
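
This record (doc 5938, freq=2.0) receives exactly the same score as doc 7430 earlier in the list (freq=8.0). The sketch below illustrates why, assuming the usual ClassicSimilarity reading in which tf = sqrt(freq) and fieldNorm is a length normalisation that is smaller for longer fields; the helper name is hypothetical, and all numbers are copied from the two explanations.

```python
import math

# Shared values from both explanations (term "retrieval")
IDF = 3.024915
QUERY_NORM = 0.03622214
QUERY_WEIGHT = 0.109568894  # idf * queryNorm, as printed in the listing

def field_weight(freq: float, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return math.sqrt(freq) * IDF * field_norm

# doc 7430: four times the term frequency, but half the fieldNorm
fw_7430 = field_weight(8.0, 0.0546875)
# doc 5938: lower frequency in a field with a larger fieldNorm
fw_5938 = field_weight(2.0, 0.109375)

# sqrt(8) = 2 * sqrt(2), so halving fieldNorm cancels the gain exactly
final_score = QUERY_WEIGHT * fw_5938 * (1 / 5) * (1 / 7)

print(f"fieldWeight 7430={fw_7430:.7f} 5938={fw_5938:.7f} score={final_score:.9f}")
```

Because tf grows only with the square root of the frequency, a fourfold increase in term occurrences is fully offset here by a fieldNorm that is half as large.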

Oard, D.W.; Diekema, A.R.: Cross-language information retrieval (1999) 0.00

0.0014647568 = product of:
  0.010253297 = sum of:
    0.010253297 = product of:
      0.051266484 = sum of:
        0.051266484 = weight(_text_:retrieval in 4690) [ClassicSimilarity], result of:
          0.051266484 = score(doc=4690,freq=2.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.46789268 = fieldWeight in 4690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=4690)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)

Levow, G.-A.; Oard, D.W.; Resnik, P.: Dictionary-based techniques for cross-language information retrieval (2005) 0.00
```
8.877766E-4 = product of:
  0.006214436 = sum of:
    0.006214436 = product of:
      0.03107218 = sum of:
        0.03107218 = weight(_text_:retrieval in 1025) [ClassicSimilarity], result of:
          0.03107218 = score(doc=1025,freq=4.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.2835858 = fieldWeight in 1025, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1025)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but comparison across techniques has been difficult because evaluation results often span only a limited range of conditions. This article identifies the key issues in dictionary-based CLIR, develops unified frameworks for term selection and term translation that help to explain the relationships among existing techniques, and illustrates the effect of those techniques using four contrasting languages for systematic experiments with a uniform query translation architecture. Key results include identification of a previously unseen dependence of pre- and post-translation expansion on orthographic cognates and development of a query-specific measure for translation fanout that helps to explain the utility of structured query methods.

Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
```
8.877766E-4 = product of:
  0.006214436 = sum of:
    0.006214436 = product of:
      0.03107218 = sum of:
        0.03107218 = weight(_text_:retrieval in 1606) [ClassicSimilarity], result of:
          0.03107218 = score(doc=1606,freq=4.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.2835858 = fieldWeight in 1606, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1606)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.

Oard, D.W.: Multilingual information access (2009) 0.00

8.3700387E-4 = product of:
  0.0058590267 = sum of:
    0.0058590267 = product of:
      0.029295133 = sum of:
        0.029295133 = weight(_text_:retrieval in 3850) [ClassicSimilarity], result of:
          0.029295133 = score(doc=3850,freq=2.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.26736724 = fieldWeight in 3850, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=3850)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)

Abstract: This entry describes the process by which systems can be designed to help users find content in a language that may be different from the language of their query. The discussion of the relatively narrowly construed technical issues that are often referred to as Cross-Language Information Retrieval (CLIR) is situated in the context of important related issues such as information-seeking behavior, interaction design, and machine translation.

Gao, N.; Dredze, M.; Oard, D.W.: Person entity linking in email with NIL detection (2017) 0.00
```
8.0203614E-4 = product of:
  0.0056142528 = sum of:
    0.0056142528 = product of:
      0.028071264 = sum of:
        0.028071264 = weight(_text_:system in 3830) [ClassicSimilarity], result of:
          0.028071264 = score(doc=3830,freq=4.0), product of:
            0.11408355 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03622214 = queryNorm
            0.24605882 = fieldWeight in 3830, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3830)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base, and if so to determine which KB entity is the correct referent. Entity linking has been well explored for dissemination-oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on "conversational" sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as "NIL detection"). This article presents a supervised machine learning system that links named mentions of people in email messages to a collection-specific knowledge base and is also capable of NIL detection. This system learns from manually annotated training examples to leverage a rich set of features. The entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, comparable accuracy is reported for the challenging NIL detection task, and these results are for the first time replicated on a second email collection from a different source with comparable results.

Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 0.00
```
7.398139E-4 = product of:
  0.005178697 = sum of:
    0.005178697 = product of:
      0.025893483 = sum of:
        0.025893483 = weight(_text_:retrieval in 1261) [ClassicSimilarity], result of:
          0.025893483 = score(doc=1261,freq=4.0), product of:
            0.109568894 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03622214 = queryNorm
            0.23632148 = fieldWeight in 1261, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1261)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

We are rapidly constructing an extensive network infrastructure for moving information across national boundaries, but much remains to be done before linguistic barriers can be surmounted as effectively as geographic ones. Users seeking information from a digital library could benefit from the ability to query large collections once using a single language, even when more than one language is present in the collection. If the information they locate is not available in a language that they can read, some form of translation will be needed. At present, multilingual thesauri such as EUROVOC help to address this challenge by facilitating controlled vocabulary search using terms from several languages, and services such as INSPEC produce English abstracts for documents in other languages. On the other hand, support for free text searching across languages is not yet widely deployed, and fully automatic machine translation is presently neither sufficiently fast nor sufficiently accurate to adequately support interactive cross-language information seeking. An active and rapidly growing research community has coalesced around these and other related issues, applying techniques drawn from several fields - notably information retrieval and natural language processing - to provide access to large multilingual collections.

Kang, H.; Plaisant, C.; Elsayed, T.; Oard, D.W.: Making sense of archived e-mail : exploring the Enron collection with NetLens (2010) 0.00
```
6.8055023E-4 = product of:
  0.0047638514 = sum of:
    0.0047638514 = product of:
      0.023819257 = sum of:
        0.023819257 = weight(_text_:system in 3446) [ClassicSimilarity], result of:
          0.023819257 = score(doc=3446,freq=2.0), product of:
            0.11408355 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03622214 = queryNorm
            0.20878783 = fieldWeight in 3446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=3446)
      0.2 = coord(1/5)
  0.14285715 = coord(1/7)
```
Abstract

Informal communications media pose new challenges for information-systems design, but the nature of informal interaction offers new opportunities as well. This paper describes NetLens-E-mail, a system designed to support exploration of the content-actor network in large e-mail collections. Unique features of NetLens-E-mail include close coupling of orientation, specification, restriction, and expansion, and introduction and incorporation of a novel capability for iterative projection between content and actor networks within the same collection. Scenarios are presented to illustrate the intended employment of NetLens-E-mail, and design walkthroughs with two domain experts provide an initial basis for assessment of the suitability of the design by scholars and analysts.
