Search (6 results, page 1 of 1)

Bacchin, M.; Ferro, N.; Melucci, M.: ¬A probabilistic model for stemmer generation (2005) 0.00
```
0.0026061484 = product of:
  0.0052122967 = sum of:
    0.0052122967 = product of:
      0.010424593 = sum of:
        0.010424593 = weight(_text_:a in 1001) [ClassicSimilarity], result of:
          0.010424593 = score(doc=1001,freq=12.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.21843673 = fieldWeight in 1001, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper we will present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems, however the designing and the implementation of stemmers requires a laborious amount of effort due to the fact that documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.

Type

a
Ferro, N.; Silvello, G.; Keskustalo, H.; Pirkola, A.; Järvelin, K.: ¬The twist measure for IR evaluation : taking user's effort into account (2016) 0.00
```
0.0024032309 = product of:
  0.0048064617 = sum of:
    0.0048064617 = product of:
      0.0096129235 = sum of:
        0.0096129235 = weight(_text_:a in 2771) [ClassicSimilarity], result of:
          0.0096129235 = score(doc=2771,freq=20.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.20142901 = fieldWeight in 2771, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2771)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents, which handles both binary and graded relevance. t stems from the observation that searching is currently a that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may assume the utility they have in finding relevant documents, which is the focus of traditional measures, as granted. On the contrary, they may feel uneasy when the system returns nonrelevant documents because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required to the users to retrieve the desired information. We provide a formal definition of t, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performances, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.

Type

a

Agosti, M.; Braschler, M.; Ferro, N.; Peters, C.; Siebinga, S.: Roadmap for multiLingual information access in the European Library (2007) 0.00

0.0021279112 = product of:
  0.0042558224 = sum of:
    0.0042558224 = product of:
      0.008511645 = sum of:
        0.008511645 = weight(_text_:a in 2431) [ClassicSimilarity], result of:
          0.008511645 = score(doc=2431,freq=8.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.17835285 = fieldWeight in 2431, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2431)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: The paper studies the problem of implementing MultiLingual Information Access (MLIA) functionality in The European Library (TEL). The issues that must be considered are described in detail and the results of a preliminary feasibility study are presented. The paper concludes by discussing the difficulties inherent in attempting to provide a realistic full-scale MLIA solution and proposes a roadmap aimed at determining whether this is in fact possible.
Type: a

Angelini, M.; Fazzini, V.; Ferro, N.; Santucci, G.; Silvello, G.: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation (2018) 0.00
```
0.0020106873 = product of:
  0.0040213745 = sum of:
    0.0040213745 = product of:
      0.008042749 = sum of:
        0.008042749 = weight(_text_:a in 5049) [ClassicSimilarity], result of:
          0.008042749 = score(doc=5049,freq=14.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.1685276 = fieldWeight in 5049, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5049)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as "black boxes", since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, originated by large combinatorial compositions of their components, to understand how they perform and how these components interact together. We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performances of a large amount of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what are the contributions of the different components and how these components interact together. The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components, representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.

Type

a
Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances (2018) 0.00
```
0.0016993409 = product of:
  0.0033986818 = sum of:
    0.0033986818 = product of:
      0.0067973635 = sum of:
        0.0067973635 = weight(_text_:a in 4035) [ClassicSimilarity], result of:
          0.0067973635 = score(doc=4035,freq=10.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.14243183 = fieldWeight in 4035, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4035)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Information retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the web and elsewhere. They are complex systems, constituted by many different components interacting together, and evaluation is crucial to both tune and improve them. Nevertheless, in the current evaluation methodology, there is still no way to determine how much each component contributes to the overall performances and how the components interact together. This hampers the possibility of a deep understanding of IR system behavior and, in turn, prevents us from designing ahead which components are best suited to work together for a specific search task. In this paper, we move the evaluation methodology one step forward by overcoming these barriers and beginning to devise an "anatomy" of IR systems and their internals. In particular, we propose a methodology based on the General Linear Mixed Model (GLMM) and analysis of variance (ANOVA) to develop statistical models able to isolate system variance and component effects as well as their interaction, by relying on a grid of points (GoP) containing all the combinations of the analyzed components. We apply the proposed methodology to the analysis of two relevant search tasks-news search and web search-by using standard TREC collections. We analyze the basic set of components typically part of an IR system, namely, stop lists, stemmers, and n-grams, and IR models. In this way, we derive insights about English text retrieval.

Type

a
Ferro, N.; Silvello, G.: NESTOR: a formal model for digital archives (2013) 0.00
```
0.0015199365 = product of:
  0.003039873 = sum of:
    0.003039873 = product of:
      0.006079746 = sum of:
        0.006079746 = weight(_text_:a in 2707) [ClassicSimilarity], result of:
          0.006079746 = score(doc=2707,freq=8.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.12739488 = fieldWeight in 2707, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2707)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Archives are an extremely valuable part of our cultural heritage since they represent the trace of the activities of a physical or juridical person in the course of their business. Despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives. This is especially true when it comes to formal and foundational frameworks, as the Streams, Structures, Spaces, Scenarios, Societies (5S) model is. Therefore, we propose an innovative formal model, called NEsted SeTs for Object hieRarchies (NESTOR), for archives, explicitly built around the concepts of context and hierarchy which play a central role in the archival realm. NESTOR is composed of two set-based data models: the Nested Sets Model (NS-M) and the Inverse Nested Sets Model (INS-M) that express the hierarchical relationships between objects through the inclusion property between sets. We formally study the properties of these models and prove their equivalence with the notion of hierarchy entailed by archives. We then use NESTOR to extend the 5S model in order to take into account the specific features of archives and to tailor the notion of digital library accordingly. This offers the possibility of opening up the full wealth of DL methods and technologies to archives. We demonstrate the impact of NESTOR on this problem through three example use cases.

Type

a

Search (6 results, page 1 of 1)

Authors

Years

Themes