Search (6 results, page 1 of 1)

  • author_ss:"Sebastiani, F."
  1. Sebastiani, F.: On the role of logic in information retrieval (1998) 0.00
    0.0046377215 = product of:
      0.03246405 = sum of:
        0.011280581 = weight(_text_:information in 1140) [ClassicSimilarity], result of:
          0.011280581 = score(doc=1140,freq=10.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.21684799 = fieldWeight in 1140, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1140)
        0.021183468 = weight(_text_:retrieval in 1140) [ClassicSimilarity], result of:
          0.021183468 = score(doc=1140,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23632148 = fieldWeight in 1140, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1140)
      0.14285715 = coord(2/14)
    
    Abstract
    The logical approach to information retrieval has recently been the object of active research. It is our contention that researchers have put a lot of effort into trying to address some difficult problems of IR within this framework, but little effort into checking that the resulting models satisfy those well-formedness criteria that, in the field of mathematical logic, are considered essential and conducive to effective modelling of a real-world phenomenon. The main motivation of this paper is not to propose a new logical model of IR, but to discuss some central issues in the application of logic to IR. The first issue we touch upon is the logical relationship we might want to enforce between formulae d, representing a document, and n, representing an information need; we analyse the different implications of models based on truth, validity or logical consequentiality. The relationship between this issue and the issue of partiality vs. totality of information is subsequently analysed, in the context of a broader discussion of the role of denotational semantics in IR modelling. Finally, the relationship between the paradoxes of material implication and the (in)adequacy of classical logic for IR modelling purposes is discussed.
    Source
    Information processing and management. 34(1998) no.1, S.1-18
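The score tree shown for this record follows Lucene's ClassicSimilarity formula: for each matching term, score = queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = sqrt(termFreq) × idf × fieldNorm; the per-term scores are summed and multiplied by the coord factor (matched clauses / total clauses). A minimal sketch reproducing the numbers above (constants copied from the explain output; variable names are illustrative, not Lucene API):

```python
import math

def field_weight(freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return math.sqrt(freq) * idf * field_norm

def term_score(freq, idf, query_norm, field_norm):
    # score = queryWeight * fieldWeight, with queryWeight = idf * queryNorm
    return (idf * query_norm) * field_weight(freq, idf, field_norm)

QUERY_NORM = 0.029633347
IDF_INFORMATION = 1.7554779   # idf(docFreq=20772, maxDocs=44218)
IDF_RETRIEVAL = 3.024915      # idf(docFreq=5836, maxDocs=44218)
FIELD_NORM = 0.0390625        # fieldNorm(doc=1140)

# doc 1140: "information" occurs 10 times, "retrieval" 4 times
s_info = term_score(10.0, IDF_INFORMATION, QUERY_NORM, FIELD_NORM)
s_retr = term_score(4.0, IDF_RETRIEVAL, QUERY_NORM, FIELD_NORM)

# coord(2/14): only 2 of the 14 query clauses matched this document
score = (s_info + s_retr) * (2 / 14)
print(score)  # approximately 0.0046377215, as in the explain tree
```

Running this recovers the intermediate weights (0.011280581 and 0.021183468) as well as the final document score.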
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.00
    0.004004761 = product of:
      0.028033325 = sum of:
        0.0070627616 = weight(_text_:information in 5003) [ClassicSimilarity], result of:
          0.0070627616 = score(doc=5003,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13576832 = fieldWeight in 5003, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5003)
        0.020970564 = weight(_text_:retrieval in 5003) [ClassicSimilarity], result of:
          0.020970564 = score(doc=5003,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23394634 = fieldWeight in 5003, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5003)
      0.14285715 = coord(2/14)
    
    Abstract
    Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.
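The ML approach described in the abstract, a learner that builds a classifier by generalizing from pre-classified training texts, can be illustrated with a deliberately toy sketch. This is a generic, hypothetical example (a bag-of-words nearest-profile classifier), not the method of the article; all names and the sample data are invented:

```python
from collections import Counter

def train(docs):
    # Learner: build one bag-of-words term-frequency profile per
    # category from the "training" set of pre-classified texts.
    profiles = {}
    for text, label in docs:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def classify(profiles, text):
    # Classifier: assign the text to the category whose profile
    # shares the most term occurrences with it.
    words = Counter(text.lower().split())
    def overlap(profile):
        return sum(min(words[w], profile[w]) for w in words)
    return max(profiles, key=lambda label: overlap(profiles[label]))

training = [
    ("cheap pills buy now limited offer", "spam"),
    ("meeting agenda attached for review", "ham"),
    ("buy cheap watches now", "spam"),
    ("please review the attached report", "ham"),
]
profiles = train(training)
print(classify(profiles, "buy now cheap offer"))  # prints "spam"
```

Real ATC systems replace the profile-overlap rule with a learned model (e.g. naive Bayes or support vector machines), but the division of labor, a learner generalizing from labeled texts to produce a classifier, is the same.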
  3. Debole, F.; Sebastiani, F.: ¬An analysis of the relative hardness of Reuters-21578 subsets (2005) 0.00
    0.0031590632 = product of:
      0.02211344 = sum of:
        0.0071344664 = weight(_text_:information in 3456) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=3456,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 3456, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3456)
        0.014978974 = weight(_text_:retrieval in 3456) [ClassicSimilarity], result of:
          0.014978974 = score(doc=3456,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 3456, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3456)
      0.14285715 = coord(2/14)
    
    Abstract
    The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.6, S.584-596
  4. Corbara, S.; Moreo, A.; Sebastiani, F.: Syllabic quantity patterns as rhythmic features for Latin authorship attribution (2023) 0.00
    4.32414E-4 = product of:
      0.0060537956 = sum of:
        0.0060537956 = weight(_text_:information in 846) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=846,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 846, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=846)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.1, S.128-141
  5. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 5172) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=5172,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 5172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5172)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.14, S.1269-1277
  6. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 4101) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=4101,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 4101, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4101)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2256-2265