Search (6 results, page 1 of 1)

Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.02
```
0.01774153 = product of:
  0.07096612 = sum of:
    0.07096612 = sum of:
      0.040352322 = weight(_text_:methods in 2937) [ClassicSimilarity], result of:
        0.040352322 = score(doc=2937,freq=2.0), product of:
          0.18168657 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.045191016 = queryNorm
          0.22209854 = fieldWeight in 2937, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2937)
      0.030613795 = weight(_text_:22 in 2937) [ClassicSimilarity], result of:
        0.030613795 = score(doc=2937,freq=2.0), product of:
          0.15825124 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045191016 = queryNorm
          0.19345059 = fieldWeight in 2937, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2937)
  0.25 = coord(1/4)
```
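The explain tree above can be checked by hand. The following is a minimal sketch, assuming the classic TF-IDF decomposition the tree itself shows (`tf = sqrt(freq)`, `queryWeight = idf * queryNorm`, `fieldWeight = tf * idf * fieldNorm`) and assuming `idf = 1 + ln(maxDocs / (docFreq + 1))`, which is consistent with the idf values listed. The function and constant names are hypothetical and are not part of the search system.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    """Weight of one query term in one document, following the explain tree."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm   # fieldWeight
    return query_weight * field_weight


# Values copied from the explain output for doc 2937.
QUERY_NORM = 0.045191016
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

w_methods = term_weight(2.0, 2156, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/4): only one of the four query clauses matched.
score = (w_methods + w_22) * 0.25
print(f"{score:.6f}")
```

The result should agree with the listed 0.01774153 up to single-precision rounding, since the search engine computes in 32-bit floats while this sketch uses Python's 64-bit floats.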
Abstract

In authorship attribution, various distance-based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback-Leibler divergence, Labbé's intertextual distance, and the naïve Bayes). As the first test collection, we have downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that the proposed assignment should be interpreted as possible, without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be taken.

Date

7. 5.2016 21:22:27

Savoy, J.; Desbois, D.: Information retrieval in hypertext systems (1991) 0.01

```
0.008070464 = product of:
  0.032281857 = sum of:
    0.032281857 = product of:
      0.064563714 = sum of:
        0.064563714 = weight(_text_:methods in 4452) [ClassicSimilarity], result of:
          0.064563714 = score(doc=4452,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.35535768 = fieldWeight in 4452, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0625 = fieldNorm(doc=4452)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Abstract: The emphasis in most hypertext systems is on the navigational methods, rather than on the global document retrieval mechanisms. When a search mechanism is provided, it is often restricted to simple string matching or to the Boolean model (as an alternate method). Proposes a retrieval mechanism using Bayesian inference networks. The main contribution of this approach is the automatic construction of this network using the expected mutual information measure to build the inference tree, and using Jaccard's formula to define fixed conditional probability relationships.

Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 0.01

```
0.008070464 = product of:
  0.032281857 = sum of:
    0.032281857 = product of:
      0.064563714 = sum of:
        0.064563714 = weight(_text_:methods in 6511) [ClassicSimilarity], result of:
          0.064563714 = score(doc=6511,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.35535768 = fieldWeight in 6511, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0625 = fieldNorm(doc=6511)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Abstract: In most hypertext systems, information retrieval techniques emphasize browsing or navigational methods which are not thorough enough to find all relevant material, especially when the number of nodes and/or links becomes very large. Reviews the main query-based search techniques currently used in hypertext environments. Explains the experimental methodology. Concentrates on the retrieval effectiveness of these retrieval strategies. Considers ways of improving search effectiveness.

Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.01
```
0.0060528484 = product of:
  0.024211394 = sum of:
    0.024211394 = product of:
      0.048422787 = sum of:
        0.048422787 = weight(_text_:methods in 4102) [ClassicSimilarity], result of:
          0.048422787 = score(doc=4102,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.26651827 = fieldWeight in 4102, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.

Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.01

```
0.0050440403 = product of:
  0.020176161 = sum of:
    0.020176161 = product of:
      0.040352322 = sum of:
        0.040352322 = weight(_text_:methods in 1427) [ClassicSimilarity], result of:
          0.040352322 = score(doc=1427,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.22209854 = fieldWeight in 1427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Footnote: Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval

Savoy, J.: Authorship of Pauline epistles revisited (2019) 0.01
```
0.0050440403 = product of:
  0.020176161 = sum of:
    0.020176161 = product of:
      0.040352322 = sum of:
        0.040352322 = weight(_text_:methods in 5386) [ClassicSimilarity], result of:
          0.040352322 = score(doc=5386,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.22209854 = fieldWeight in 5386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5386)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The name Paul appears in 13 epistles, but is he the real author? According to different biblical scholars, the number of letters really attributed to Paul varies from 4 to 13, with a majority agreeing on seven. This article proposes to revisit this authorship attribution problem by considering two effective methods (Burrows' Delta, Labbé's intertextual distance). Based on these results, a hierarchical clustering is then applied showing that four clusters can be derived, namely: {Colossians-Ephesians}, {1 and 2 Thessalonians}, {Titus, 1 and 2 Timothy}, and {Romans, Galatians, 1 and 2 Corinthians}. Moreover, a verification method based on the impostors' strategy indicates clearly that the group {Colossians-Ephesians} is written by the same author who seems not to be Paul. The same conclusion can be found for the cluster {Titus, 1 and 2 Timothy}. The Letter to Philemon stays as a singleton, without any close stylistic relationship with the other epistles. Finally, a group of four letters {Romans, Galatians, 1 and 2 Corinthians} is certainly written by the same author (Paul), but the verification protocol also indicates that 2 Corinthians is related to 1 Thessalonians, rendering a clear and simple interpretation difficult.
