Search (10 results, page 1 of 1)

  • author_ss:"Savoy, J."
  1. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.03
    Abstract
    In authorship attribution, various distance-based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback-Leibler divergence, Labbé's intertextual distance, and naïve Bayes). As the first test collection, we have downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that a proposed assignment should be interpreted as possible, but without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be made.
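    A minimal sketch of the core idea in Python (an illustration, not the authors' exact procedure): scale the distances into (0, 1), fit one Beta distribution to true-author distances and one to the others, and convert a new distance into a probability by Bayes' rule. The training distances below are synthetic.

      import numpy as np
      from scipy import stats

      def fit_component(distances):
          # Fit a Beta distribution to distances already scaled into (0, 1).
          a, b, _, _ = stats.beta.fit(distances, floc=0, fscale=1)
          return stats.beta(a, b)

      def p_true_author(d, beta_true, beta_other, prior=0.5):
          # Bayes' rule on the two fitted densities.
          num = prior * beta_true.pdf(d)
          return num / (num + (1 - prior) * beta_other.pdf(d))

      # Synthetic training distances standing in for real attribution runs.
      rng = np.random.default_rng(0)
      true_d = rng.beta(2, 8, 200)   # true-author distances tend to be small
      other_d = rng.beta(6, 3, 200)  # other-author distances tend to be large
      bt, bo = fit_component(true_d), fit_component(other_d)
      print(p_true_author(0.15, bt, bo))  # close to 1: confident attribution
      print(p_true_author(0.60, bt, bo))  # close to 0: attribution is doubtful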
    Date
    7. 5.2016 21:22:27
  2. Savoy, J.: Authorship of Pauline epistles revisited (2019) 0.01
    Abstract
    The name Paul appears in 13 epistles, but is he the real author? According to different biblical scholars, the number of letters genuinely attributable to Paul varies from 4 to 13, with a majority agreeing on seven. This article proposes to revisit this authorship attribution problem by considering two effective methods (Burrows' Delta, Labbé's intertextual distance). Based on these results, a hierarchical clustering is then applied, showing that four clusters can be derived, namely: {Colossians-Ephesians}, {1 and 2 Thessalonians}, {Titus, 1 and 2 Timothy}, and {Romans, Galatians, 1 and 2 Corinthians}. Moreover, a verification method based on the impostors' strategy indicates clearly that the group {Colossians-Ephesians} was written by a single author, who seems not to be Paul. The same conclusion holds for the cluster {Titus, 1 and 2 Timothy}. The Letter to Philemon remains a singleton, without any close stylistic relationship to the other epistles. Finally, a group of four letters {Romans, Galatians, 1 and 2 Corinthians} is certainly written by the same author (Paul), but the verification protocol also indicates that 2 Corinthians is related to 1 Thessalonians, rendering a clear and simple interpretation difficult.
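    A minimal sketch of the pipeline the abstract describes (Burrows' Delta distances followed by hierarchical clustering), with random placeholder frequencies standing in for the epistles' actual word profiles:

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import pdist

      # 13 documents x 50 most-frequent-word relative frequencies (placeholder).
      freqs = np.random.default_rng(1).random((13, 50))
      z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)  # z-score each word

      # Burrows' Delta: mean absolute difference of z-scores, i.e. the
      # cityblock distance divided by the number of words considered.
      delta = pdist(z, metric="cityblock") / z.shape[1]

      tree = linkage(delta, method="average")
      print(fcluster(tree, t=4, criterion="maxclust"))  # cut into four clusters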
  3. Savoy, J.: An extended vector-processing scheme for searching information in hypertext systems (1996) 0.01
    Abstract
    Searching for information in a hypertext is not an easy task when it is limited to navigation, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must therefore be provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models, which treat documents as independent entities and ignore hypertext links. To promote the use of information retrieval mechanisms adapted to hypertext systems, responds to the following questions: how can we integrate the information given by hypertext links into an information retrieval scheme; are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness; and if so, how can we use them? Two solutions are considered: using a default weight function, either based on link type or assigning the same strength to all link types; or using a specific weight for each particular link, i.e. the level of association or a similarity measure. Proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness. A hypertext based on two medium-sized collections, CACM and CISI, has been built. The hypergraph is composed of explicit links (bibliographic references), computed links based on bibliographic information, and hypertext links established according to document representatives (nearest neighbour).
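    A minimal sketch of the general idea, assuming an invented link graph and invented per-link-type weights (the article derives its weights from the collection itself): each document's vector-space similarity is blended with the similarities of its hypertext neighbours.

      import numpy as np

      sim = np.array([0.42, 0.10, 0.05, 0.30])  # query-document similarities

      # Hypothetical link graph: links[i] = list of (neighbour, link type).
      links = {0: [(1, "citation")],
               1: [(0, "citation"), (2, "nn")],
               2: [(1, "nn")],
               3: []}
      w = {"citation": 0.5, "nn": 0.3}  # assumed per-link-type weights

      def extended_score(i, alpha=0.7):
          # Blend a document's own similarity with its neighbours' scores.
          spread = sum(w[t] * sim[j] for j, t in links[i])
          return alpha * sim[i] + (1 - alpha) * spread

      print(sorted(range(len(sim)), key=extended_score, reverse=True))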
    Source
    Information processing and management. 32(1996) no.2, S.155-170
  4. Savoy, J.; Desbois, D.: Information retrieval in hypertext systems (1991) 0.01
    Source
    Electronic publishing. 4(1991) no.2, S.87-108
  5. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 0.01
    Abstract
    In proposing a searching strategy well suited to the hypertext environment, we have considered four criteria: (1) the retrieval scheme should be integrated into a large hypertext environment; (2) the retrieval process should be operable with an unrestricted text collection; (3) the processing time should be reasonable; and (4) the system should be capable of learning in order to improve its retrieval effectiveness.
  6. Savoy, J.: Searching information in legal hypertext systems (1993/94) 0.01
    Source
    Artificial intelligence and law. 2(1993/94) no.3, S.205-232
  7. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.01
    Abstract
    To determine the gender of a text's author, various feature types have been suggested (e.g., function words, letter n-grams, etc.), leading to a huge number of stylistic markers. To determine the target category, different machine learning models have been suggested (e.g., logistic regression, decision tree, k nearest neighbors, support vector machine, naïve Bayes, neural networks, and random forest). In this study, our first objective is to know whether or not the same model always achieves the best effectiveness when considering similar corpora under the same conditions. Thus, based on 7 CLEF-PAN collections, this study analyzes the effectiveness of 10 different classifiers. Our second aim is to propose a 2-stage feature selection to reduce the feature set size to a few hundred terms without any significant change in the performance level compared to approaches using all the attributes (an increase of around 5% after applying the proposed feature selection). Based on our experiments, neural networks and random forests tend, on average, to produce the highest effectiveness. Moreover, empirical evidence indicates that reducing the feature set size to around 300 is possible without penalizing the effectiveness. Finally, based on such reduced feature sets, an analysis reveals some of the specific terms that clearly discriminate between the 2 genders.
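    A minimal sketch of the experimental setup, using a generic scikit-learn pipeline (univariate selection plus a random forest) as a stand-in for the paper's 2-stage feature selection, and synthetic data in place of the CLEF-PAN tweets:

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline

      # Placeholder data: 1,000 "tweets" x 5,000 stylistic features.
      X, y = make_classification(n_samples=1000, n_features=5000,
                                 n_informative=50, random_state=0)

      clf = make_pipeline(SelectKBest(f_classif, k=300),  # keep ~300 attributes
                          RandomForestClassifier(random_state=0))
      print(cross_val_score(clf, X, y, cv=5).mean())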
  8. Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.01
    Source
    Information processing and management. 44(2008) no.2, S.781-789
  9. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.00
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document scores, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web and, as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in these techniques and to take a higher-level, systematic approach to using hyperlinks for retrieval.
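    As an illustration of technique (4) above, a minimal sketch of Kleinberg's HITS on a tiny invented link graph; the article itself restates such techniques in an argumentation framework rather than computing them directly:

      import numpy as np

      # Adjacency of a hypothetical link graph: A[i, j] = 1 if page i links to j.
      A = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)

      hubs = np.ones(4)
      auths = np.ones(4)
      for _ in range(50):      # power iteration until (near) convergence
          auths = A.T @ hubs   # good authorities are cited by good hubs
          hubs = A @ auths     # good hubs point to good authorities
          auths /= np.linalg.norm(auths)
          hubs /= np.linalg.norm(hubs)

      print("authorities:", np.round(auths, 3))
      print("hubs:", np.round(hubs, 3))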
  10. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.00
    Date
    2. 2.2010 19:43:51