Search (6 results, page 1 of 1)

  • Active filter: author_ss:"Savoy, J."
  1. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.05
    0.05040447 = product of:
      0.075606704 = sum of:
        0.05889038 = weight(_text_:query in 2937) [ClassicSimilarity], result of:
          0.05889038 = score(doc=2937,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 2937, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2937)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 2937) [ClassicSimilarity], result of:
              0.03343265 = score(doc=2937,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 2937, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2937)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
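    The indented tree above is standard Lucene/Solr ClassicSimilarity "explain" output for this hit. As a rough illustration (the function and variable names below are ours, not part of the search system), the score can be recomputed from the listed factors, with tf = sqrt(termFreq), each term's weight = queryWeight * fieldWeight, and the coord factors combining the matching clauses:

      import math

      def term_weight(freq, idf, query_norm, field_norm):
          # Lucene ClassicSimilarity contribution of one term: queryWeight * fieldWeight
          tf = math.sqrt(freq)                  # 1.4142135 for freq = 2.0
          query_weight = idf * query_norm       # 4.6476326 * 0.049352113 = 0.22937049
          field_weight = tf * idf * field_norm  # 1.4142135 * 4.6476326 * 0.0390625 = 0.25674784
          return query_weight * field_weight

      # doc 2937: term "query" plus term "22" (the latter inside a sub-query with coord(1/2));
      # only 2 of 3 top-level clauses match, hence the outer coord(2/3)
      w_query = term_weight(2.0, 4.6476326, 0.049352113, 0.0390625)
      w_22 = 0.5 * term_weight(2.0, 3.5018296, 0.049352113, 0.0390625)
      print(round((w_query + w_22) * 2.0 / 3.0, 8))  # ~0.05040447, as reported above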
    
    Abstract
    In authorship attribution, various distance-based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback-Leibler divergence, Labbé's intertextual distance, and the naïve Bayes). As the first test collection, we have downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that the proposed assignment should be interpreted as possible, without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be taken.
    Date
    7.5.2016 21:22:27
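    The probability modelling described in the abstract above can be pictured with a minimal sketch: assuming the two Beta components and the mixing weight have already been fitted to the observed author-profile distances (the parameter values below are invented placeholders, not figures from the paper), the probability that the closest author is the true author follows from Bayes' rule:

      from scipy.stats import beta

      def prob_true_author(d, pi, a1, b1, a2, b2):
          # d: distance (rescaled to [0, 1]) between the disputed text and the closest author profile
          # component 1 models distances of correct attributions, component 2 those of incorrect ones
          p_true = pi * beta.pdf(d, a1, b1)
          p_false = (1.0 - pi) * beta.pdf(d, a2, b2)
          return p_true / (p_true + p_false)

      print(prob_true_author(0.2, pi=0.5, a1=2.0, b1=8.0, a2=6.0, b2=3.0))  # placeholder parameters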
  2. Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.03
    0.03331343 = product of:
      0.09994029 = sum of:
        0.09994029 = weight(_text_:query in 2062) [ClassicSimilarity], result of:
          0.09994029 = score(doc=2062,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 2062, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=2062)
      0.33333334 = coord(1/3)
    
    Abstract
    Based on a relatively large subset representing one third of the Medline collection, this paper evaluates ten different IR models, including recent developments in both probabilistic and language models. We show that the best performing IR model is a probabilistic model developed within the Divergence from Randomness framework [Amati, G., & van Rijsbergen, C.J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 20(4), 357-389], which results in a 170% enhancement in mean average precision when compared to the classical tf-idf vector-space model. This paper also reports on our evaluation of the impact of manually assigned descriptors (MeSH, or Medical Subject Headings) on retrieval effectiveness, showing that by including these terms retrieval performance can improve from 2.4% to 13.5%, depending on the underlying IR model. Finally, we design a new general blind query-expansion approach showing improved retrieval performance compared to that obtained using the Rocchio approach.
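    For context on the Rocchio comparison mentioned at the end of the abstract, a minimal blind (pseudo-relevance) query-expansion sketch is shown below; the term-vector representation and the alpha/beta weights are illustrative assumptions, not the settings used in the paper:

      from collections import Counter

      def rocchio_expand(query_vec, top_docs, alpha=1.0, beta=0.75):
          # query_vec and each doc are term -> weight dictionaries;
          # blind feedback treats the top-ranked documents as pseudo-relevant
          expanded = Counter({t: alpha * w for t, w in query_vec.items()})
          for doc in top_docs:
              for t, w in doc.items():
                  expanded[t] += beta * w / len(top_docs)
          return dict(expanded)

      q = {"medline": 1.0, "indexing": 1.0}
      docs = [{"medline": 0.8, "mesh": 0.6}, {"indexing": 0.5, "descriptor": 0.4}]
      print(rocchio_expand(q, docs))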
  3. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 0.03
    0.031408206 = product of:
      0.09422461 = sum of:
        0.09422461 = weight(_text_:query in 6511) [ClassicSimilarity], result of:
          0.09422461 = score(doc=6511,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.41079655 = fieldWeight in 6511, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0625 = fieldNorm(doc=6511)
      0.33333334 = coord(1/3)
    
    Abstract
    In most hypertext systems, information retrieval techniques emphasize browsing or navigational methods which are not thorough enough to find all relevant material, especially when the number of nodes and/or links becomes very large. Reviews the main query-based search techniques currently used in hypertext environments. Explains the experimental methodology. Concentrates on the retrieval effectiveness of these retrieval strategies. Considers ways of improving search effectiveness
  4. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 393) [ClassicSimilarity], result of:
          0.07066846 = score(doc=393,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 393, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
      0.33333334 = coord(1/3)
    
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically unfeasible, this article suggests a new ranking scheme especially adapted for hypertext environments in order to produce more effective retrieval results while maintaining the value of the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents such as bibliographic references or hypertext links. We will demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving its efficiency. Moreover, since the nearest neighbours and the hypertext links constitute additional sources of evidence, our strategy takes them into account in order to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
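    The abstract does not give the exact combination formula, so the following is only a hypothetical sketch of the general idea: a content-based score is augmented with evidence propagated from linked documents (bibliographic references, hypertext links, nearest neighbours); the lambda weight and the averaging are our assumptions:

      def combined_score(doc_id, content_score, links, lam=0.3):
          # content_score: doc_id -> score derived from index and query terms
          # links: doc_id -> list of linked doc_ids (references, hypertext links, nearest neighbours)
          neighbours = links.get(doc_id, [])
          base = content_score.get(doc_id, 0.0)
          if not neighbours:
              return base
          link_evidence = sum(content_score.get(n, 0.0) for n in neighbours) / len(neighbours)
          return base + lam * link_evidence

      scores = {"d1": 0.8, "d2": 0.5, "d3": 0.1}
      links = {"d3": ["d1", "d2"]}
      print({d: round(combined_score(d, scores, links), 3) for d in scores})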
  5. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 4102) [ClassicSimilarity], result of:
          0.07066846 = score(doc=4102,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 4102, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
      0.33333334 = coord(1/3)
    
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
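    Since mean average precision (MAP) is the effectiveness measure used here, a compact sketch of its computation may be helpful; the example run below is invented purely for illustration:

      def average_precision(ranked_doc_ids, relevant_ids):
          hits, precision_sum = 0, 0.0
          for rank, doc_id in enumerate(ranked_doc_ids, start=1):
              if doc_id in relevant_ids:
                  hits += 1
                  precision_sum += hits / rank
          return precision_sum / len(relevant_ids) if relevant_ids else 0.0

      def mean_average_precision(runs):
          # runs: list of (ranked_doc_ids, relevant_ids) pairs, one per topic
          return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

      runs = [(["d1", "d2", "d3"], {"d1", "d3"}), (["d4", "d5"], {"d5"})]
      print(mean_average_precision(runs))  # illustrative topics, not CLEF data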
  6. Savoy, J.: ¬An extended vector-processing scheme for searching information in hypertext systems (1996) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 4036) [ClassicSimilarity], result of:
          0.05889038 = score(doc=4036,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 4036, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4036)
      0.33333334 = coord(1/3)
    
    Abstract
    When searching for information in a hypertext is limited to navigation, the task is not an easy one, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must therefore be provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models which consider documents as independent entities and ignore hypertext links. To promote the use of other information retrieval mechanisms adapted to hypertext systems, responds to the following questions: how can we integrate information given by hypertext links into an information retrieval scheme; are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness; and if so, how can we use them? Two solutions are proposed: using a default weight function based on link type (or assigning the same strength to all link types); or using a specific weight for each particular link, i.e. the level of association or a similarity measure. Proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness. A hypertext based on 2 medium-sized collections, the CACM and the CISI collections, has been built. The hypergraph is composed of explicit links (bibliographic references), computed links based on bibliographic information, or hypertext links established according to document representatives (nearest neighbour).
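    As a hedged illustration of the extended vector-processing idea, the sketch below adds to each document's direct query similarity a link-weighted share of its neighbours' similarities; the per-link-type weights are placeholders, not the values studied in the paper:

      def extended_score(doc_id, direct_sim, links, link_weight=None):
          # direct_sim: doc_id -> similarity between the query and the document vector
          # links: doc_id -> list of (neighbour_id, link_type) pairs
          # link_weight: link_type -> strength; the default gives every link type the same strength
          link_weight = link_weight or {}
          score = direct_sim.get(doc_id, 0.0)
          for neighbour_id, link_type in links.get(doc_id, []):
              score += link_weight.get(link_type, 0.5) * direct_sim.get(neighbour_id, 0.0)
          return score

      sims = {"d1": 0.7, "d2": 0.4}
      links = {"d2": [("d1", "bibliographic"), ("d1", "nearest-neighbour")]}
      weights = {"bibliographic": 0.3, "nearest-neighbour": 0.6}
      print(round(extended_score("d2", sims, links, weights), 3))  # 0.4 + (0.3 + 0.6) * 0.7 = 1.03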