Search (19 results, page 1 of 1)

  • author_ss:"Savoy, J."
  1. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.01
    0.012610499 = product of:
      0.025220998 = sum of:
        0.009207015 = weight(_text_:for in 2937) [ClassicSimilarity], result of:
          0.009207015 = score(doc=2937,freq=2.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.103720546 = fieldWeight in 2937, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2937)
        0.016013984 = product of:
          0.032027967 = sum of:
            0.032027967 = weight(_text_:22 in 2937) [ClassicSimilarity], result of:
              0.032027967 = score(doc=2937,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.19345059 = fieldWeight in 2937, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2937)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
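    The breakdown above follows Lucene's ClassicSimilarity arithmetic: each matching term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = sqrt(tf) * idf * fieldNorm, and the coord() factors scale the score by the fraction of query clauses matched. A minimal sketch reproducing the numbers shown (the helper function is ours, not a Lucene API):
    ```python
    import math

    # Recompute the explain tree for doc 2937: score = queryWeight * fieldWeight,
    # with queryWeight = idf * queryNorm and fieldWeight = sqrt(tf) * idf * fieldNorm.
    def term_score(tf, idf, query_norm, field_norm):
        query_weight = idf * query_norm
        field_weight = math.sqrt(tf) * idf * field_norm
        return query_weight * field_weight

    query_norm, field_norm = 0.047278564, 0.0390625
    w_for = term_score(2.0, 1.8775425, query_norm, field_norm)  # ~0.009207
    w_22 = term_score(2.0, 3.5018296, query_norm, field_norm)   # ~0.032028

    # coord(1/2) halves the "22" sub-clause; coord(2/4) halves the outer sum.
    print((w_for + 0.5 * w_22) * 0.5)                           # ~0.012610499
    ```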
    
    Date
    7.5.2016 21:22:27
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.6, S.1462-1472
  2. Savoy, J.: A stemming procedure and stopword list for general French corpora (1999) 0.01
    Source
    Journal of the American Society for Information Science. 50(1999) no.10, S.944-954
  3. Savoy, J.: Stemming of French words based on grammatical categories (1993) 0.01
    Source
    Journal of the American Society for Information Science. 44(1993) no.1, S.1-9
  4. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.01
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document scores, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in the mentioned techniques, and to take a higher-level, systematic approach to using hyperlinks for retrieval.
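    Of the four techniques listed above, (2) is easy to illustrate. A hedged sketch of score propagation along hyperlinks, not the paper's Probabilistic Argumentation Systems model; the graph and the spill-over weight `alpha` are hypothetical:
    ```python
    # Propagate part of each document's content score to the documents it
    # links to. `links` maps a document id to the ids it points at; `alpha`
    # is an illustrative spill-over weight.
    def propagate(content_scores, links, alpha=0.2):
        scores = dict(content_scores)
        for doc, targets in links.items():
            for target in targets:
                scores[target] = scores.get(target, 0.0) + alpha * content_scores.get(doc, 0.0)
        return scores

    content = {"d1": 0.8, "d2": 0.1, "d3": 0.0}
    links = {"d1": ["d3"], "d2": ["d3"]}
    print(propagate(content, links))  # d3 inherits evidence from d1 and d2
    ```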
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.347-355
  5. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.01
    Abstract
    In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that when implementing the original Okapi model, or certain models derived from the Divergence from Randomness (DFR) paradigm, significantly lower performance levels may result from using short or no stopword lists. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists, or no list at all, are usually not statistically significant. Similar conclusions can be drawn when using other natural languages such as French, Hindi, or Persian.
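    A minimal sketch of the indexing step the paper varies: filtering tokens against a stopword list before term weighting. The nine-word list below is hypothetical, not the short list evaluated in the paper:
    ```python
    # Hypothetical 9-word stoplist; filtering happens before any term
    # weighting (Okapi, DFR, ...). An empty stoplist keeps all word forms.
    SHORT_STOPLIST = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

    def index_terms(text, stoplist):
        return [t for t in text.lower().split() if t not in stoplist]

    doc = "The evaluation of stopword lists in the retrieval model"
    print(index_terms(doc, SHORT_STOPLIST))  # stopped
    print(index_terms(doc, set()))           # all word forms kept
    ```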
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.200-203
  6. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.01
    Abstract
    This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better MAP. When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. When compared with probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.
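    As an illustration of the light stemming idea discussed above, a hypothetical suffix-stripping sketch; the suffix list is illustrative only and is not the Hungarian rule set evaluated in the paper:
    ```python
    # A "light" stemmer strips a few frequent inflectional endings and
    # nothing more; an aggressive stemmer would apply many more rules.
    LIGHT_SUFFIXES = sorted(["akat", "eket", "nak", "nek", "ban", "ben", "ok", "ek", "t"],
                            key=len, reverse=True)

    def light_stem(word):
        for suffix in LIGHT_SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word

    print(light_stem("házakat"))  # -> "ház" under this toy rule list
    ```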
  7. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.00
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically unfeasible, this article suggests a new ranking scheme especially adapted for hypertext environments in order to produce more effective retrieval results, while maintaining the effectiveness of the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents such as bibliographic references or hypertext links. We demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving their efficiency. Moreover, since nearest neighbors and hypertext links constitute additional sources of evidence, our strategy takes them into account in order to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
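    A hedged sketch of the general idea, not the paper's exact scheme: keep the Boolean result set, then rank it by a weighted sum of several evidence sources. The weights `w_term` and `w_link` are hypothetical:
    ```python
    # Rank a Boolean hit set by combining term evidence with link/citation
    # evidence; documents outside the Boolean set are never returned.
    def rerank(boolean_hits, term_evidence, link_evidence, w_term=0.7, w_link=0.3):
        score = {d: w_term * term_evidence.get(d, 0.0) + w_link * link_evidence.get(d, 0.0)
                 for d in boolean_hits}
        return sorted(score, key=score.get, reverse=True)

    hits = ["d1", "d2", "d3"]
    print(rerank(hits, {"d1": 0.2, "d2": 0.9}, {"d1": 0.8, "d3": 0.4}))
    # -> ['d2', 'd1', 'd3']
    ```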
    Source
    Journal of the American Society for Information Science. 48(1997) no.3, S.235-253
  8. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.00
    Abstract
    It is important in information retrieval (IR), information extraction, and classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming or various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly, by around 7%, while performance differences are not significant when comparing various algorithmic stemmers, or algorithmic stemmers and morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stopword list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model.
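    One well-known algorithmic stemmer of the kind evaluated is the Porter stemmer; a small conflation example using its NLTK implementation (assumes nltk is installed):
    ```python
    # Morphologically related forms conflate to a single index key, so they
    # match each other at search time.
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["retrieval", "retrieve", "retrieving", "retrieved"]:
        print(word, "->", stemmer.stem(word))  # all four share one stem
    ```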
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.8, S.1616-1624
  9. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.00
    Abstract
    This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), and a statistical language model (LM), as well as two vector-space approaches, namely the classical tf idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually smaller than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
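    The language-independent n-gram indexing compared above can be sketched in a few lines; n=4 is an illustrative choice here, not the paper's setting:
    ```python
    # Index overlapping character n-grams instead of stems: inflected
    # variants of one word share most of their n-grams, so they match
    # without any language-specific rules.
    def char_ngrams(word, n=4):
        word = word.lower()
        if len(word) <= n:
            return [word]
        return [word[i:i + n] for i in range(len(word) - n + 1)]

    print(char_ngrams("библиотека"))
    ```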
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2540-2547
  10. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.00
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English by submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when the query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
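    Mean average precision (MAP), the effectiveness measure used here, averages per-query average precision; a minimal sketch with made-up runs:
    ```python
    # For each query, average the precision at the rank of every relevant
    # document retrieved; MAP is the mean over queries.
    def average_precision(ranked_ids, relevant_ids):
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / max(len(relevant_ids), 1)

    runs = [(["d2", "d5", "d1"], {"d1", "d2"}), (["d9", "d4"], {"d4"})]
    ap = [average_precision(ranked, rel) for ranked, rel in runs]
    print(sum(ap) / len(ap))  # MAP over the two toy queries
    ```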
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2266-2273
  11. Kocher, M.; Savoy, J.: A simple and efficient algorithm for authorship verification (2017) 0.00
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.1, S.259-269
  12. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 0.00
  13. Savoy, J.: An extended vector-processing scheme for searching information in hypertext systems (1996) 0.00
    Abstract
    When searching for information in a hypertext is limited to navigation, the task is not an easy one, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must therefore be provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models which consider documents as independent entities, and ignore hypertext links. To promote the use of other information retrieval mechanisms adapted to hypertext systems, this paper responds to the following questions: how can we integrate information given by hypertext links into an information retrieval scheme; are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness; and if so, how can we use them? Two solutions are: using a default weight function based on link type, or assigning the same strength to all link types; or using a specific weight for each particular link, i.e., the level of association or a similarity measure. The paper proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness. A hypertext based on two medium-sized collections, the CACM and the CISI collections, has been built. The hypergraph is composed of explicit links (bibliographic references), computed links based on bibliographic information, or hypertext links established according to document representatives (nearest neighbour).
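    Of the two solutions named above, the first (a weight per link type rather than one uniform strength) can be sketched as follows; the link types mirror those in the abstract, but the weight values are hypothetical:
    ```python
    # Accumulate link evidence for a target document from its in-links,
    # weighting each in-link by its type.
    LINK_WEIGHTS = {"bibliographic": 0.5, "computed": 0.2, "nearest_neighbour": 0.3}

    def link_evidence(inlinks, base_scores):
        # inlinks: (source_doc, link_type) pairs pointing at the target document
        return sum(LINK_WEIGHTS.get(link_type, 0.0) * base_scores.get(src, 0.0)
                   for src, link_type in inlinks)

    scores = {"d1": 0.9, "d2": 0.4}
    print(link_evidence([("d1", "bibliographic"), ("d2", "computed")], scores))
    # 0.5*0.9 + 0.2*0.4 = 0.53
    ```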
  14. Savoy, J.: Text clustering : an application with the 'State of the Union' addresses (2015) 0.00
    Abstract
    This paper describes a clustering and authorship attribution study over the State of the Union addresses from 1790 to 2014 (224 speeches delivered by 41 presidents). To define the style of each presidency, we have applied a principal component analysis (PCA) based on the part-of-speech (POS) frequencies. From Roosevelt (1934) onwards, each president tends to have a distinctive style, whereas earlier presidents usually tend to share some stylistic aspects with others. Applying an automatic classification based on the frequencies of all content-bearing word-types, we show that chronology tends to play a central role in forming clusters, a factor that is more important than political affiliation. Using the 300 most frequent word-types, we generate another clustering representation based on the style of each president. This second view shares similarities with the first one, but usually with more numerous and smaller clusters. Finally, an authorship attribution approach for each speech can reach a success rate of around 95.7% under some constraints. When an incorrect assignment is detected, the proposed author often belongs to the same party and lived during roughly the same time period as the presumed author. A deeper analysis of some incorrect assignments reveals interesting reasons justifying difficult attributions.
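    A sketch of the PCA step described above, assuming scikit-learn; the POS frequency matrix is made up for illustration and is not the paper's data:
    ```python
    # Project per-speech part-of-speech frequency vectors onto two principal
    # components, giving one 2-D style point per speech/presidency.
    import numpy as np
    from sklearn.decomposition import PCA

    pos_freq = np.array([          # rows: speeches; columns: POS tag frequencies
        [0.18, 0.25, 0.07, 0.12],
        [0.22, 0.21, 0.09, 0.10],
        [0.15, 0.28, 0.05, 0.14],
    ])
    coords = PCA(n_components=2).fit_transform(pos_freq)
    print(coords)
    ```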
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1645-1654
  15. Savoy, J.: Text representation strategies : an example with the State of the Union addresses (2016) 0.00
    Abstract
    Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf-idf weighting scheme can be applied. Recently, latent Dirichlet allocation (LDA) has been proposed to define the topics included in a corpus. As another strategy, this study proposes to apply a vocabulary specificity measure (Z-score) to determine the most significantly overused word-types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf-idf or LDA approach, the selection requires some arbitrary decisions. Based on the term-specific measure (Z-score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution of usage of some terms associated with their specificity measures. Finally, this technique can be employed to define the most important lexical leaders introducing terms overused by the k following presidencies.
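    A hedged sketch of a vocabulary-specificity Z-score of the kind mentioned above: compare a term's observed count in one text against its expected count under the whole corpus, assuming a binomial model. Notation and numbers are illustrative:
    ```python
    import math

    # Z = (observed - expected) / sd, with expected = n * p under the
    # corpus-wide term probability p; a large positive Z flags overuse.
    def z_score(count_in_text, text_len, count_in_corpus, corpus_len):
        p = count_in_corpus / corpus_len
        expected = text_len * p
        sd = math.sqrt(text_len * p * (1 - p))
        return (count_in_text - expected) / sd

    # "freedom" used 40 times in a 10,000-token speech vs 500 in 1,000,000 tokens
    print(z_score(40, 10_000, 500, 1_000_000))  # ~15.7 -> strongly overused
    ```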
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.8, S.1858-1870
  16. Savoy, J.: Authorship of Pauline epistles revisited (2019) 0.00
    Abstract
    The name Paul appears in 13 epistles, but is he the real author? According to different biblical scholars, the number of letters really attributable to Paul varies from 4 to 13, with a majority agreeing on seven. This article proposes to revisit this authorship attribution problem by considering two effective methods (Burrows' Delta, Labbé's intertextual distance). Based on these results, a hierarchical clustering is then applied, showing that four clusters can be derived, namely: {Colossians-Ephesians}, {1 and 2 Thessalonians}, {Titus, 1 and 2 Timothy}, and {Romans, Galatians, 1 and 2 Corinthians}. Moreover, a verification method based on the impostors' strategy indicates clearly that the group {Colossians-Ephesians} was written by the same author, who seems not to be Paul. The same conclusion can be drawn for the cluster {Titus, 1 and 2 Timothy}. The Letter to Philemon remains a singleton, without any close stylistic relationship to the other epistles. Finally, a group of four letters {Romans, Galatians, 1 and 2 Corinthians} is certainly written by the same author (Paul), but the verification protocol also indicates that 2 Corinthians is related to 1 Thessalonians, rendering a clear and simple interpretation difficult.
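    A minimal sketch of Burrows' Delta, the first method named above: z-score the relative frequencies of the most frequent words across the corpus, then take the mean absolute difference between two texts' z-profiles. The frequency matrix is illustrative:
    ```python
    import numpy as np

    # Delta(A, B) = mean |z_A - z_B| over the most frequent word-types,
    # with z-scores computed from corpus-wide means and standard deviations.
    def burrows_delta(freq_a, freq_b, corpus_freqs):
        mu = corpus_freqs.mean(axis=0)
        sd = corpus_freqs.std(axis=0)
        za, zb = (freq_a - mu) / sd, (freq_b - mu) / sd
        return np.abs(za - zb).mean()

    corpus = np.array([[0.061, 0.032], [0.055, 0.041], [0.070, 0.028]])
    print(burrows_delta(corpus[0], corpus[1], corpus))
    ```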
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.10, S.1089-1097
  17. Savoy, J.: A new probabilistic scheme for information retrieval in hypertext (1995) 0.00
  18. Savoy, J.: Bibliographic database access using free-text and controlled vocabulary : an evaluation (2005) 0.00
    Abstract
    This paper evaluates and compares the retrieval effectiveness of various search models, based either on automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. First, when considering various contexts, this study reveals that the combined indexing strategy always obtains the best retrieval performance. Second, when users wish to conduct exhaustive searches with minimal effort, we demonstrate that manually assigned terms are essential. Third, the evaluations presented in this paper reveal the comparative retrieval performance that results from manual and automatic indexing in a variety of circumstances.
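    One common way to combine two indexing sources is to fuse their ranked runs; a sketch using min-max-normalized score summation, which is a generic fusion recipe and not necessarily the paper's exact combination:
    ```python
    # Fuse a free-text run and a controlled-vocabulary run: normalize each
    # run's scores to [0, 1], then rank by the summed score.
    def normalise(run):
        lo, hi = min(run.values()), max(run.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}

    def fuse(run_a, run_b):
        a, b = normalise(run_a), normalise(run_b)
        docs = set(a) | set(b)
        return sorted(docs, key=lambda d: a.get(d, 0.0) + b.get(d, 0.0), reverse=True)

    free_text = {"d1": 2.1, "d2": 1.4, "d3": 0.3}
    controlled = {"d2": 0.9, "d4": 0.7}
    print(fuse(free_text, controlled))
    ```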
  19. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.00
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.1, S.58-69