Search (38 results, page 1 of 2)

  • Filter: theme_ss:"Retrievalalgorithmen"
  1. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.08
    0.07950986 = product of:
      0.15901972 = sum of:
        0.15901972 = sum of:
          0.11718599 = weight(_text_:network in 825) [ClassicSimilarity], result of:
            0.11718599 = score(doc=825,freq=6.0), product of:
              0.22917621 = queryWeight, product of:
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.05146125 = queryNorm
              0.51133573 = fieldWeight in 825, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.046875 = fieldNorm(doc=825)
          0.041833732 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
            0.041833732 = score(doc=825,freq=2.0), product of:
              0.18020853 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05146125 = queryNorm
              0.23214069 = fieldWeight in 825, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=825)
      0.5 = coord(1/2)
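    The explain tree above is Lucene's ClassicSimilarity output. Reconstructed as one equation, each matched term contributes queryWeight × fieldWeight, and the sum is scaled by the coordination factor:

      \[ \mathrm{score}(q,d) = \mathrm{coord}(q,d) \sum_{t \in q} \underbrace{\mathrm{idf}(t)\cdot\mathrm{queryNorm}}_{\mathrm{queryWeight}} \cdot \underbrace{\sqrt{\mathrm{tf}(t,d)}\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}} \]

    Worked through with the values above for doc 825:

      \[ 0.5 \times \big( (4.4533744 \cdot 0.05146125)(\sqrt{6} \cdot 4.4533744 \cdot 0.046875) + 0.0418337 \big) \approx 0.5 \times (0.2291762 \cdot 0.5113357 + 0.0418337) \approx 0.0795099 \]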
    
    Abstract
    Relevance feedback consists of automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probability propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested in preliminary experiments with different standard document collections.
    Date
    22. 3.2003 19:30:19
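    The abstract describes inserting partial evidences into a term-document Bayesian network and propagating them to posterior document probabilities. A minimal sketch of that propagation step, assuming the canonical two-layer model with additive propagation (the evidence values, weights, and helper names below are illustrative, not the authors' code):

      # Two-layer network: term nodes feed document nodes. Partial
      # evidences from relevance feedback set P(term | feedback); the
      # propagation computes each document's posterior relevance.
      def propagate(doc_terms, evidence):
          """doc_terms: {doc: {term: weight}}; evidence: {term: degree in [0, 1]}."""
          return {doc: sum(w * evidence.get(t, 0.0) for t, w in weights.items())
                  for doc, weights in doc_terms.items()}

      # Original query terms are fully instantiated; terms gathered from
      # documents judged relevant enter as partial evidence (values assumed).
      evidence = {"retrieval": 1.0, "bayesian": 1.0, "feedback": 0.7, "network": 0.4}
      docs = {"d1": {"bayesian": 0.5, "network": 0.3, "feedback": 0.2},
              "d2": {"retrieval": 0.6, "signature": 0.4}}
      print(sorted(propagate(docs, evidence).items(), key=lambda kv: -kv[1]))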
  2. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.07
    0.07381185 = product of:
      0.1476237 = sum of:
        0.1476237 = sum of:
          0.11276226 = weight(_text_:network in 664) [ClassicSimilarity], result of:
            0.11276226 = score(doc=664,freq=8.0), product of:
              0.22917621 = queryWeight, product of:
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.05146125 = queryNorm
              0.492033 = fieldWeight in 664, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
          0.034861445 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
            0.034861445 = score(doc=664,freq=2.0), product of:
              0.18020853 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05146125 = queryNorm
              0.19345059 = fieldWeight in 664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
      0.5 = coord(1/2)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language-model-based score on the one hand; on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the automatic generation and evaluation of topical queries. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
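    The abstract's score propagation over a bi-type document-author network can be sketched as an alternating update: documents pass scores to their authors through authorship edges and receive them back, mixed with content-based evidence. A toy illustration (the matrices, the mixing weight lam, and all values are assumptions, not the published BibRank formulation):

      import numpy as np

      A = np.array([[1, 0],          # authorship matrix: 3 docs x 2 authors
                    [1, 1],
                    [0, 1.]])
      content = np.array([0.8, 0.5, 0.1])   # content-based scores vs. the query

      def norm(v):
          s = v.sum()
          return v / s if s else v

      doc, lam = norm(content.copy()), 0.5
      for _ in range(50):
          author = norm(A.T @ doc)                              # docs -> authors
          doc = norm(lam * content + (1 - lam) * (A @ author))  # authors -> docs
      print(doc.round(3), author.round(3))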
  3. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.05
    0.05474554 = product of:
      0.10949108 = sum of:
        0.10949108 = sum of:
          0.06765735 = weight(_text_:network in 2717) [ClassicSimilarity], result of:
            0.06765735 = score(doc=2717,freq=2.0), product of:
              0.22917621 = queryWeight, product of:
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.05146125 = queryNorm
              0.29521978 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4533744 = idf(docFreq=1398, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
          0.041833732 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
            0.041833732 = score(doc=2717,freq=2.0), product of:
              0.18020853 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05146125 = queryNorm
              0.23214069 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
      0.5 = coord(1/2)
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based (CCC) systems may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
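    The model's central idea, searcher-controlled combination of several document-relatedness matrices, lends itself to a short sketch. The three matrices and the weighting scheme below are illustrative stand-ins for the content-, collaboration-, and context-based evidence the abstract names:

      import numpy as np

      sim_content = np.array([[1, .2, .7], [.2, 1, .1], [.7, .1, 1.]])
      sim_collab  = np.array([[1, .5, .0], [.5, 1, .3], [.0, .3, 1.]])
      sim_context = np.array([[1, .1, .4], [.1, 1, .6], [.4, .6, 1.]])

      def rank(seed, weights):
          """Rank documents by weighted relatedness to a seed document."""
          combined = (weights[0] * sim_content + weights[1] * sim_collab
                      + weights[2] * sim_context)
          return np.argsort(-combined[seed])

      print(rank(seed=0, weights=(0.6, 0.2, 0.2)))  # the searcher re-weights at will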
  4. Kwok, K.L.: ¬A network approach to probabilistic information retrieval (1995) 0.03
    0.029296497 = product of:
      0.058592994 = sum of:
        0.058592994 = product of:
          0.11718599 = sum of:
            0.11718599 = weight(_text_:network in 5696) [ClassicSimilarity], result of:
              0.11718599 = score(doc=5696,freq=6.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.51133573 = fieldWeight in 5696, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5696)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Shows how probabilistic information retrieval based on document components may be implemented as a feedforward (feedbackward) artificial neural network. The network supports adaptation of connection weights as well as the growing of new edges between queries and terms based on user relevance feedback data for training, and it reflects query modification and expansion in information retrieval. A learning rule is applied that can also be viewed as supporting sequential learning using a harmonic sequence learning rate. Experimental results with four standard small collections and a large Wall Street Journal collection show that small query expansion levels of about 30 terms can achieve most of the gains at the low-recall high-precision region, while larger expansion levels continue to provide gains at the high-recall low-precision region of a precision-recall curve.
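    The harmonic-sequence learning rate the abstract mentions simply decays the step size as eta_k = eta_0 / k. A minimal sketch of a weight-adaptation rule using it (the linear model and sample data are placeholders, not Kwok's network):

      def train(weights, samples, eta0=1.0, epochs=3):
          k = 0
          for _ in range(epochs):
              for x, target in samples:
                  k += 1
                  eta = eta0 / k                    # harmonic sequence rate
                  pred = sum(w * xi for w, xi in zip(weights, x))
                  err = target - pred               # error drives the weight update
                  weights = [w + eta * err * xi for w, xi in zip(weights, x)]
          return weights

      print(train([0.0, 0.0], [([1, 0], 1.0), ([0, 1], 0.0)]))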
  5. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.03
    0.029296497 = product of:
      0.058592994 = sum of:
        0.058592994 = product of:
          0.11718599 = sum of:
            0.11718599 = weight(_text_:network in 3161) [ClassicSimilarity], result of:
              0.11718599 = score(doc=3161,freq=6.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.51133573 = fieldWeight in 3161, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3161)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that, in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and also with the different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with the other measures. The key factor influencing an author's PageRank in the author co-citation network is being co-cited with important authors.
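    The study's damping-factor sweep is easy to reproduce on a toy graph. A standard power-iteration PageRank with the damping factor d as a parameter (the 3-node matrix is illustrative; weighted PageRank would replace the uniform teleport and edge weights):

      import numpy as np

      def pagerank(M, d, iters=100):
          """M: column-stochastic link matrix; d: damping factor."""
          n = M.shape[0]
          r = np.full(n, 1.0 / n)
          for _ in range(iters):
              r = (1 - d) / n + d * (M @ r)
          return r

      M = np.array([[0, .5, 1], [.5, 0, 0], [.5, .5, 0.]])  # toy co-citation graph
      for d in (0.05, 0.5, 0.95):                           # sweep as in the study
          print(d, pagerank(M, d).round(3))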
  6. Calegari, S.; Sanchez, E.: Object-fuzzy concept network : an enrichment of ontologies in semantic information retrieval (2008) 0.03
    0.028190564 = product of:
      0.05638113 = sum of:
        0.05638113 = product of:
          0.11276226 = sum of:
            0.11276226 = weight(_text_:network in 2393) [ClassicSimilarity], result of:
              0.11276226 = score(doc=2393,freq=8.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.492033 = fieldWeight in 2393, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2393)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain adapting itself to the context, it is shown how to handle a tradeoff between the correct definition of an object, taken in the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended, incorporating database objects so that entities and documents can similarly be represented in the network. An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient-of-variation measures in the crisp and fuzzy cases.
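    The fuzzy recall and fuzzy precision used in the validation generalize the crisp measures to graded memberships. One standard formulation (the article's exact definitions may differ) takes the minimum of retrieved and relevant degrees:

      def fuzzy_precision(retrieved, relevant):
          """retrieved, relevant: {doc: membership degree in [0, 1]}."""
          den = sum(retrieved.values())
          num = sum(min(m, relevant.get(d, 0.0)) for d, m in retrieved.items())
          return num / den if den else 0.0

      def fuzzy_recall(retrieved, relevant):
          den = sum(relevant.values())
          num = sum(min(m, retrieved.get(d, 0.0)) for d, m in relevant.items())
          return num / den if den else 0.0

      ret = {"d1": 0.9, "d2": 0.4}   # graded retrieval statuses (toy values)
      rel = {"d1": 1.0, "d3": 0.8}   # graded relevance judgments
      print(fuzzy_precision(ret, rel), fuzzy_recall(ret, rel))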
  7. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.027889157 = product of:
      0.055778313 = sum of:
        0.055778313 = product of:
          0.11155663 = sum of:
            0.11155663 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.11155663 = score(doc=402,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  8. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.02
    0.024413744 = product of:
      0.048827488 = sum of:
        0.048827488 = product of:
          0.097654976 = sum of:
            0.097654976 = weight(_text_:network in 3017) [ClassicSimilarity], result of:
              0.097654976 = score(doc=3017,freq=6.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.42611307 = fieldWeight in 3017, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3017)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms state-of-the-art competitors, including PageRank, HITS, CoRank, FutureRank, and P-Rank, in ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
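    The mutual-reinforcement idea (papers, researchers, and venues boosting each other across coupled networks) can be sketched as alternating normalized propagation. The coupling matrices and update order below are assumptions for illustration, not the published MutualRank equations:

      import numpy as np

      def norm(v):
          s = v.sum()
          return v / s if s else v

      PP = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0.]])   # paper -> paper citations
      PA = np.array([[1, 0], [1, 1], [0, 1.]])            # paper x author
      PV = np.array([[1, 0], [0, 1], [0, 1.]])            # paper x venue

      p, a, v = norm(np.ones(3)), norm(np.ones(2)), norm(np.ones(2))
      for _ in range(50):                       # iterate toward a fixed point
          p = norm(PP.T @ p + PA @ a + PV @ v)  # papers draw on all three networks
          a = norm(PA.T @ p)                    # researchers from their papers
          v = norm(PV.T @ p)                    # venues from their papers
      print(p.round(3), a.round(3), v.round(3))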
  9. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.024403011 = product of:
      0.048806023 = sum of:
        0.048806023 = product of:
          0.097612046 = sum of:
            0.097612046 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.097612046 = score(doc=2134,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    30. 3.2001 13:32:22
  10. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.024403011 = product of:
      0.048806023 = sum of:
        0.048806023 = product of:
          0.097612046 = sum of:
            0.097612046 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.097612046 = score(doc=3445,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    25. 8.2005 17:42:22
  11. Chen, H.; Zhang, Y.; Houston, A.L.: Semantic indexing and searching using a Hopfield net (1998) 0.02
    0.023920486 = product of:
      0.04784097 = sum of:
        0.04784097 = product of:
          0.09568194 = sum of:
            0.09568194 = weight(_text_:network in 5704) [ClassicSimilarity], result of:
              0.09568194 = score(doc=5704,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.41750383 = fieldWeight in 5704, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5704)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Presents a neural network approach to document semantic indexing. Reports results of a study to apply a Hopfield net algorithm to simulate human associative memory for concept exploration in the domain of computer science and engineering. The INSPEC database, consisting of 320,000 abstracts from leading periodical articles, was used as the document test bed. Benchmark tests confirmed that three parameters (maximum number of activated nodes, maximum allowable error, and maximum number of iterations) were useful in positively influencing network convergence behaviour without negatively impacting central processing unit performance. Another series of benchmark tests was performed to determine the effectiveness of various filtering techniques in reducing the negative impact of noisy input terms. Preliminary user tests confirmed expectations that the Hopfield net is potentially useful as an associative memory technique to improve document recall and precision by resolving discrepancies between indexer vocabularies and end-user vocabularies.
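    The three convergence parameters the benchmark tests identified map directly onto a spreading-activation loop. A minimal sketch, assuming a symmetric term-association weight matrix and a tanh transfer function (both stand-ins for the study's actual network):

      import numpy as np

      def hopfield_search(W, seed, max_nodes=5, max_err=1e-3, max_iter=50):
          """Spread activation until one of the three stopping conditions fires."""
          act = seed.copy()
          for _ in range(max_iter):                       # max number of iterations
              new = np.tanh(W @ act)                      # transfer function
              new[np.argsort(new)[:-max_nodes]] = 0.0     # cap activated nodes
              if np.abs(new - act).sum() < max_err:       # max allowable error
                  return new
              act = new
          return act

      W = np.array([[0, .6, .2], [.6, 0, .4], [.2, .4, 0.]])  # toy term associations
      print(hopfield_search(W, np.array([1.0, 0, 0])).round(3))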
  12. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.02
    0.022552451 = product of:
      0.045104902 = sum of:
        0.045104902 = product of:
          0.090209804 = sum of:
            0.090209804 = weight(_text_:network in 2564) [ClassicSimilarity], result of:
              0.090209804 = score(doc=2564,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.3936264 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
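    Kohonen's algorithm maps each document vector to its best-matching unit on a grid and pulls that unit's neighbourhood toward the vector, shrinking the radius and learning rate over time. A compact sketch (grid size, rates, and the toy documents are assumptions):

      import numpy as np

      def train_som(docs, grid=(4, 4), epochs=20, eta=0.5):
          rng = np.random.default_rng(0)
          w = rng.random((grid[0] * grid[1], docs.shape[1]))   # unit weight vectors
          coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
          for e in range(epochs):
              radius = max(1.0, 2.0 * (1 - e / epochs))        # shrinking neighbourhood
              for x in docs:
                  bmu = np.argmin(((w - x) ** 2).sum(1))       # best-matching unit
                  mask = np.abs(coords - coords[bmu]).sum(1) <= radius
                  w[mask] += eta * (1 - e / epochs) * (x - w[mask])
          return w

      docs = np.eye(3)      # three toy documents over three index terms
      print(train_som(docs).shape)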
  13. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.083667465 = score(doc=58,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:44
  14. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.083667465 = score(doc=2051,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:56
  15. Li, J.; Willett, P.: ArticleRank : a PageRank-based alternative to numbers of citations for analysing citation networks (2009) 0.02
    0.01993374 = product of:
      0.03986748 = sum of:
        0.03986748 = product of:
          0.07973496 = sum of:
            0.07973496 = weight(_text_:network in 751) [ClassicSimilarity], result of:
              0.07973496 = score(doc=751,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.34791988 = fieldWeight in 751, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=751)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to suggest an alternative to the widely used Times Cited criterion for analysing citation networks. The approach involves taking account of the nature of the papers that cite a given paper, so as to differentiate between papers that attract the same number of citations. Design/methodology/approach - ArticleRank is an algorithm that has been derived from Google's PageRank algorithm to measure the influence of journal articles. ArticleRank is applied to two datasets - a citation network based on an early paper on webometrics, and a self-citation network based on the 19 most cited papers in the Journal of Documentation - using citation data taken from the Web of Knowledge database. Findings - ArticleRank values provide a different ranking of a set of papers from that provided by the corresponding Times Cited values, and overcome the inability of the latter to differentiate between papers with the same numbers of citations. The difference in rankings between Times Cited and ArticleRank is greatest for the most heavily cited articles in a dataset. Originality/value - This is a novel application of the PageRank algorithm.
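    The abstract gives ArticleRank's lineage (a PageRank derivative) but not its exact weighting, so the sketch below runs plain PageRank on a small citation graph merely to illustrate the Findings: two papers with equal Times Cited counts can separate under link-based ranking:

      import numpy as np

      cites = {0: [1], 1: [2], 3: [1, 2], 4: [0]}   # paper -> papers it cites
      n = 5
      M = np.zeros((n, n))
      for src, outs in cites.items():
          for dst in outs:
              M[dst, src] = 1.0 / len(outs)         # column-normalized links
      r = np.full(n, 1.0 / n)
      for _ in range(100):
          r = 0.15 / n + 0.85 * (M @ r)
      cited = [sum(d == p for outs in cites.values() for d in outs) for p in range(n)]
      print(cited)       # papers 1 and 2 tie on raw citation counts...
      print(r.round(3))  # ...but differ here: paper 2's citer is itself well cited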
  16. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment (1998) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 5) [ClassicSimilarity], result of:
              0.06765735 = score(doc=5,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 5, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
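    Kleinberg's formulation is the HITS algorithm: authority and hub scores reinforce each other and converge to the principal eigenvectors of A^T A and A A^T, the matrix connection the abstract alludes to. A minimal sketch on a toy link matrix (the graph is illustrative):

      import numpy as np

      A = np.array([[0, 1, 1, 0],    # A[i, j] = 1 if page i links to page j
                    [0, 0, 1, 0],
                    [1, 0, 0, 0],
                    [0, 1, 1, 0.]])
      hub = np.ones(4)
      auth = np.ones(4)
      for _ in range(50):
          auth = A.T @ hub                  # good authorities are linked by good hubs
          hub = A @ auth                    # good hubs link to good authorities
          auth /= np.linalg.norm(auth)      # normalize each iteration
          hub /= np.linalg.norm(hub)
      print(auth.round(3), hub.round(3))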
  17. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 4218) [ClassicSimilarity], result of:
              0.05638113 = score(doc=4218,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to combine the advantages of traditional Information Retrieval (IR) methods with those of the recently proposed supervised learning methods for IR. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
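    The labeling step the abstract describes (a traditional IR model supplies pseudo-labels for unlabeled documents, which then train the learned ranker) can be sketched as below. The single-term BM25 is a standard simplification, and the confidence rule is an assumption; the paper's actual stopping criterion is derived from learning theory:

      import math

      def bm25(tf, df, N, doclen, avglen, k1=1.2, b=0.75):
          """Simplified single-term BM25 score."""
          idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
          return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doclen / avglen))

      def self_label(unlabeled, threshold):
          """unlabeled: [(features, bm25_score)] -> pseudo-labeled pairs."""
          labeled = []
          for feats, score in unlabeled:
              if score >= threshold:         # confidently relevant
                  labeled.append((feats, 1))
              elif score < threshold / 2:    # confidently non-relevant (assumed rule)
                  labeled.append((feats, 0))
          return labeled                     # joins the gold labels to train the ranker

      print(bm25(tf=3, df=10, N=1000, doclen=120, avglen=100))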
  18. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.013944578 = product of:
      0.027889157 = sum of:
        0.027889157 = product of:
          0.055778313 = sum of:
            0.055778313 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.055778313 = score(doc=5108,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2007 18:30:22
  19. Faloutsos, C.: Signature files (1992) 0.01
    0.013944578 = product of:
      0.027889157 = sum of:
        0.027889157 = product of:
          0.055778313 = sum of:
            0.055778313 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.055778313 = score(doc=3499,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    7. 5.1999 15:22:48
  20. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.013944578 = product of:
      0.027889157 = sum of:
        0.027889157 = product of:
          0.055778313 = sum of:
            0.055778313 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.055778313 = score(doc=1422,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2003 19:27:23

Languages

  • e 34
  • d 4

Types

  • a 36
  • m 1
  • r 1