Search (75 results, page 1 of 4)

  • × theme_ss:"Retrievalalgorithmen"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.04
    0.04227614 = product of:
      0.08455228 = sum of:
        0.063356094 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
          0.063356094 = score(doc=4218,freq=12.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.4278775 = fieldWeight in 4218, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 4218) [ClassicSimilarity], result of:
              0.042392377 = score(doc=4218,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
    Source
    Information processing and management. 45(2009) no.3, S.341-355
  2. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.04
    0.03959743 = product of:
      0.07919486 = sum of:
        0.053759433 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
          0.053759433 = score(doc=2502,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3630661 = fieldWeight in 2502, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2502) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2502,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2502, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform weIl when applied to this problem. Detailed results and analyses are included to support our conclusions.
  3. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.04
    0.036669686 = product of:
      0.14667875 = sum of:
        0.14667875 = sum of:
          0.095923 = weight(_text_:processing in 5108) [ClassicSimilarity], result of:
            0.095923 = score(doc=5108,freq=4.0), product of:
              0.18956426 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046827413 = queryNorm
              0.5060184 = fieldWeight in 5108, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0625 = fieldNorm(doc=5108)
          0.050755743 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
            0.050755743 = score(doc=5108,freq=2.0), product of:
              0.16398162 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046827413 = queryNorm
              0.30952093 = fieldWeight in 5108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5108)
      0.25 = coord(1/4)
    
    Abstract
    In this paper methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed are varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
  4. White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.03
    0.03466491 = product of:
      0.06932982 = sum of:
        0.043894395 = weight(_text_:data in 5044) [ClassicSimilarity], result of:
          0.043894395 = score(doc=5044,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 5044, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=5044)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 5044) [ClassicSimilarity], result of:
              0.05087085 = score(doc=5044,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 5044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5044)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
  5. Kekäläinen, J.: Binary and graded relevance in IR evaluations : comparison of the effects on ranking of IR systems (2005) 0.03
    0.03466491 = product of:
      0.06932982 = sum of:
        0.043894395 = weight(_text_:data in 1036) [ClassicSimilarity], result of:
          0.043894395 = score(doc=1036,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 1036, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1036)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 1036) [ClassicSimilarity], result of:
              0.05087085 = score(doc=1036,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 1036, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1036)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. Relevance of a sample TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. Binary precision, and cumulated gain, discounted cumulated gain and normalised discounted cumulated gain are the measures compared. Different weighting schemes for relevance levels are tested with cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly in the binary end, and less in the heavily weighted end. The results show the different character of the measures.
    Source
    Information processing and management. 41(2005) no.5, S.1019-1034
  6. Abdelali, A.; Cowie, J.; Soliman, H.S.: Improving query precision using semantic expansion (2007) 0.03
    0.032942846 = product of:
      0.06588569 = sum of:
        0.036211025 = weight(_text_:data in 917) [ClassicSimilarity], result of:
          0.036211025 = score(doc=917,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24455236 = fieldWeight in 917, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=917)
        0.029674664 = product of:
          0.05934933 = sum of:
            0.05934933 = weight(_text_:processing in 917) [ClassicSimilarity], result of:
              0.05934933 = score(doc=917,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3130829 = fieldWeight in 917, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=917)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most of the existing QE techniques suffer from retrieval performance degradation due to imprecise choice of query's additive terms in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantics relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our "controlled" query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over current existing mechanisms in the field.
    Source
    Information processing and management. 43(2007) no.3, S.705-716
  7. Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.03
    0.032942846 = product of:
      0.06588569 = sum of:
        0.036211025 = weight(_text_:data in 4213) [ClassicSimilarity], result of:
          0.036211025 = score(doc=4213,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24455236 = fieldWeight in 4213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
        0.029674664 = product of:
          0.05934933 = sum of:
            0.05934933 = weight(_text_:processing in 4213) [ClassicSimilarity], result of:
              0.05934933 = score(doc=4213,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3130829 = fieldWeight in 4213, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4213)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because any proper subgraph can not be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing possible solutions with respect to effectiveness and efficiency.
    Source
    Information processing and management. 45(2009) no.2, S.263-271
  8. Quiroga, L.M.; Mostafa, J.: ¬An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.03
    0.028887425 = product of:
      0.05777485 = sum of:
        0.03657866 = weight(_text_:data in 2579) [ClassicSimilarity], result of:
          0.03657866 = score(doc=2579,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24703519 = fieldWeight in 2579, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2579)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2579) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2579,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2579, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2579)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different ( [small alpha, Greek] =0.05 and p =0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of thinking aloud protocols showed dimensions that were highly situational, establishing the importance context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.
    Source
    Information processing and management. 38(2002) no.5, S.671-694
  9. Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.03
    0.028887425 = product of:
      0.05777485 = sum of:
        0.03657866 = weight(_text_:data in 2081) [ClassicSimilarity], result of:
          0.03657866 = score(doc=2081,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24703519 = fieldWeight in 2081, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2081)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2081) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2081,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2081, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2081)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned with a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve the retrieval performances over baseline approaches. Document block is validated to be a better unit than the whole document in the manifold-ranking process.
    Source
    Information processing and management. 44(2008) no.3, S.1032-1048
  10. Sakai, T.: On the reliability of information retrieval metrics based on graded relevance (2007) 0.03
    0.028236724 = product of:
      0.05647345 = sum of:
        0.031038022 = weight(_text_:data in 910) [ClassicSimilarity], result of:
          0.031038022 = score(doc=910,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=910)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 910) [ClassicSimilarity], result of:
              0.05087085 = score(doc=910,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 910, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=910)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values.
    Source
    Information processing and management. 43(2007) no.2, S.531-548
  11. Rokaya, M.; Atlam, E.; Fuketa, M.; Dorji, T.C.; Aoe, J.-i.: Ranking of field association terms using Co-word analysis (2008) 0.03
    0.028236724 = product of:
      0.05647345 = sum of:
        0.031038022 = weight(_text_:data in 2060) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2060,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2060)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2060) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2060,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2060, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2060)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Information retrieval involves finding some desired information in a store of information or a database. In this paper, Co-word analysis will be used to achieve a ranking of a selected sample of FA terms. Based on this ranking a better arranging of search results can be achieved. Experimental results achieved using 41 MB of data (7660 documents) in the field of sports. The corpus was collected from CNN newspaper, sports field. This corpus was chosen to be distributed over 11 sub-fields of the field sports from the experimental results, the average precision increased by 18.3% after applying the proposed arranging scheme depending on the absolute frequency to count the terms weights, and the average precision increased by 17.2% after applying the proposed arranging scheme depending on a formula based on "TF*IDF" to count the terms weights.
    Source
    Information processing and management. 44(2008) no.2, S.738-755
  12. Zhao, L.; Wu, L.; Huang, X.: Using query expansion in graph-based approach for query-focused multi-document summarization (2009) 0.03
    0.028236724 = product of:
      0.05647345 = sum of:
        0.031038022 = weight(_text_:data in 2449) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2449,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2449, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2449)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2449) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2449,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2449, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2449)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper presents a novel query expansion method, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query. Our approach makes use of both the sentence-to-sentence relations and the sentence-to-word relations to select the query biased informative words from the document set and use them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of document understanding conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve the system performance and make our system comparable to the state-of-the-art systems.
    Source
    Information processing and management. 45(2009) no.1, S.35-41
  13. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.03
    0.026219916 = product of:
      0.05243983 = sum of:
        0.03657866 = weight(_text_:data in 56) [ClassicSimilarity], result of:
          0.03657866 = score(doc=56,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24703519 = fieldWeight in 56, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.03172234 = score(doc=56,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  14. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.03
    0.025035713 = product of:
      0.050071426 = sum of:
        0.031038022 = weight(_text_:data in 2419) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2419,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.019033402 = product of:
          0.038066804 = sum of:
            0.038066804 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.038066804 = score(doc=2419,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection over information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  15. Crouch, C.J.; Crouch, D.B.; Chen, Q.; Holtz, S.J.: Improving the retrieval effectiveness of very short queries (2002) 0.02
    0.023530604 = product of:
      0.04706121 = sum of:
        0.02586502 = weight(_text_:data in 2572) [ClassicSimilarity], result of:
          0.02586502 = score(doc=2572,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 2572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2572)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2572) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2572,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2572, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2572)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
    Source
    Information processing and management. 38(2002) no.1, S.1-36
  16. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.02
    0.023530604 = product of:
      0.04706121 = sum of:
        0.02586502 = weight(_text_:data in 2069) [ClassicSimilarity], result of:
          0.02586502 = score(doc=2069,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 2069, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2069) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2069,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2069, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2069)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The semantic web not only contains resources but also includes the heterogeneous relationships among them, which is sharply distinguished from the current web. As the growth of the semantic web, specialized search techniques are of significance. In this paper, we present RSS-a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with entities most semantically related to the query, thus able to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query. The proposed semantic search model which supports inference is very different from traditional keyword-based search methods. Moreover, RSS also distinguishes from many current methods of accessing the semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder. The experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard PageRank algorithm on the semantic web.
    Source
    Information processing and management. 44(2008) no.2, S.893-909
  17. Witschel, H.F.: Global term weights in distributed environments (2008) 0.02
    0.022234414 = product of:
      0.088937655 = sum of:
        0.088937655 = sum of:
          0.05087085 = weight(_text_:processing in 2096) [ClassicSimilarity], result of:
            0.05087085 = score(doc=2096,freq=2.0), product of:
              0.18956426 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046827413 = queryNorm
              0.26835677 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
          0.038066804 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
            0.038066804 = score(doc=2096,freq=2.0), product of:
              0.16398162 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046827413 = queryNorm
              0.23214069 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
      0.25 = coord(1/4)
    
    Date
    1. 8.2008 9:44:22
    Source
    Information processing and management. 44(2008) no.3, S.1049-1061
  18. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.02
    0.017350782 = product of:
      0.06940313 = sum of:
        0.06940313 = weight(_text_:data in 174) [ClassicSimilarity], result of:
          0.06940313 = score(doc=174,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.46871632 = fieldWeight in 174, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.25 = coord(1/4)
    
    Abstract
    Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guidethe search for new methods that utilize feedback data in IR
  19. Daniowicz, C.; Baliski, J.: Document ranking based upon Markov chains (2001) 0.01
    0.014837332 = product of:
      0.05934933 = sum of:
        0.05934933 = product of:
          0.11869866 = sum of:
            0.11869866 = weight(_text_:processing in 5388) [ClassicSimilarity], result of:
              0.11869866 = score(doc=5388,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.6261658 = fieldWeight in 5388, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.109375 = fieldNorm(doc=5388)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 37(2001) no.4, S.623-637
  20. Horng, J.T.; Yeh, C.C.: Applying genetic algorithms to query optimization in document retrieval (2000) 0.01
    0.014837332 = product of:
      0.05934933 = sum of:
        0.05934933 = product of:
          0.11869866 = sum of:
            0.11869866 = weight(_text_:processing in 3045) [ClassicSimilarity], result of:
              0.11869866 = score(doc=3045,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.6261658 = fieldWeight in 3045, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3045)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 36(2000) no.5, S.737-759

Languages

  • e 74
  • d 1
  • More… Less…