Search (61 results, page 1 of 4)

  • year_i:[2010 TO 2020}
  • theme_ss:"Retrievalalgorithmen"
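The first filter mixes a square and a curly bracket. Assuming it follows Lucene/Solr range syntax (suggested by the `_i`/`_ss` field suffixes and the `ClassicSimilarity` explain output, though the page does not say so), `[` marks an inclusive bound and `}` an exclusive one, so 2010 is included and 2020 is not. A minimal sketch of that reading, with `in_range` as a hypothetical helper that is not part of any library:

```python
import re


def in_range(value, expr):
    """Check an integer against a Lucene-style range such as '[2010 TO 2020}'.

    Hypothetical helper for illustration: '[' and ']' are inclusive bounds,
    '{' and '}' are exclusive bounds.
    """
    m = re.fullmatch(r"([\[{])\s*(\d+)\s+TO\s+(\d+)\s*([\]}])", expr)
    if m is None:
        raise ValueError(f"not a numeric range: {expr!r}")
    open_b, close_b = m.group(1), m.group(4)
    low, high = int(m.group(2)), int(m.group(3))
    above = value >= low if open_b == "[" else value > low
    below = value <= high if close_b == "]" else value < high
    return above and below


year_filter = "[2010 TO 2020}"
print(in_range(2010, year_filter))  # lower bound is inclusive
print(in_range(2020, year_filter))  # upper bound is exclusive
```

Under this reading the publication years shown in the hits (2010 to 2015) all fall inside the filter.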
  1. Tober, M.; Hennig, L.; Furch, D.: SEO Ranking-Faktoren und Rang-Korrelationen 2014 : Google Deutschland (2014) 0.06
    0.06095313 = product of:
      0.15238282 = sum of:
        0.12725374 = weight(_text_:91 in 1484) [ClassicSimilarity], result of:
          0.12725374 = score(doc=1484,freq=2.0), product of:
            0.25837386 = queryWeight, product of:
              5.5722036 = idf(docFreq=456, maxDocs=44218)
              0.046368346 = queryNorm
            0.49251786 = fieldWeight in 1484, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5722036 = idf(docFreq=456, maxDocs=44218)
              0.0625 = fieldNorm(doc=1484)
        0.025129084 = product of:
          0.050258167 = sum of:
            0.050258167 = weight(_text_:22 in 1484) [ClassicSimilarity], result of:
              0.050258167 = score(doc=1484,freq=2.0), product of:
                0.16237405 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046368346 = queryNorm
                0.30952093 = fieldWeight in 1484, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1484)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    13. 9.2014 14:45:22
    Pages
    91 S
  2. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.03
    0.02951445 = product of:
      0.073786125 = sum of:
        0.010897844 = weight(_text_:a in 1431) [ClassicSimilarity], result of:
          0.010897844 = score(doc=1431,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20383182 = fieldWeight in 1431, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.06288828 = sum of:
          0.012630116 = weight(_text_:information in 1431) [ClassicSimilarity], result of:
            0.012630116 = score(doc=1431,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.1551638 = fieldWeight in 1431, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=1431)
          0.050258167 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
            0.050258167 = score(doc=1431,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.30952093 = fieldWeight in 1431, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1431)
      0.4 = coord(2/5)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1939-1943
    Type
    a
  3. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.03
    0.025280405 = product of:
      0.06320101 = sum of:
        0.0076151006 = weight(_text_:a in 2591) [ClassicSimilarity], result of:
          0.0076151006 = score(doc=2591,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14243183 = fieldWeight in 2591, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
        0.055585913 = sum of:
          0.011163551 = weight(_text_:information in 2591) [ClassicSimilarity], result of:
            0.011163551 = score(doc=2591,freq=4.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.13714671 = fieldWeight in 2591, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2591)
          0.044422362 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
            0.044422362 = score(doc=2591,freq=4.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.27358043 = fieldWeight in 2591, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2591)
      0.4 = coord(2/5)
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large number of documents that require judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large amounts of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value: Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
    Source
    Aslib journal of information management. 67(2015) no.6, S.700-714
    Type
    a
  4. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.02
    0.02318735 = product of:
      0.057968374 = sum of:
        0.010769378 = weight(_text_:a in 664) [ClassicSimilarity], result of:
          0.010769378 = score(doc=664,freq=20.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20142901 = fieldWeight in 664, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
        0.047198996 = sum of:
          0.015787644 = weight(_text_:information in 664) [ClassicSimilarity], result of:
            0.015787644 = score(doc=664,freq=8.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.19395474 = fieldWeight in 664, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
          0.031411353 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
            0.031411353 = score(doc=664,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
      0.4 = coord(2/5)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the Bibrank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.500-515
    Type
    a
  5. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.02
    0.018768111 = product of:
      0.046920277 = sum of:
        0.0076151006 = weight(_text_:a in 241) [ClassicSimilarity], result of:
          0.0076151006 = score(doc=241,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14243183 = fieldWeight in 241, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.039305177 = sum of:
          0.007893822 = weight(_text_:information in 241) [ClassicSimilarity], result of:
            0.007893822 = score(doc=241,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.09697737 = fieldWeight in 241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=241)
          0.031411353 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
            0.031411353 = score(doc=241,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=241)
      0.4 = coord(2/5)
    
    Abstract
    We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative for successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent of people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.948-966
    Type
    a
  6. Karlsson, A.; Hammarfelt, B.; Steinhauer, H.J.; Falkman, G.; Olson, N.; Nelhans, G.; Nolin, J.: Modeling uncertainty in bibliometrics and information retrieval : an information fusion approach (2015) 0.01
    0.009321972 = product of:
      0.023304928 = sum of:
        0.009632425 = weight(_text_:a in 1696) [ClassicSimilarity], result of:
          0.009632425 = score(doc=1696,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18016359 = fieldWeight in 1696, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=1696)
        0.013672504 = product of:
          0.027345007 = sum of:
            0.027345007 = weight(_text_:information in 1696) [ClassicSimilarity], result of:
              0.027345007 = score(doc=1696,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.3359395 = fieldWeight in 1696, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1696)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
    Contribution to a special issue "Combining bibliometrics and information retrieval"
    Type
    a
  7. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.01
    0.008693065 = product of:
      0.021732662 = sum of:
        0.012260076 = weight(_text_:a in 3688) [ClassicSimilarity], result of:
          0.012260076 = score(doc=3688,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22931081 = fieldWeight in 3688, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 3688) [ClassicSimilarity], result of:
              0.018945174 = score(doc=3688,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 3688, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3688)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1299-1312
    Type
    a
  8. Bauckhage, C.: Marginalizing over the PageRank damping factor (2014) 0.01
    0.008606452 = product of:
      0.021516128 = sum of:
        0.013622305 = weight(_text_:a in 928) [ClassicSimilarity], result of:
          0.013622305 = score(doc=928,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.25478977 = fieldWeight in 928, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=928)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 928) [ClassicSimilarity], result of:
              0.015787644 = score(doc=928,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 928, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=928)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
    Type
    a
  9. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.01
    0.0079049645 = product of:
      0.019762412 = sum of:
        0.01155891 = weight(_text_:a in 3469) [ClassicSimilarity], result of:
          0.01155891 = score(doc=3469,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 3469, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3469)
        0.008203502 = product of:
          0.016407004 = sum of:
            0.016407004 = weight(_text_:information in 3469) [ClassicSimilarity], result of:
              0.016407004 = score(doc=3469,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.20156369 = fieldWeight in 3469, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3469)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations (what we call query aspects). By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1081-1091
    Type
    a
  10. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.01
    0.0077931583 = product of:
      0.019482896 = sum of:
        0.0100103095 = weight(_text_:a in 247) [ClassicSimilarity], result of:
          0.0100103095 = score(doc=247,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18723148 = fieldWeight in 247, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 247) [ClassicSimilarity], result of:
              0.018945174 = score(doc=247,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 247, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=247)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.936-947
    Type
    a
  11. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.01
    0.007616916 = product of:
      0.01904229 = sum of:
        0.01021673 = weight(_text_:a in 1338) [ClassicSimilarity], result of:
          0.01021673 = score(doc=1338,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.19109234 = fieldWeight in 1338, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.008825562 = product of:
          0.017651124 = sum of:
            0.017651124 = weight(_text_:information in 1338) [ClassicSimilarity], result of:
              0.017651124 = score(doc=1338,freq=10.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21684799 = fieldWeight in 1338, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1338)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1577-1596
    Type
    a
  12. Dang, E.K.F.; Luk, R.W.P.; Allan, J.; Ho, K.S.; Chung, K.F.L.; Lee, D.L.: A new context-dependent term weight computed by boost and discount using relevance information (2010) 0.01
    0.0072442205 = product of:
      0.018110551 = sum of:
        0.01021673 = weight(_text_:a in 4120) [ClassicSimilarity], result of:
          0.01021673 = score(doc=4120,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.19109234 = fieldWeight in 4120, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4120)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 4120) [ClassicSimilarity], result of:
              0.015787644 = score(doc=4120,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 4120, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4120)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike the traditional term frequency-inverse document frequency (TF-IDF), the new weighting of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure which utilizes partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights compared with the context-independent BM25 weights in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) measures obtained using B&D term weights and those obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, using the new term weights yields improvement over the baseline for all collections tested. The MAP obtained with the new weights has relative improvement over the baseline by 3.3 to 15.2%, with statistical significance at the 95% confidence level across all four collections.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2514-2530
    Type
    a
  13. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.01
    0.0072039375 = product of:
      0.018009843 = sum of:
        0.008341924 = weight(_text_:a in 2123) [ClassicSimilarity], result of:
          0.008341924 = score(doc=2123,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 2123, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
        0.009667919 = product of:
          0.019335838 = sum of:
            0.019335838 = weight(_text_:information in 2123) [ClassicSimilarity], result of:
              0.019335838 = score(doc=2123,freq=12.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23754507 = fieldWeight in 2123, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2123)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation: one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1546-1558
    Type
    a
  14. Fuhr, N.: Modelle im Information Retrieval (2013) 0.01
    0.007189882 = product of:
      0.017974705 = sum of:
        0.0068111527 = weight(_text_:a in 724) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=724,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 724, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=724)
        0.011163551 = product of:
          0.022327103 = sum of:
            0.022327103 = weight(_text_:information in 724) [ClassicSimilarity], result of:
              0.022327103 = score(doc=724,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27429342 = fieldWeight in 724, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=724)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6., völlig neu gefaßte Ausgabe. Hrsg. von R. Kuhlen, W. Semar u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried
    Type
    a
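
The score explanation for this record can be checked by hand. The following is a minimal sketch, assuming the tree follows the usual Lucene ClassicSimilarity arithmetic (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are illustrative and do not come from the listing; the constants are copied from the explanation for doc 724.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 724.
MAX_DOCS = 44218
QUERY_NORM = 0.046368346
FIELD_NORM = 0.078125

# Clause on the term "a": freq=2, docFreq=37942.
score_a = term_score(2.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Clause on the term "information": freq=4, docFreq=20772,
# scaled by the inner coord(1/2) factor shown in the tree.
score_information = 0.5 * term_score(4.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The outer coord(2/5) factor scales the sum of both clauses.
total = (2 / 5) * (score_a + score_information)

print(f"{total:.6f}")
```

Because the listing reports single-precision values, the result should agree with the listed 0.007189882 only to about six decimal places.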
  15. White, H. D.: Co-cited author retrieval and relevance theory : examples from the humanities (2015) 0.01
    0.007058388 = product of:
      0.01764597 = sum of:
        0.008173384 = weight(_text_:a in 1687) [ClassicSimilarity], result of:
          0.008173384 = score(doc=1687,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 1687, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=1687)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 1687) [ClassicSimilarity], result of:
              0.018945174 = score(doc=1687,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 1687, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1687)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
    Contribution to a special issue "Combining bibliometrics and information retrieval"
    Type
    a
  16. Ding, Y.: Topic-based PageRank on author cocitation networks (2011) 0.01
    0.007004201 = product of:
      0.017510502 = sum of:
        0.010812371 = weight(_text_:a in 4348) [ClassicSimilarity], result of:
          0.010812371 = score(doc=4348,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20223314 = fieldWeight in 4348, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 4348) [ClassicSimilarity], result of:
              0.013396261 = score(doc=4348,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 4348, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4348)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Ranking authors is vital for identifying a researcher's impact and standing within a scientific field. There are many different ranking methods (e.g., citations, publications, h-index, PageRank, and weighted PageRank), but most of them are topic-independent. This paper proposes topic-dependent ranks based on the combination of a topic model and a weighted PageRank algorithm. The author-conference-topic (ACT) model was used to extract topic distribution of individual authors. Two ways for combining the ACT model with the PageRank algorithm are proposed: simple combination (I_PR) or using a topic distribution as a weighted vector for PageRank (PR_t). Information retrieval was chosen as the test field and representative authors for different topics at different time phases were identified. Principal component analysis (PCA) was applied to analyze the ranking difference between I_PR and PR_t.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.3, S.449-466
    Type
    a
  17. Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.01
    0.006951616 = product of:
      0.01737904 = sum of:
        0.011797264 = weight(_text_:a in 4746) [ClassicSimilarity], result of:
          0.011797264 = score(doc=4746,freq=24.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22065444 = fieldWeight in 4746, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4746)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 4746) [ClassicSimilarity], result of:
              0.011163551 = score(doc=4746,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 4746, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4746)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.9, S.1666-1675
    Type
    a
  18. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.01
    0.0069366493 = product of:
      0.017341623 = sum of:
        0.009138121 = weight(_text_:a in 1035) [ClassicSimilarity], result of:
          0.009138121 = score(doc=1035,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 1035, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
        0.008203502 = product of:
          0.016407004 = sum of:
            0.016407004 = weight(_text_:information in 1035) [ClassicSimilarity], result of:
              0.016407004 = score(doc=1035,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.20156369 = fieldWeight in 1035, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    User interest as a very dynamic information need is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets calculation of the probability that user interest in the content of the document is subject to very dynamic user interest change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1557-1576
    Type
    a
  19. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.01
    0.0067616524 = product of:
      0.01690413 = sum of:
        0.009010308 = weight(_text_:a in 2670) [ClassicSimilarity], result of:
          0.009010308 = score(doc=2670,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1685276 = fieldWeight in 2670, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2670)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 2670) [ClassicSimilarity], result of:
              0.015787644 = score(doc=2670,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 2670, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2670)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques considering the semantic relationships in an ontology. Our technique retrieves top-k resources which are the most relevant to query keywords through the semantic relationships. To do this, we propose a weighting measure for the semantic relationship. Based on this measure, we propose a novel ranking method which considers the number of meaningful semantic relationships between a resource and keywords as well as the coverage and discriminating power of keywords. In order to improve the efficiency of the search, we prune the unnecessary search space using the length and weight thresholds of the semantic relationship path. In addition, we exploit Threshold Algorithm based on an extended inverted index to answer top-k results efficiently. The experimental results using real data sets demonstrate that our retrieval method using the semantic information generates accurate results efficiently compared to the traditional methods.
    Content
    Cf.: doi: 10.1016/j.ipm.2013.08.007. A short preliminary version of this paper was published in the proceedings of WWW 2009 as a two-page poster paper.
    Source
    Information processing and management. 50(2014) no.1, S.132-155
    Type
    a
  20. Nunes, S.; Ribeiro, C.; David, G.: Term weighting based on document revision history (2011) 0.01
    0.0067507178 = product of:
      0.016876794 = sum of:
        0.01129502 = weight(_text_:a in 4946) [ClassicSimilarity], result of:
          0.01129502 = score(doc=4946,freq=22.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.21126054 = fieldWeight in 4946, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4946)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 4946) [ClassicSimilarity], result of:
              0.011163551 = score(doc=4946,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 4946, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4946)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, the results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.12, S.2471-2478
    Type
    a

Languages

  • e 51
  • d 10

Types

  • a 59
  • el 2
  • r 1