Search (143 results, page 1 of 8)

  • × year_i:[2000 TO 2010}
  • × theme_ss:"Retrievalalgorithmen"
  1. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.06
    0.055682447 = product of:
      0.11136489 = sum of:
        0.01029941 = weight(_text_:information in 2717) [ClassicSimilarity], result of:
          0.01029941 = score(doc=2717,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.10106549 = sum of:
          0.060081743 = weight(_text_:organization in 2717) [ClassicSimilarity], result of:
            0.060081743 = score(doc=2717,freq=4.0), product of:
              0.17974974 = queryWeight, product of:
                3.5653565 = idf(docFreq=3399, maxDocs=44218)
                0.050415643 = queryNorm
              0.33425218 = fieldWeight in 2717, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5653565 = idf(docFreq=3399, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
          0.04098374 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
            0.04098374 = score(doc=2717,freq=2.0), product of:
              0.17654699 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050415643 = queryNorm
              0.23214069 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
      0.5 = coord(2/4)
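
The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which reproduce the idf values printed in the tree. The function and variable names are illustrative and are not part of the search system; queryNorm, fieldNorm and the coord factor are copied from the tree rather than recomputed.

```python
import math

# Values copied from the explain tree for doc 2717.
MAX_DOCS = 44218
QUERY_NORM = 0.050415643
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for a single term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                   # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Term frequencies and document frequencies as listed in the tree.
information = term_score(freq=2.0, doc_freq=20772)
organization = term_score(freq=4.0, doc_freq=3399)
twenty_two = term_score(freq=2.0, doc_freq=3622)

# Sum the clauses, then apply coord(2/4) = 0.5 as shown at the top level.
total = (information + (organization + twenty_two)) * 0.5
print(f"{total:.6f}")
```

The result should agree with the 0.055682447 shown in the tree up to floating-point rounding, since the engine reports single-precision values.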
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  2. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.04
    0.03592316 = product of:
      0.07184632 = sum of:
        0.024031956 = weight(_text_:information in 3445) [ClassicSimilarity], result of:
          0.024031956 = score(doc=3445,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.27153665 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.047814365 = product of:
          0.09562873 = sum of:
            0.09562873 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.09562873 = score(doc=3445,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    25. 8.2005 17:42:22
    Source
    Library and information research news. 24(2000) no.77, S.30-34
  3. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.03
    0.03191998 = product of:
      0.06383996 = sum of:
        0.023785468 = weight(_text_:information in 2564) [ClassicSimilarity], result of:
          0.023785468 = score(doc=2564,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2687516 = fieldWeight in 2564, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
        0.040054493 = product of:
          0.080108985 = sum of:
            0.080108985 = weight(_text_:organization in 2564) [ClassicSimilarity], result of:
              0.080108985 = score(doc=2564,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.44566956 = fieldWeight in 2564, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
    Source
    Information processing and management. 38(2002) no.1, S.79-89
  4. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.03
    0.027393792 = product of:
      0.054787584 = sum of:
        0.027465092 = weight(_text_:information in 1422) [ClassicSimilarity], result of:
          0.027465092 = score(doc=1422,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3103276 = fieldWeight in 1422, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.027322493 = product of:
          0.054644987 = sum of:
            0.054644987 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.054644987 = score(doc=1422,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.285-301
  5. Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.03
    0.026535526 = product of:
      0.05307105 = sum of:
        0.02303018 = weight(_text_:information in 105) [ClassicSimilarity], result of:
          0.02303018 = score(doc=105,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2602176 = fieldWeight in 105, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
        0.030040871 = product of:
          0.060081743 = sum of:
            0.060081743 = weight(_text_:organization in 105) [ClassicSimilarity], result of:
              0.060081743 = score(doc=105,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.33425218 = fieldWeight in 105, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=105)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused some critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists as traditional information retrieval tools have been criticised for their inefficiency in tackling these newly emerging problems. This paper proposes an information retrieval tool generated by cocitation analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion.
    Series
    Advances in knowledge organization; vol.7
    Source
    Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al
  6. Herrera-Viedma, E.; Cordón, O.; Herrera, J.C.; Luque, M.: ¬An IRS based on multi-granular linguistic information (2003) 0.03
    0.026535526 = product of:
      0.05307105 = sum of:
        0.02303018 = weight(_text_:information in 2740) [ClassicSimilarity], result of:
          0.02303018 = score(doc=2740,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2602176 = fieldWeight in 2740, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2740)
        0.030040871 = product of:
          0.060081743 = sum of:
            0.060081743 = weight(_text_:organization in 2740) [ClassicSimilarity], result of:
              0.060081743 = score(doc=2740,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.33425218 = fieldWeight in 2740, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2740)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    An information retrieval system (IRS) based on fuzzy multi-granular linguistic information is proposed. The system has an evaluation method to process multi-granular linguistic information, in such a way that the inputs to the IRS are represented in a different linguistic domain than the outputs. The system accepts Boolean queries whose terms are weighted by means of the ordinal linguistic values represented by the linguistic variable "Importance" assessed on a label set S. The system evaluates the weighted queries according to a threshold semantic and obtains the linguistic retrieval status values (RSV) of documents represented by a linguistic variable "Relevance" expressed in a different label set S'. The advantage of this linguistic IRS with respect to others is that the use of the multi-granular linguistic information facilitates and improves the IRS-user interaction.
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  7. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.02
    0.022860084 = product of:
      0.045720167 = sum of:
        0.025228297 = weight(_text_:information in 1451) [ClassicSimilarity], result of:
          0.025228297 = score(doc=1451,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2850541 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.04098374 = score(doc=1451,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.281-284
  8. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.02
    0.022860084 = product of:
      0.045720167 = sum of:
        0.025228297 = weight(_text_:information in 2419) [ClassicSimilarity], result of:
          0.025228297 = score(doc=2419,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.2850541 = fieldWeight in 2419, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.04098374 = score(doc=2419,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  9. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.02
    0.02277131 = product of:
      0.04554262 = sum of:
        0.028466063 = weight(_text_:information in 1428) [ClassicSimilarity], result of:
          0.028466063 = score(doc=1428,freq=22.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.32163754 = fieldWeight in 1428, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.03415312 = score(doc=1428,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.321-334
  10. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.02
    0.020450171 = product of:
      0.040900342 = sum of:
        0.01699316 = weight(_text_:information in 3276) [ClassicSimilarity], result of:
          0.01699316 = score(doc=3276,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1920054 = fieldWeight in 3276, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.023907183 = product of:
          0.047814365 = sum of:
            0.047814365 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
              0.047814365 = score(doc=3276,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.2708308 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Within classical information retrieval, various methods have been developed for ranking and for searching in a homogeneous, unstructured document collection. The success of the Google search engine has shown that searching an inhomogeneous but interconnected document collection such as the Internet can be very effective when the connections between documents (links) are taken into account. Among the concepts implemented by Google is a method for ranking search results (PageRank), which is briefly explained in this article. The article also covers the concepts of a system called CiteSeer, which automatically indexes bibliographic references (Autonomous Citation Indexing, ACI). The latter produces an interconnected document collection from a set of unlinked scientific documents and enables the use of ranking methods based on those used by Google.
    Date
    20. 3.2005 16:23:22
    Source
    Information - Wissenschaft und Praxis. 56(2005) H.2, S.87-92
  11. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02
    0.019165486 = product of:
      0.038330972 = sum of:
        0.017839102 = weight(_text_:information in 2239) [ClassicSimilarity], result of:
          0.017839102 = score(doc=2239,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 2239, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.04098374 = score(doc=2239,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
    Source
    Journal of the American Society for Information Science and technology. 55(2004) no.7, S.628-636
  12. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.02
    0.019165486 = product of:
      0.038330972 = sum of:
        0.017839102 = weight(_text_:information in 825) [ClassicSimilarity], result of:
          0.017839102 = score(doc=825,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 825, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=825)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.04098374 = score(doc=825,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Relevance Feedback consists in automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical frame on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probabilities propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested using a preliminary experimentation with different standard document collections.
    Date
    22. 3.2003 19:30:19
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.302-313
  13. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.02
    0.01905007 = product of:
      0.03810014 = sum of:
        0.02102358 = weight(_text_:information in 1753) [ClassicSimilarity], result of:
          0.02102358 = score(doc=1753,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.23754507 = fieldWeight in 1753, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
              0.03415312 = score(doc=1753,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 1753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1753)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR) without which no implementation is possible, and sheds an entirely new light upon the structure of IR models. It contains the descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which thus can be read and taught independently. Also, the book contains all necessary mathematical knowledge on which IR relies, to help the reader avoid searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
    LCSH
    Information storage and retrieval
    Subject
    Information storage and retrieval
  14. Witschel, H.F.: Global term weights in distributed environments (2008) 0.02
    0.017528716 = product of:
      0.035057433 = sum of:
        0.014565565 = weight(_text_:information in 2096) [ClassicSimilarity], result of:
          0.014565565 = score(doc=2096,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16457605 = fieldWeight in 2096, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.04098374 = score(doc=2096,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
    Source
    Information processing and management. 44(2008) no.3, S.1049-1061
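
The per-term weights in the score breakdown of entry 14 can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names `idf` and `term_score` are hypothetical, and `queryNorm` is copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs as shown in the listing
QUERY_NORM = 0.050415643  # queryNorm as shown in the listing (not derived here)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency, assuming the classic TF-IDF formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float,
               query_norm: float = QUERY_NORM) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq=...)
    term_idf = idf(doc_freq)                    # idf(docFreq=..., maxDocs=...)
    query_weight = term_idf * query_norm        # queryWeight
    field_weight = tf * term_idf * field_norm   # fieldWeight
    return query_weight * field_weight


# Entry 14 (doc=2096): "information" occurs 4 times, "22" occurs twice.
info_weight = term_score(freq=4.0, doc_freq=20772, field_norm=0.046875)
date_weight = term_score(freq=2.0, doc_freq=3622, field_norm=0.046875)
print(info_weight, date_weight)
```

Under these assumptions the two values agree with the listed weights for `_text_:information` and `_text_:22` up to single-precision rounding.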
  15. Lopez-Pujalte, C.; Guerrero Bote, V.P.; Moya-Anegón, F. de: Evaluation of the application of genetic algorithms to relevance feedback (2003) 0.02
    0.01680845 = product of:
      0.0336169 = sum of:
        0.008582841 = weight(_text_:information in 2756) [ClassicSimilarity], result of:
          0.008582841 = score(doc=2756,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.09697737 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
        0.025034059 = product of:
          0.050068118 = sum of:
            0.050068118 = weight(_text_:organization in 2756) [ClassicSimilarity], result of:
              0.050068118 = score(doc=2756,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.27854347 = fieldWeight in 2756, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2756)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We evaluated the different genetic algorithms applied to relevance feedback that are to be found in the literature and which follow the vector space model (the most commonly used model in this type of application). They were compared with a traditional relevance feedback algorithm - the Ide dec-hi method - since this had given the best results in the study of Salton & Buckley (1990) on this subject. The experiment was performed on the Cranfield collection, and the different algorithms were evaluated using the residual collection method (one of the most suitable methods for evaluating relevance feedback techniques). The results varied greatly depending on the fitness function that was used, from no improvement in some of the genetic algorithms, to a more than 127% improvement with one algorithm, surpassing even the traditional Ide dec-hi method. One can therefore conclude that genetic algorithms show great promise as an aid to implementing a truly effective information retrieval system.
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  16. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.01
    0.0128297005 = product of:
      0.025659401 = sum of:
        0.008582841 = weight(_text_:information in 56) [ClassicSimilarity], result of:
          0.008582841 = score(doc=56,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.09697737 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.03415312 = score(doc=56,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 7.2006 16:32:43
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.462-478
  17. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.01
    0.011398684 = product of:
      0.022797368 = sum of:
        0.010406143 = weight(_text_:information in 93) [ClassicSimilarity], result of:
          0.010406143 = score(doc=93,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.11757882 = fieldWeight in 93, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.012391226 = product of:
          0.024782453 = sum of:
            0.024782453 = weight(_text_:organization in 93) [ClassicSimilarity], result of:
              0.024782453 = score(doc=93,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.13787198 = fieldWeight in 93, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
    For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  18. Aizawa, A.: ¬An information-theoretic perspective of tf-idf measures (2003) 0.01
    0.008409433 = product of:
      0.033637732 = sum of:
        0.033637732 = weight(_text_:information in 4155) [ClassicSimilarity], result of:
          0.033637732 = score(doc=4155,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.38007212 = fieldWeight in 4155, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=4155)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents a mathematical definition of the "probability-weighted amount of information" (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency - inverse document frequency measures that are commonly used in today's information retrieval systems. The mathematical definition of the PWI is shown, together with some illustrative examples of the calculation.
    Source
    Information processing and management. 39(2003) no.1, S.45-65
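
Entry 18 has a larger raw term weight for "information" than entry 14, yet it ranks lower because only one of the four query clauses matched. A minimal sketch of that final combination step follows, assuming the coordination factor is simply matched clauses divided by total clauses, as the `coord(1/4)` and `coord(2/4)` lines in the listing suggest; the function name `combine` is hypothetical and the clause weights are copied from the listing.

```python
def combine(clause_weights: list[float], total_clauses: int) -> float:
    """Sum the weights of the matching clauses and scale by coord(matched/total)."""
    matched = len(clause_weights)
    coord = matched / total_clauses
    return sum(clause_weights) * coord


# Entry 18 (doc=4155): one matching clause out of four.
score_18 = combine([0.033637732], total_clauses=4)

# Entry 14 (doc=2096): two matching clauses out of four; the second weight
# already includes its inner coord(1/2) factor from the nested sum.
score_14 = combine([0.014565565, 0.02049187], total_clauses=4)

print(score_18, score_14)
```

With the listed inputs, entry 14 ends up above entry 18 even though no single term weight in entry 14 exceeds the one in entry 18.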
  19. Thompson, P.: Looking back: on relevance, probabilistic indexing and information retrieval (2008) 0.01
    0.008409433 = product of:
      0.033637732 = sum of:
        0.033637732 = weight(_text_:information in 2074) [ClassicSimilarity], result of:
          0.033637732 = score(doc=2074,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.38007212 = fieldWeight in 2074, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2074)
      0.25 = coord(1/4)
    
    Abstract
    Forty-eight years ago Maron and Kuhns published their paper, "On Relevance, Probabilistic Indexing and Information Retrieval" (1960). This was the first paper to present a probabilistic approach to information retrieval, and perhaps the first paper on ranked retrieval. Although it is one of the most widely cited papers in the field of information retrieval, many researchers today may not be familiar with its influence. This paper describes the Maron and Kuhns article and the influence that it has had on the field of information retrieval.
    Source
    Information processing and management. 44(2008) no.2, S.963-970
  20. Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.01
    0.0076767267 = product of:
      0.030706907 = sum of:
        0.030706907 = weight(_text_:information in 1734) [ClassicSimilarity], result of:
          0.030706907 = score(doc=1734,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3469568 = fieldWeight in 1734, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1734)
      0.25 = coord(1/4)
    
    Abstract
    The volume of data on the Internet continues to grow rapidly. With it grows the need for high-quality information retrieval services for orientation and problem-oriented searching. Deciding whether to use or procure information retrieval software requires meaningful evaluation results. This article presents recent developments in the evaluation of information retrieval systems and shows the trend toward specialization and diversification of evaluation studies, which make the results more realistic. The focus is on the retrieval of specialist texts, web pages, and multimedia objects.
    Source
    Information - Wissenschaft und Praxis. 54(2003) H.4, S.203-210

Languages

  • e 132
  • d 9
  • m 1
  • sp 1

Types

  • a 132
  • m 8
  • el 2
  • s 2
  • x 2