Document (#37128)

Author
Witschel, H.F.
Title
Global and local resources for peer-to-peer text retrieval
Imprint
Leipzig : Universität / Fakultät für Mathematik und Informatik Institut für Informatik
Year
2008
Pages
X, 189 pp.
Abstract
This thesis is organised as follows: Chapter 2 gives a general introduction to the field of information retrieval, covering its most important aspects. Further, the tasks of distributed and peer-to-peer information retrieval (P2PIR) are introduced, motivating their application and characterising the special challenges that they involve, including a review of existing architectures and search protocols in P2PIR. Finally, chapter 2 presents approaches to evaluating the effectiveness of both traditional and peer-to-peer IR systems. Chapter 3 contains a detailed account of state-of-the-art information retrieval models and algorithms. This encompasses models for matching queries against document representations, term weighting algorithms, approaches to feedback and associative retrieval as well as distributed retrieval. It thus defines important terminology for the following chapters. The notion of "multi-level association graphs" (MLAGs) is introduced in chapter 4. An MLAG is a simple, graph-based framework that allows most of the theoretical and practical approaches to IR presented in chapter 3 to be modeled. Moreover, it provides an easy-to-grasp way of defining and including new entities in IR modeling, such as paragraphs or peers, dividing them conceptually while at the same time connecting them to each other in a meaningful way. This allows for a unified view of many IR tasks, including that of distributed and peer-to-peer search. Starting from related work and a formal definition of the framework, the possibilities of modeling that it provides are discussed in detail, followed by an experimental section that shows how new insights gained from modeling inside the framework can lead to novel combinations of principles and eventually to improved retrieval effectiveness.
Chapter 5 empirically tackles the first of the two research questions formulated above, namely the question of global collection statistics. More precisely, it studies possibilities of radically simplified results merging. The simplification comes from the attempt - without having knowledge of the complete collection - to equip all peers with the same global statistics, making document scores comparable across peers. What is examined is the question of how we can obtain such global statistics and to what extent their use will lead to a drop in retrieval effectiveness. In chapter 6, the second research question is tackled, namely that of making forwarding decisions for queries, based on profiles of other peers. After a review of related work in that area, the chapter first defines the approaches that will be compared against each other. Then, a novel evaluation framework is introduced, including a new measure for comparing results of a distributed search engine against those of a centralised one. Finally, the actual evaluation is performed using the new framework.
Content
Dissertation submitted for the academic degree of doctor rerum naturalium (Dr. rer. nat.) in the field of computer science.
Theme
Computational linguistics

Similar documents (author)

  1. Witschel, H.F.: Global term weights in distributed environments (2008) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:witschel in 2096) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 2096, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=2096)
    
  2. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:witschel in 123) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 123, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=123)
    
  3. Witschel, H.F.: Text, Wörter, Morpheme : Möglichkeiten einer automatischen Terminologie-Extraktion (2004) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:witschel in 126) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 126, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=126)
    
  4. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:witschel in 1842) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 1842, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=1842)
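
The author-match scores above all follow the same arithmetic: fieldWeight = tf × idf × fieldNorm. The following Python sketch reproduces that calculation. It is an illustration only: the function names are hypothetical, and the tf and idf formulas (square root of the frequency, and 1 + ln(maxDocs / (docFreq + 1))) are assumed because they are consistent with the figures listed in the explanation trees. Small deviations from the listed values are expected, since the listing appears to use single-precision floats.

```python
import math

def classic_tf(freq):
    # Assumed tf formula: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces idf(docFreq=5, maxDocs=44218) = 9.905128.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanation trees.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the author_txt:witschel entries above.
score = field_weight(freq=1.0, doc_freq=5, max_docs=44218, field_norm=0.625)
print(f"fieldWeight = {score:.6f}")  # compare with the listed 6.190705
```

Because all four author matches share the same term frequency, document frequency and field norm, they receive the same score.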
    

Similar documents (content)

  1. Fensel, D.; Staab, S.; Studer, R.; Harmelen, F. van; Davies, J.: ¬A future perspective : exploiting peer-to-peer and the Semantic Web for knowledge management (2004) 0.17
    0.16946809 = sum of:
      0.16946809 = product of:
        0.8473404 = sum of:
          0.07532089 = weight(abstract_txt:possibilities in 2262) [ClassicSimilarity], result of:
            0.07532089 = score(doc=2262,freq=1.0), product of:
              0.13133924 = queryWeight, product of:
                1.2172635 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.017638443 = queryNorm
              0.5734835 = fieldWeight in 2262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.09375 = fieldNorm(doc=2262)
          0.060728546 = weight(abstract_txt:approaches in 2262) [ClassicSimilarity], result of:
            0.060728546 = score(doc=2262,freq=2.0), product of:
              0.09939145 = queryWeight, product of:
                1.2227318 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.017638443 = queryNorm
              0.6110037 = fieldWeight in 2262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.09375 = fieldNorm(doc=2262)
          0.016508462 = weight(abstract_txt:that in 2262) [ClassicSimilarity], result of:
            0.016508462 = score(doc=2262,freq=2.0), product of:
              0.05254945 = queryWeight, product of:
                1.2573489 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.017638443 = queryNorm
              0.314151 = fieldWeight in 2262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=2262)
          0.3405395 = weight(abstract_txt:peer in 2262) [ClassicSimilarity], result of:
            0.3405395 = score(doc=2262,freq=2.0), product of:
              0.39525324 = queryWeight, product of:
                3.4483347 = boost
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.017638443 = queryNorm
              0.8615729 = fieldWeight in 2262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.09375 = fieldNorm(doc=2262)
          0.35424304 = weight(abstract_txt:chapter in 2262) [ClassicSimilarity], result of:
            0.35424304 = score(doc=2262,freq=2.0), product of:
              0.42203534 = queryWeight, product of:
                3.7793956 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.017638443 = queryNorm
              0.83936816 = fieldWeight in 2262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.09375 = fieldNorm(doc=2262)
        0.2 = coord(5/25)
    
  2. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.11
    0.11421165 = sum of:
      0.11421165 = product of:
        0.47588187 = sum of:
          0.0627674 = weight(abstract_txt:possibilities in 101) [ClassicSimilarity], result of:
            0.0627674 = score(doc=101,freq=1.0), product of:
              0.13133924 = queryWeight, product of:
                1.2172635 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.017638443 = queryNorm
              0.47790292 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.03578464 = weight(abstract_txt:approaches in 101) [ClassicSimilarity], result of:
            0.03578464 = score(doc=101,freq=1.0), product of:
              0.09939145 = queryWeight, product of:
                1.2227318 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.017638443 = queryNorm
              0.3600374 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.009727704 = weight(abstract_txt:that in 101) [ClassicSimilarity], result of:
            0.009727704 = score(doc=101,freq=1.0), product of:
              0.05254945 = queryWeight, product of:
                1.2573489 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.017638443 = queryNorm
              0.18511525 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.11546286 = weight(abstract_txt:question in 101) [ClassicSimilarity], result of:
            0.11546286 = score(doc=101,freq=5.0), product of:
              0.12691788 = queryWeight, product of:
                1.3817139 = boost
                5.207682 = idf(docFreq=657, maxDocs=44218)
                0.017638443 = queryNorm
              0.9097446 = fieldWeight in 101, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.207682 = idf(docFreq=657, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.04339958 = weight(abstract_txt:retrieval in 101) [ClassicSimilarity], result of:
            0.04339958 = score(doc=101,freq=2.0), product of:
              0.113033794 = queryWeight, product of:
                1.8440634 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.017638443 = queryNorm
              0.38395226 = fieldWeight in 101, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.20873971 = weight(abstract_txt:chapter in 101) [ClassicSimilarity], result of:
            0.20873971 = score(doc=101,freq=1.0), product of:
              0.42203534 = queryWeight, product of:
                3.7793956 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.017638443 = queryNorm
              0.49460244 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
        0.24 = coord(6/25)
    
  3. Aringhieri, R.; Damiani, E.; De Capitani di Vimercati, S.; Paraboschi, S.; Samarati, P.: Fuzzy techniques for trust and reputation management in anonymous peer-to-peer systems (2006) 0.11
    0.11313551 = sum of:
      0.11313551 = product of:
        0.70709693 = sum of:
          0.03578464 = weight(abstract_txt:approaches in 5279) [ClassicSimilarity], result of:
            0.03578464 = score(doc=5279,freq=1.0), product of:
              0.09939145 = queryWeight, product of:
                1.2227318 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.017638443 = queryNorm
              0.3600374 = fieldWeight in 5279, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=5279)
          0.009727704 = weight(abstract_txt:that in 5279) [ClassicSimilarity], result of:
            0.009727704 = score(doc=5279,freq=1.0), product of:
              0.05254945 = queryWeight, product of:
                1.2573489 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.017638443 = queryNorm
              0.18511525 = fieldWeight in 5279, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=5279)
          0.37780166 = weight(abstract_txt:peers in 5279) [ClassicSimilarity], result of:
            0.37780166 = score(doc=5279,freq=2.0), product of:
              0.4345912 = queryWeight, product of:
                3.1314306 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.017638443 = queryNorm
              0.86932653 = fieldWeight in 5279, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=5279)
          0.2837829 = weight(abstract_txt:peer in 5279) [ClassicSimilarity], result of:
            0.2837829 = score(doc=5279,freq=2.0), product of:
              0.39525324 = queryWeight, product of:
                3.4483347 = boost
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.017638443 = queryNorm
              0.7179774 = fieldWeight in 5279, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.078125 = fieldNorm(doc=5279)
        0.16 = coord(4/25)
    
  4. Laut, J.; Cappa, F.; Nov, O.; Porfiri, M.: Increasing citizen science contribution using a virtual peer (2017) 0.11
    0.11033242 = sum of:
      0.11033242 = product of:
        0.68957764 = sum of:
          0.024111873 = weight(abstract_txt:including in 3427) [ClassicSimilarity], result of:
            0.024111873 = score(doc=3427,freq=1.0), product of:
              0.08864316 = queryWeight, product of:
                1.154727 = boost
                4.352168 = idf(docFreq=1547, maxDocs=44218)
                0.017638443 = queryNorm
              0.2720105 = fieldWeight in 3427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.352168 = idf(docFreq=1547, maxDocs=44218)
                0.0625 = fieldNorm(doc=3427)
          0.011005641 = weight(abstract_txt:that in 3427) [ClassicSimilarity], result of:
            0.011005641 = score(doc=3427,freq=2.0), product of:
              0.05254945 = queryWeight, product of:
                1.2573489 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.017638443 = queryNorm
              0.20943399 = fieldWeight in 3427, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3427)
          0.4274338 = weight(abstract_txt:peers in 3427) [ClassicSimilarity], result of:
            0.4274338 = score(doc=3427,freq=4.0), product of:
              0.4345912 = queryWeight, product of:
                3.1314306 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.017638443 = queryNorm
              0.9835307 = fieldWeight in 3427, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=3427)
          0.22702633 = weight(abstract_txt:peer in 3427) [ClassicSimilarity], result of:
            0.22702633 = score(doc=3427,freq=2.0), product of:
              0.39525324 = queryWeight, product of:
                3.4483347 = boost
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.017638443 = queryNorm
              0.57438195 = fieldWeight in 3427, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.49839 = idf(docFreq=180, maxDocs=44218)
                0.0625 = fieldNorm(doc=3427)
        0.16 = coord(4/25)
    
  5. Chang, Y.; Ounis, I.; Kim, M.: Query reformulation using automatically generated query concepts from a document space (2006) 0.11
    0.10903353 = sum of:
      0.10903353 = product of:
        0.38940546 = sum of:
          0.07879201 = weight(abstract_txt:precisely in 972) [ClassicSimilarity], result of:
            0.07879201 = score(doc=972,freq=1.0), product of:
              0.1335148 = queryWeight, product of:
                1.0020893 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.017638443 = queryNorm
              0.5901369 = fieldWeight in 972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.051196676 = weight(abstract_txt:introduced in 972) [ClassicSimilarity], result of:
            0.051196676 = score(doc=972,freq=1.0), product of:
              0.114656724 = queryWeight, product of:
                1.1373316 = boost
                5.715473 = idf(docFreq=395, maxDocs=44218)
                0.017638443 = queryNorm
              0.44652134 = fieldWeight in 972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.715473 = idf(docFreq=395, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.059624597 = weight(abstract_txt:modeling in 972) [ClassicSimilarity], result of:
            0.059624597 = score(doc=972,freq=1.0), product of:
              0.12691765 = queryWeight, product of:
                1.1965982 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.017638443 = queryNorm
              0.46978965 = fieldWeight in 972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.03578464 = weight(abstract_txt:approaches in 972) [ClassicSimilarity], result of:
            0.03578464 = score(doc=972,freq=1.0), product of:
              0.09939145 = queryWeight, product of:
                1.2227318 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.017638443 = queryNorm
              0.3600374 = fieldWeight in 972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.0519371 = weight(abstract_txt:collection in 972) [ClassicSimilarity], result of:
            0.0519371 = score(doc=972,freq=2.0), product of:
              0.10112528 = queryWeight, product of:
                1.2333506 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.017638443 = queryNorm
              0.51359165 = fieldWeight in 972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.016848879 = weight(abstract_txt:that in 972) [ClassicSimilarity], result of:
            0.016848879 = score(doc=972,freq=3.0), product of:
              0.05254945 = queryWeight, product of:
                1.2573489 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.017638443 = queryNorm
              0.320629 = fieldWeight in 972, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
          0.09522154 = weight(abstract_txt:global in 972) [ClassicSimilarity], result of:
            0.09522154 = score(doc=972,freq=1.0), product of:
              0.21847671 = queryWeight, product of:
                2.2202654 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.017638443 = queryNorm
              0.435843 = fieldWeight in 972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=972)
        0.28 = coord(7/25)
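
The content-match scores combine several per-term scores. Each term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; the sum over matching terms is then multiplied by the coord factor (matching terms / query terms). The Python sketch below recomputes the score of the last entry (doc=972) from the values listed above. The function and variable names are hypothetical, and the tf and idf formulas are assumed because they agree with the listed figures; results may differ from the listing in the last digits because of floating-point precision.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.017638443

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf formula, consistent with the idf values in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM          # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight

def explain_score(terms, field_norm, matched, total_query_terms):
    # Sum of per-term scores, scaled by coord(matched/total_query_terms).
    raw = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return raw * (matched / total_query_terms)

# (boost, docFreq, termFreq) for doc=972, copied from the listing above.
doc_972_terms = [
    (1.0020893, 62, 1.0),     # precisely
    (1.1373316, 395, 1.0),    # introduced
    (1.1965982, 293, 1.0),    # modeling
    (1.2227318, 1197, 1.0),   # approaches
    (1.2333506, 1150, 2.0),   # collection
    (1.2573489, 11241, 3.0),  # that
    (2.2202654, 453, 1.0),    # global
]

total = explain_score(doc_972_terms, field_norm=0.078125,
                      matched=7, total_query_terms=25)
print(f"score = {total:.6f}")  # compare with the listed 0.10903353
```

The coord factor explains why a document matching few query terms strongly (such as entry 3, with 4 of 25 terms) can still rank close to one matching more terms weakly.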