Search (281 results, page 1 of 15)

  • theme_ss:"Retrievalalgorithmen"
  1. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.03
    0.026315134 = product of:
      0.09210297 = sum of:
        0.057749767 = product of:
          0.14437442 = sum of:
            0.08879615 = weight(_text_:retrieval in 2134) [ClassicSimilarity], result of:
              0.08879615 = score(doc=2134,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.8104139 = fieldWeight in 2134, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
            0.055578265 = weight(_text_:system in 2134) [ClassicSimilarity], result of:
              0.055578265 = score(doc=2134,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.4871716 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.4 = coord(2/5)
        0.0343532 = product of:
          0.0687064 = sum of:
            0.0687064 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.0687064 = score(doc=2134,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    30. 3.2001 13:32:22
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
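    The score explanation above is standard Lucene ClassicSimilarity output: tf is the square root of the term frequency, idf is 1 + ln(maxDocs/(docFreq+1)), each term contributes queryWeight (idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and coord() scales for the fraction of query parts that matched. A small Python sketch reproduces the "retrieval" weight from the figures above; queryNorm is taken as given, since it depends on the query as a whole.

      import math

      # Lucene ClassicSimilarity building blocks, as shown in the explanation tree.
      def idf(doc_freq, max_docs):
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def field_weight(freq, doc_freq, max_docs, field_norm):
          return math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm

      query_norm = 0.03622214                        # from the tree; query-dependent

      qw = idf(5836, 44218) * query_norm             # queryWeight  ~ 0.10956889
      fw = field_weight(6.0, 5836, 44218, 0.109375)  # fieldWeight  ~ 0.8104139
      print(qw * fw)                                 # ~ 0.08879615, as in the tree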
  2. Mandl, T.: Tolerantes Information Retrieval : Neuronale Netze zur Erhöhung der Adaptivität und Flexibilität bei der Informationssuche (2001) 0.02
    0.01788698 = product of:
      0.06260443 = sum of:
        0.005074066 = product of:
          0.02537033 = sum of:
            0.02537033 = weight(_text_:retrieval in 5965) [ClassicSimilarity], result of:
              0.02537033 = score(doc=5965,freq=24.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23154683 = fieldWeight in 5965, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.015625 = fieldNorm(doc=5965)
          0.2 = coord(1/5)
        0.057530362 = weight(_text_:mehrsprachigkeit in 5965) [ClassicSimilarity], result of:
          0.057530362 = score(doc=5965,freq=2.0), product of:
            0.3070917 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03622214 = queryNorm
            0.18733935 = fieldWeight in 5965, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.015625 = fieldNorm(doc=5965)
      0.2857143 = coord(2/7)
    
    Abstract
    A fundamental need in human-computer interaction is the search for information. Models from soft computing lend themselves to making information retrieval (IR) systems cognitively adequate and adapting them to their users. A comprehensive state-of-the-art survey of neural networks in IR shows that most existing models do not exploit the potential of neural networks. The COSIMIR model (Cognitive Similarity learning in Information Retrieval) presented here is based on neural networks and learns to compute the similarity between query and document, thus carrying cognitive modelling into the core of an IR system. The transformation network is a further neural network, which learns to handle heterogeneity from expert judgments. The COSIMIR model and the transformation network are discussed in detail and evaluated on real data sets
    Content
    Chapters: 1 Introduction - 2 Foundations of information retrieval - 3 Foundations of neural networks - 4 Neural networks in information retrieval - 5 Heterogeneity and its treatment in information retrieval - 6 The COSIMIR model - 7 Experiments with the COSIMIR model and the transformation network - 8 Conclusion
    Footnote
    Review in: nfd - Information 54(2003) H.6, S.379-380 (U. Thiel): "Did G. Salton know of the cybernetically oriented experiments with associative memory structures when he developed the vector space model? The subject of this book reminded me of this and similar conjectures that I discussed some years ago with Reginald Ferber and other colleagues. In any case, the vector representation is an ingeniously simple rendering both of the 'inverted files' used as the basic data structure in information retrieval (IR) and of the associative memory matrices that evolved over time, via perceptrons, into neural networks (NN). This formal connection subsequently stimulated a series of approaches to using networks in retrieval, among which hybrid approaches that combine methods from both disciplines prove, as in the present volume, to be particularly suitable. But one thing at a time... The book was submitted by the author as a dissertation to department IV 'Sprachen und Technik' of the Universität Hildesheim and grew out of a series of research contributions to several projects in which the author took part at various institutions between 1995 and 2000. This explains the unusual breadth of the applications, scenarios and domains in which the results were obtained. Thus the COSIMIR model (COgnitive SIMilarity learning in Information Retrieval) developed in the work is not only evaluated on the classic Cranfield collection but is also applied to fact retrieval from a materials database in the WING project of the Universität Regensburg. Further experiments with the component called the 'transformation network', whose task is the mapping of weighting functions between two term spaces, round off the spectrum of experiments. Not only are the presented results varied; the 'state of the art' overview offered to the reader also summarizes, in highly informative breadth, the essentials of the fields of IR and NN and highlights the intersections of the two areas. Alongside the foundations of text and fact retrieval, the approaches to improving adaptivity and handling heterogeneity are presented, while as foundations of neural networks, in addition to a general introduction to the basic concepts, the backpropagation model, Kohonen networks and the Adaptive Resonance Theory (ART) are described, among others. A further chapter presents the existing NN-oriented approaches in IR and completes the outline of the relevant research landscape. In preparation for the presentation of the COSIMIR model, the author inserts at this point a discursive chapter on heterogeneity in IR, in which the goals and basic assumptions of the work are reflected upon once more. The object type, the quality of the objects and of their indexing, and multilingualism are named as dimensions of heterogeneity. Even though this systematization mainly emphasizes problems from the projects involved here, rather than aiming at a comprehensive treatment of, say, the literature on the problem of relevance, it is nevertheless helpful for understanding the design decisions behind the prototypes, which are often addressed only implicitly in the following chapters. The approach of handling heterogeneity through transformations is made concrete in the specific context of NN, while other possibilities, such as employing the instruments of logic and probability theory, are discussed only briefly; a more extensive analysis would probably have stretched the scope of the work too far.
    In the book's concluding chapter the author reports on a series of experiments conducted in the context of different applications. The evaluations were carried out very carefully and are competently commented, so that the reader can form a picture of the complexity of the investigations. The results themselves are mixed; the usefulness of the NN approach depends heavily on the amount and quality of the training material (thus the results on the Cranfield collection are worse than those of the traditional methods because of the small number of available relevance judgments). The experiment with materials information in the WING project is a rather traditional NN application: from feature vectors, the 'application similarity' of materials is to be inferred, which evidently works well. Here, however, the competing methods are to be found less in IR than in the field of data mining. The experiments with text data are an invitation to carry out further, more systematic investigations. For example, a comparison should be made not only with classical one-shot IR methods; far more interesting and more telling is the juxtaposition of NN systems with adaptive IR systems that accumulate knowledge, e.g. via relevance feedback (comparable to the NN in their training phase). In the end this could yield not only a unified model but also insights into which learning method is preferable in which situation. Conclusion: the book is an outstanding example of the 'Schriften zur Informationswissenschaft', with which the HI (Hochschulverband für Informationswissenschaft) has for many years been presenting the results of information science research to a wider public. It offers a comprehensive overview of the dynamically developing field of neural networks in IR, which are poised to make 'tolerant information retrieval' possible."
    RSWK
    Information Retrieval / Neuronales Netz
    Subject
    Information Retrieval / Neuronales Netz
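    The abstract's central idea, learning the query-document similarity function from relevance judgments instead of fixing it to an inner product, can be illustrated with a deliberately tiny sketch. A one-layer logistic model over elementwise query-document match features stands in for the multi-layer COSIMIR network, and all data below is invented.

      import numpy as np

      rng = np.random.default_rng(0)
      V = 50                                    # toy vocabulary size
      Q = rng.random((200, V))                  # query term weights
      D = rng.random((200, V))                  # document term weights
      y = (np.sum(Q * D, axis=1) > V / 4).astype(float)  # fake relevance labels

      X = Q * D                                 # one match feature per term
      w, b = np.zeros(V), 0.0
      for _ in range(500):                      # plain gradient descent
          p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
          g = p - y
          w -= 0.1 * (X.T @ g) / len(y)
          b -= 0.1 * g.mean()

      # learned similarity for the first query-document pair
      print(1.0 / (1.0 + np.exp(-(X[0] @ w + b))))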
  3. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01
    0.014565388 = product of:
      0.050978854 = sum of:
        0.011718053 = product of:
          0.058590267 = sum of:
            0.058590267 = weight(_text_:retrieval in 402) [ClassicSimilarity], result of:
              0.058590267 = score(doc=402,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5347345 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.2 = coord(1/5)
        0.0392608 = product of:
          0.0785216 = sum of:
            0.0785216 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.0785216 = score(doc=402,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
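    The agglomerative methods this literature implements by hand are available off the shelf today. A minimal sketch over a random document-term matrix (the data is filler; scipy's 'single', 'complete', 'average' and 'ward' correspond to single link, complete link, group average and Ward's method):

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import pdist

      rng = np.random.default_rng(1)
      docs = rng.random((30, 8))                # 30 documents, 8 term features
      dists = pdist(docs)                       # condensed Euclidean distances

      for method in ("single", "complete", "average", "ward"):
          Z = linkage(dists, method=method)     # hierarchic merge tree
          labels = fcluster(Z, t=5, criterion="maxclust")
          print(method, np.bincount(labels)[1:])  # cluster sizes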
  4. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.01
    0.012472611 = product of:
      0.043654136 = sum of:
        0.028931338 = product of:
          0.072328344 = sum of:
            0.03107218 = weight(_text_:retrieval in 2419) [ClassicSimilarity], result of:
              0.03107218 = score(doc=2419,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 2419, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
            0.041256163 = weight(_text_:system in 2419) [ClassicSimilarity], result of:
              0.041256163 = score(doc=2419,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.36163113 = fieldWeight in 2419, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.4 = coord(2/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.0294456 = score(doc=2419,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects, it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil, followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  5. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.01
    0.011789902 = product of:
      0.041264657 = sum of:
        0.02391367 = product of:
          0.059784174 = sum of:
            0.031712912 = weight(_text_:retrieval in 2591) [ClassicSimilarity], result of:
              0.031712912 = score(doc=2591,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.28943354 = fieldWeight in 2591, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
            0.028071264 = weight(_text_:system in 2591) [ClassicSimilarity], result of:
              0.028071264 = score(doc=2591,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.24605882 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.4 = coord(2/5)
        0.017350987 = product of:
          0.034701973 = sum of:
            0.034701973 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.034701973 = score(doc=2591,freq=4.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgments through human assessors is infeasible. Because of the large number of documents requiring judgment, human assessors also introduce errors through disagreements. The paper aims to discuss these issues. Design/methodology/approach This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of judging large numbers of documents while avoiding errors from human disagreement. The study utilizes two key factors to generate the alternate methods: the number of occurrences of each document per topic across all system runs, and document rankings. Findings The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value The simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgments without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
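    A minimal sketch of the occurrence-based idea described in the abstract: documents retrieved by many systems for a topic are taken as pseudo-relevant, systems are scored against those pseudo-judgments, and the resulting system ranking is compared with the one from official judgments. The runs, judgments and the simple precision measure below are invented stand-ins for the paper's TREC data and mean average precision.

      from collections import Counter
      from scipy.stats import kendalltau

      runs = {                                  # system -> ranked docs, one topic
          "sysA": ["d1", "d2", "d3", "d4"],
          "sysB": ["d2", "d1", "d5", "d3"],
          "sysC": ["d6", "d2", "d1", "d7"],
      }
      counts = Counter(d for docs in runs.values() for d in docs)
      pseudo_qrels = {d for d, c in counts.items() if c >= len(runs) / 2}

      def precision(ranked, qrels):
          return sum(d in qrels for d in ranked) / len(ranked)

      pseudo = [precision(r, pseudo_qrels) for r in runs.values()]
      official = [precision(r, {"d1", "d2", "d3"}) for r in runs.values()]
      tau, _ = kendalltau(pseudo, official)     # agreement of system rankings
      print(pseudo_qrels, tau)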
  6. Burgin, R.: The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.01
    0.01169244 = product of:
      0.04092354 = sum of:
        0.02865454 = product of:
          0.07163635 = sum of:
            0.051786967 = weight(_text_:retrieval in 3365) [ClassicSimilarity], result of:
              0.051786967 = score(doc=3365,freq=16.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.47264296 = fieldWeight in 3365, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3365)
            0.01984938 = weight(_text_:system in 3365) [ClassicSimilarity], result of:
              0.01984938 = score(doc=3365,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17398985 = fieldWeight in 3365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3365)
          0.4 = coord(2/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
              0.024538001 = score(doc=3365,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 3365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3365)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity, but fail to find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering in a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be generally comparable. The data presented also provide an opportunity to examine the theoretical limits of cluster based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations was found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigation
    Date
    22. 2.1996 11:20:06
  7. Faloutsos, C.: Signature files (1992) 0.01
    0.008508152 = product of:
      0.029778533 = sum of:
        0.010148132 = product of:
          0.05074066 = sum of:
            0.05074066 = weight(_text_:retrieval in 3499) [ClassicSimilarity], result of:
              0.05074066 = score(doc=3499,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.46309367 = fieldWeight in 3499, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.2 = coord(1/5)
        0.0196304 = product of:
          0.0392608 = sum of:
            0.0392608 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.0392608 = score(doc=3499,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods; provides a classification of the signature methods that have appeared in the literature; describes the main representatives of each class, together with their relative advantages and drawbacks; and gives a list of applications as well as commercial or university prototypes that use the signature approach
    Date
    7. 5.1999 15:22:48
    Source
    Information retrieval: data structures and algorithms. Eds.: W.B. Frakes and R. Baeza-Yates
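    A minimal superimposed-coding sketch of the signature idea the abstract surveys: each word sets a few hash-selected bits in a fixed-width signature, a document signature is the OR of its word signatures, and a query signature filters candidates. Matches are only candidates and must be verified, since false drops are inherent to the method; the width and hash choices below are arbitrary.

      import hashlib

      WIDTH, BITS = 64, 3                       # signature width, bits per word

      def word_sig(word):
          sig = 0
          for i in range(BITS):
              h = hashlib.md5(f"{word}:{i}".encode()).digest()
              sig |= 1 << (int.from_bytes(h[:4], "big") % WIDTH)
          return sig

      def doc_sig(text):
          sig = 0
          for w in text.split():
              sig |= word_sig(w)
          return sig

      docs = ["signature files for text retrieval", "parallel passage retrieval"]
      sigs = [doc_sig(d) for d in docs]
      q = word_sig("signature") | word_sig("retrieval")
      print([d for d, s in zip(docs, sigs) if s & q == q])  # candidates only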
  8. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.008508152 = product of:
      0.029778533 = sum of:
        0.010148132 = product of:
          0.05074066 = sum of:
            0.05074066 = weight(_text_:retrieval in 1422) [ClassicSimilarity], result of:
              0.05074066 = score(doc=1422,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.46309367 = fieldWeight in 1422, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.2 = coord(1/5)
        0.0196304 = product of:
          0.0392608 = sum of:
            0.0392608 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.0392608 = score(doc=1422,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations and the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented, and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
  9. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.007976091 = product of:
      0.027916316 = sum of:
        0.008285915 = product of:
          0.04142957 = sum of:
            0.04142957 = weight(_text_:retrieval in 5108) [ClassicSimilarity], result of:
              0.04142957 = score(doc=5108,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.37811437 = fieldWeight in 5108, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.2 = coord(1/5)
        0.0196304 = product of:
          0.0392608 = sum of:
            0.0392608 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.0392608 = score(doc=5108,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In this paper, methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
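    A sketch of the data-parallel part only: split a document into fixed-size passages, score them in worker processes, and keep the best passage score. The trivial overlap score is a stand-in for the Okapi weighting used in the paper's experiments.

      from multiprocessing import Pool

      QUERY = {"parallel", "retrieval"}

      def score(passage):
          words = set(passage.split())
          return len(words & QUERY) / (len(words) or 1)

      def passages(text, size=4):
          words = text.split()
          return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

      if __name__ == "__main__":
          doc = "methods for parallel passage retrieval using parallel computers"
          with Pool(4) as pool:
              scores = pool.map(score, passages(doc))
          print(max(scores))                    # best-passage score for the doc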
  10. Efthimiadis, E.N.: User choices : a new yardstick for the evaluation of ranking algorithms for interactive query expansion (1995) 0.01
    0.007866439 = product of:
      0.027532537 = sum of:
        0.015263537 = product of:
          0.03815884 = sum of:
            0.01830946 = weight(_text_:retrieval in 5697) [ClassicSimilarity], result of:
              0.01830946 = score(doc=5697,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 5697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5697)
            0.01984938 = weight(_text_:system in 5697) [ClassicSimilarity], result of:
              0.01984938 = score(doc=5697,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17398985 = fieldWeight in 5697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5697)
          0.4 = coord(2/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 5697) [ClassicSimilarity], result of:
              0.024538001 = score(doc=5697,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 5697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5697)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The performance of 8 ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. It focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the 8 ranking algorithms. This methodology introduces a user-oriented approach to evaluating ranking algorithms for query expansion, in contrast to the standard, system-oriented approaches. Similarities in the performance of the 8 algorithms and the ways these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full text) environments is needed before these results can be generalized beyond the context of the present study
    Date
    22. 2.1996 13:14:10
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
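    Of the algorithms named in the abstract, wpq is the best known; one common formulation (after Robertson's term-selection weight) multiplies a relevance weight by the difference in term occurrence rates between relevant and non-relevant documents. The exact variant evaluated in the paper may differ, and the candidate counts below are invented.

      import math

      def wpq(r, n, R, N):
          # r: relevant docs containing the term, n: docs containing the term,
          # R: known relevant docs, N: collection size (0.5 smooths zero cells)
          rw = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                        ((n - r + 0.5) * (R - r + 0.5)))
          return (r / R - (n - r) / (N - R)) * rw

      candidates = {"feedback": (8, 40, 10, 1000), "system": (5, 400, 10, 1000)}
      ranked = sorted(candidates, key=lambda t: wpq(*candidates[t]), reverse=True)
      print(ranked)                             # terms offered to the searcher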
  11. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.01
    0.007866439 = product of:
      0.027532537 = sum of:
        0.015263537 = product of:
          0.03815884 = sum of:
            0.01830946 = weight(_text_:retrieval in 56) [ClassicSimilarity], result of:
              0.01830946 = score(doc=56,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
            0.01984938 = weight(_text_:system in 56) [ClassicSimilarity], result of:
              0.01984938 = score(doc=56,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17398985 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.4 = coord(2/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.024538001 = score(doc=56,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  12. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.01
    0.0072818627 = product of:
      0.025486518 = sum of:
        0.010763719 = product of:
          0.053818595 = sum of:
            0.053818595 = weight(_text_:retrieval in 1451) [ClassicSimilarity], result of:
              0.053818595 = score(doc=1451,freq=12.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.49118498 = fieldWeight in 1451, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.0294456 = score(doc=1451,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
  13. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.01
    0.006979079 = product of:
      0.024426775 = sum of:
        0.007250175 = product of:
          0.036250874 = sum of:
            0.036250874 = weight(_text_:retrieval in 1319) [ClassicSimilarity], result of:
              0.036250874 = score(doc=1319,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 1319, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.2 = coord(1/5)
        0.0171766 = product of:
          0.0343532 = sum of:
            0.0343532 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.0343532 = score(doc=1319,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    1. 8.1996 22:08:06
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  14. Hofferer, M.: Heuristic search in information retrieval (1994) 0.01
    0.006886522 = product of:
      0.04820565 = sum of:
        0.04820565 = product of:
          0.120514125 = sum of:
            0.06550591 = weight(_text_:retrieval in 1070) [ClassicSimilarity], result of:
              0.06550591 = score(doc=1070,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.59785134 = fieldWeight in 1070, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1070)
            0.055008218 = weight(_text_:system in 1070) [ClassicSimilarity], result of:
              0.055008218 = score(doc=1070,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.48217484 = fieldWeight in 1070, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1070)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Describes an adaptive information retrieval system, the Information Retrieval Algorithm System (IRAS), which uses heuristic searching to sample a document space and retrieve relevant documents according to users' requests, and a learning module, based on a knowledge representation system and an approximate probabilistic characterization of relevant documents, which reproduces a user classification of relevant documents and provides rule-controlled ranking.
    Source
    Information retrieval: new systems and current research. Proceedings of the 15th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Glasgow 1993. Ed.: Ruben Leon
  15. Witschel, H.F.: Global term weights in distributed environments (2008) 0.01
    0.006717526 = product of:
      0.023511339 = sum of:
        0.00878854 = product of:
          0.0439427 = sum of:
            0.0439427 = weight(_text_:retrieval in 2096) [ClassicSimilarity], result of:
              0.0439427 = score(doc=2096,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 2096, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.0294456 = score(doc=2096,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
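    A sketch of the setting the abstract describes: estimate document frequencies from a small sample, keep only the most frequent terms as an "extended stop word list", and give every other term one uniform default weight. The toy corpus and the frequency threshold are invented.

      import math
      from collections import Counter

      sample = ["information retrieval with neural networks",
                "parallel information retrieval",
                "signature files for text search",
                "query expansion for retrieval"]
      df = Counter(t for doc in sample for t in set(doc.split()))
      N = len(sample)
      frequent = {t for t, c in df.items() if c >= 2}   # "extended stop word list"

      def idf(term):
          if term in frequent:
              return math.log(N / df[term])     # estimated from the sample
          return math.log(N)                    # uniform default for the long tail

      print(sorted(frequent), idf("retrieval"), idf("anything-else"))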
  16. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.01
    0.006668968 = product of:
      0.023341388 = sum of:
        0.014753087 = product of:
          0.036882717 = sum of:
            0.012816621 = weight(_text_:retrieval in 2509) [ClassicSimilarity], result of:
              0.012816621 = score(doc=2509,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.11697317 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
            0.024066096 = weight(_text_:system in 2509) [ClassicSimilarity], result of:
              0.024066096 = score(doc=2509,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.21095149 = fieldWeight in 2509, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.4 = coord(2/5)
        0.0085883 = product of:
          0.0171766 = sum of:
            0.0171766 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.0171766 = score(doc=2509,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
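    The middleware loop described above (natural language in, Boolean statements out, ranked records back) can be caricatured in a few lines. Everything here, from the stop list to the overlap ranking, is an illustrative stand-in for the E-Referencer's knowledge-based strategies.

      STOP = {"the", "a", "of", "for", "in", "on", "and"}

      def to_boolean(query):
          terms = [w for w in query.lower().split() if w not in STOP]
          yield " AND ".join(terms)             # strictest statement first
          yield " OR ".join(terms)              # broader fallback

      def rank(records, terms):
          # order OPAC records by simple query-term overlap
          return sorted(records, key=lambda r: -sum(t in r.lower() for t in terms))

      for stmt in to_boolean("ranking algorithms for Boolean OPACs"):
          print(stmt)
      print(rank(["Relevancy ranking for Boolean systems",
                  "OPAC usability study"], ["ranking", "boolean"]))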
  17. Joss, M.W.; Wszola, S.: The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.01
    0.0063811145 = product of:
      0.0223339 = sum of:
        0.0076110996 = product of:
          0.0380555 = sum of:
            0.0380555 = weight(_text_:retrieval in 5123) [ClassicSimilarity], result of:
              0.0380555 = score(doc=5123,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 5123, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.0294456 = score(doc=5123,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively long random seek times compared with hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching
    Date
    12. 9.1996 13:56:22
  18. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.01
    0.0063811145 = product of:
      0.0223339 = sum of:
        0.0076110996 = product of:
          0.0380555 = sum of:
            0.0380555 = weight(_text_:retrieval in 825) [ClassicSimilarity], result of:
              0.0380555 = score(doc=825,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 825, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.0294456 = score(doc=825,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Relevance feedback consists in automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probabilities propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested using a preliminary experimentation with different standard document collections.
    Date
    22. 3.2003 19:30:19
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
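    The paper's methods live inside a Bayesian network model; as a reference point, the classical vector-space form of the same idea is Rocchio feedback, which moves the query toward judged-relevant documents and away from judged non-relevant ones. This sketch shows that classical form, not the paper's approach; the alpha/beta/gamma values are conventional defaults.

      import numpy as np

      def rocchio(q, rel, nonrel, alpha=1.0, beta=0.75, gamma=0.15):
          q_new = alpha * q
          if len(rel):
              q_new += beta * rel.mean(axis=0)
          if len(nonrel):
              q_new -= gamma * nonrel.mean(axis=0)
          return np.clip(q_new, 0.0, None)      # keep term weights non-negative

      q = np.array([1.0, 0.0, 0.5, 0.0])
      rel = np.array([[0.9, 0.1, 0.8, 0.0], [0.7, 0.0, 0.9, 0.1]])
      nonrel = np.array([[0.0, 0.9, 0.1, 0.8]])
      print(rocchio(q, rel, nonrel))            # the reformulated query vector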
  19. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.01
    0.0063723573 = product of:
      0.02230325 = sum of:
        0.0051266486 = product of:
          0.025633242 = sum of:
            0.025633242 = weight(_text_:retrieval in 3276) [ClassicSimilarity], result of:
              0.025633242 = score(doc=3276,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23394634 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.2 = coord(1/5)
        0.0171766 = product of:
          0.0343532 = sum of:
            0.0343532 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
              0.0343532 = score(doc=3276,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2708308 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In classical information retrieval, various methods have been developed for ranking and for searching in a homogeneous, unstructured set of documents. The success of the Google search engine has shown that searching in an inhomogeneous but interlinked set of documents such as the Internet can be very effective when the links between documents are taken into account. Among the concepts realized by the Google search engine is a method for ranking search results (PageRank), which is briefly explained in this article. The article also discusses the concepts behind a system called CiteSeer, which automatically indexes bibliographic references (Autonomous Citation Indexing, ACI). The latter turns a set of unlinked scientific documents into an interlinked document set and enables the use of ranking methods based on those employed by Google.
    Date
    20. 3.2005 16:23:22
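    The PageRank idea mentioned in the abstract fits in a few lines: repeatedly redistribute each page's score along its outgoing links, mixed with a uniform teleport term. The toy graph below is invented and has no dangling nodes, which keeps the sketch short; 0.85 is the customary damping factor.

      import numpy as np

      links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # page -> outgoing links
      n, d = 4, 0.85
      M = np.zeros((n, n))
      for src, outs in links.items():
          for dst in outs:
              M[dst, src] = 1.0 / len(outs)     # column-stochastic link matrix

      r = np.full(n, 1.0 / n)
      for _ in range(100):                      # power iteration
          r = (1 - d) / n + d * M @ r
      print(r)                                  # stationary importance scores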
  20. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.006327753 = product of:
      0.044294268 = sum of:
        0.044294268 = product of:
          0.11073567 = sum of:
            0.0694795 = weight(_text_:retrieval in 2502) [ClassicSimilarity], result of:
              0.0694795 = score(doc=2502,freq=20.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.63411707 = fieldWeight in 2502, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
            0.041256163 = weight(_text_:system in 2502) [ClassicSimilarity], result of:
              0.041256163 = score(doc=2502,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.36163113 = fieldWeight in 2502, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2502)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
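    For contrast with the paper's negative result, a standard score-based fusion baseline such as CombMNZ (sum of min-max normalised scores times the number of runs retrieving the document) fits in a few lines; this is a generic baseline, not necessarily one of the exact approaches analysed in the paper.

      from collections import defaultdict

      def norm(run):                            # min-max normalise one run
          lo, hi = min(run.values()), max(run.values())
          return {d: (s - lo) / (hi - lo) if hi > lo else 0.0
                  for d, s in run.items()}

      def comb_mnz(runs):
          fused, seen = defaultdict(float), defaultdict(int)
          for run in map(norm, runs):
              for doc, s in run.items():
                  fused[doc] += s
                  seen[doc] += 1
          return sorted(((fused[d] * seen[d], d) for d in fused), reverse=True)

      runA = {"d1": 12.0, "d2": 7.0, "d3": 3.0}
      runB = {"d2": 0.9, "d4": 0.6, "d1": 0.2}
      print(comb_mnz([runA, runB]))             # fused ranking, best first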

Types

  • a 257
  • m 12
  • el 5
  • s 5
  • r 4
  • p 2
  • x 2
  • d 1