Search (92 results, page 1 of 5)

  • theme_ss:"Retrievalalgorithmen"
  1. Mandl, T.: Tolerantes Information Retrieval : Neuronale Netze zur Erhöhung der Adaptivität und Flexibilität bei der Informationssuche (2001) 0.01
    0.008287756 = product of:
      0.033151023 = sum of:
        0.0026463792 = product of:
          0.013231896 = sum of:
            0.013231896 = weight(_text_:problem in 5965) [ClassicSimilarity], result of:
              0.013231896 = score(doc=5965,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.09379075 = fieldWeight in 5965, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.015625 = fieldNorm(doc=5965)
          0.2 = coord(1/5)
        0.030504642 = weight(_text_:maschine in 5965) [ClassicSimilarity], result of:
          0.030504642 = score(doc=5965,freq=2.0), product of:
            0.21420717 = queryWeight, product of:
              6.444614 = idf(docFreq=190, maxDocs=44218)
              0.03323817 = queryNorm
            0.1424072 = fieldWeight in 5965, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.444614 = idf(docFreq=190, maxDocs=44218)
              0.015625 = fieldNorm(doc=5965)
      0.25 = coord(2/8)
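    To make the score breakdown above easier to read, the following small Python sketch reproduces the ClassicSimilarity arithmetic for this record. The tf, idf, queryNorm, fieldNorm and coord values are copied from the explain output; the helper function and variable names are ours.

      import math

      def classic_sim_term_score(freq, idf, query_norm, field_norm):
          # One weight(_text_:term) node of the explain tree:
          # score = queryWeight * fieldWeight
          #       = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
          query_weight = idf * query_norm
          field_weight = math.sqrt(freq) * idf * field_norm
          return query_weight * field_weight

      # Constants copied from the explain output for doc 5965 above.
      query_norm = 0.03323817
      problem = classic_sim_term_score(2.0, 4.244485, query_norm, 0.015625)    # ~0.013232
      maschine = classic_sim_term_score(2.0, 6.444614, query_norm, 0.015625)   # ~0.030505

      # coord(1/5) applies inside the "problem" subquery, coord(2/8) to the whole query.
      total = (problem * (1 / 5) + maschine) * (2 / 8)
      print(total)   # ~0.0082878: the displayed document score, up to rounding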
    
    Abstract
    A fundamental need in human-machine interaction is the search for information. Models from soft computing lend themselves to designing information retrieval (IR) systems in a cognitively adequate way and adapting them to their users. A comprehensive state-of-the-art survey of neural networks in IR shows that most existing models do not exploit the full potential of neural networks. The COSIMIR model (Cognitive Similarity learning in Information Retrieval) presented here is based on neural networks and learns to compute the similarity between query and document, thereby carrying cognitive modelling into the core of an IR system. The transformation network is a further neural network that learns to handle heterogeneity on the basis of expert judgements. The COSIMIR model and the transformation network are discussed in detail and evaluated on real data sets.
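    The abstract above describes a network that learns query-document similarity from relevance judgements. The toy sketch below illustrates only that general idea with a single logistic unit over element-wise query/document interactions; it is an invented stand-in and far simpler than the COSIMIR model itself.

      import numpy as np

      def train_similarity(queries, docs, labels, epochs=200, lr=0.5):
          # Learn sim(q, d) = sigmoid(w . (q * d) + b) from binary relevance
          # judgements -- a toy stand-in for learning a similarity function,
          # not the COSIMIR network.
          w, b = np.zeros(queries.shape[1]), 0.0
          for _ in range(epochs):
              x = queries * docs                       # element-wise interaction features
              p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted relevance probability
              grad = p - labels                        # gradient of the log loss
              w -= lr * (x.T @ grad) / len(labels)
              b -= lr * grad.mean()
          return w, b

      # Tiny synthetic example: 3-term vocabulary, two judged query/document pairs.
      Q = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
      D = np.array([[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
      y = np.array([1.0, 0.0])                         # relevant / not relevant
      w, b = train_similarity(Q, D, y)
      print(1.0 / (1.0 + np.exp(-((Q * D) @ w + b))))  # learned similarity scores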
    Footnote
    Rez. in: nfd - Information 54(2003) H.6, S.379-380 (U. Thiel): "Did G. Salton know about the cybernetically oriented experiments with associative memory structures when he developed the vector space model? The subject of this book reminded me of this and similar conjectures that I discussed some years ago with Reginald Ferber and other colleagues. At any rate, it can be observed that the vector representation is an ingeniously simple rendering both of the "inverted files" used as the basic data structure in information retrieval (IR) and of the associative memory matrices that, over time, evolved via perceptrons into neural networks (NN). This formal connection subsequently stimulated a number of approaches to using networks in retrieval, in which hybrid approaches that combine methods from both disciplines have proven particularly suitable, as they do in the present volume. But one thing at a time... The book was submitted by the author as a dissertation to Department IV "Sprachen und Technik" of the University of Hildesheim and grew out of a series of research contributions to several projects in which the author was involved at various institutions between 1995 and 2000. This explains the unusual breadth of applications, scenarios and domains in which the results were obtained. Thus the COSIMIR model (COgnitive SIMilarity learning in Information Retrieval) developed in the thesis is evaluated not only on the classic Cranfield collection but is also applied, within the WING project of the University of Regensburg, to fact retrieval from a materials database. Further experiments with the component called the "transformation network", whose task is to map weighting functions between two term spaces, round off the range of experiments. Not only are the results presented diverse; the "state-of-the-art" overview offered to the reader also summarises, with highly informative breadth, the essentials of the fields of IR and NN and highlights the intersections of the two areas. Alongside the foundations of text and fact retrieval, the approaches to improving adaptivity and to mastering heterogeneity are presented, while as foundations of neural networks a general introduction to the basic concepts is given and, among other things, the backpropagation model, Kohonen networks and Adaptive Resonance Theory (ART) are described. A further chapter presents the NN-oriented approaches in IR to date and completes the outline of the relevant research landscape. In preparation for the presentation of the COSIMIR model, the author inserts at this point a discursive chapter on heterogeneity in IR, in which the goals and basic assumptions of the work are reflected upon once more. Object type, the quality of the objects and of their indexing, and multilinguality are named as the dimensions of heterogeneity. Even though this systematisation mainly emphasises problems from the projects involved here rather than aiming at a comprehensive treatment of, say, the literature on the problem of relevance, it is nevertheless helpful for understanding the design decisions, often only implicitly addressed in the following chapters, that shaped the prototypes developed. The approach of handling heterogeneity through transformations is made concrete in the specific context of NN, while other options, such as employing instruments of logic and probability theory, are discussed only briefly. A more far-reaching analysis would probably also have stretched the scope of the work too far,
  2. Uratani, N.; Takeda, M.: ¬A fast string-searching algorithm for multiple patterns (1993) 0.01
    0.00567584 = product of:
      0.02270336 = sum of:
        0.010585517 = product of:
          0.052927583 = sum of:
            0.052927583 = weight(_text_:problem in 6275) [ClassicSimilarity], result of:
              0.052927583 = score(doc=6275,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.375163 = fieldWeight in 6275, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6275)
          0.2 = coord(1/5)
        0.012117843 = product of:
          0.03635353 = sum of:
            0.03635353 = weight(_text_:29 in 6275) [ClassicSimilarity], result of:
              0.03635353 = score(doc=6275,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.31092256 = fieldWeight in 6275, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6275)
          0.33333334 = coord(1/3)
      0.25 = coord(2/8)
    
    Abstract
    The string-searching problem is to find all occurrences of pattern(s) in a text string. The Aho-Corasick string-searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. The Boyer-Moore algorithm is the fastest algorithm for a single pattern. By combining the ideas of these two algorithms, the authors present an efficient string-searching algorithm for multiple patterns. The algorithm runs in sublinear time on average, as the BM algorithm does, and its preprocessing time is linearly proportional to the sum of the lengths of the patterns, as in the AC algorithm.
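    As background to the abstract above, a minimal Aho-Corasick automaton (the multi-pattern side of the combination) might look like the sketch below; the Boyer-Moore-style skipping that the paper adds is not reproduced here.

      from collections import deque

      def build_automaton(patterns):
          # Aho-Corasick trie with failure links and output lists.
          nxt, fail, out = [{}], [0], [[]]
          for pat in patterns:
              node = 0
              for ch in pat:
                  if ch not in nxt[node]:
                      nxt.append({}); fail.append(0); out.append([])
                      nxt[node][ch] = len(nxt) - 1
                  node = nxt[node][ch]
              out[node].append(pat)
          queue = deque(nxt[0].values())
          while queue:                               # breadth-first failure-link construction
              node = queue.popleft()
              for ch, child in nxt[node].items():
                  f = fail[node]
                  while f and ch not in nxt[f]:
                      f = fail[f]
                  fail[child] = nxt[f].get(ch, 0)
                  out[child] += out[fail[child]]
                  queue.append(child)
          return nxt, fail, out

      def search(text, patterns):
          # Report (end_index, pattern) for every occurrence of every pattern.
          nxt, fail, out = build_automaton(patterns)
          node, hits = 0, []
          for i, ch in enumerate(text):
              while node and ch not in nxt[node]:
                  node = fail[node]
              node = nxt[node].get(ch, 0)
              for pat in out[node]:
                  hits.append((i, pat))
          return hits

      print(search("ushers", ["he", "she", "his", "hers"]))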
    Source
    Information processing and management. 29(1993) no.6, S.775-791
  3. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.00
    0.00425688 = product of:
      0.01702752 = sum of:
        0.0079391375 = product of:
          0.039695688 = sum of:
            0.039695688 = weight(_text_:problem in 174) [ClassicSimilarity], result of:
              0.039695688 = score(doc=174,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.28137225 = fieldWeight in 174, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=174)
          0.2 = coord(1/5)
        0.009088382 = product of:
          0.027265146 = sum of:
            0.027265146 = weight(_text_:29 in 174) [ClassicSimilarity], result of:
              0.027265146 = score(doc=174,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.23319192 = fieldWeight in 174, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=174)
          0.33333334 = coord(1/3)
      0.25 = coord(2/8)
    
    Abstract
    Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.
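    As a purely illustrative sketch of using feedback data to re-estimate both sides, the code below fits query and document vectors jointly by gradient ascent on a logistic likelihood. The model, dimensionality and sample judgements are invented; this is not the authors' estimator.

      import numpy as np

      def fit_queries_and_docs(judgements, dim=8, epochs=500, lr=0.1, seed=0):
          # Jointly estimate query and document vectors from relevance judgements
          # by maximising the likelihood of P(relevant | q, d) = sigmoid(q . d).
          # Illustrative stand-in only, not the estimator from the article.
          rng = np.random.default_rng(seed)
          Q = {q: rng.normal(scale=0.1, size=dim) for q, _, _ in judgements}
          D = {d: rng.normal(scale=0.1, size=dim) for _, d, _ in judgements}
          for _ in range(epochs):
              for q, d, rel in judgements:
                  p = 1.0 / (1.0 + np.exp(-Q[q] @ D[d]))
                  g = rel - p                          # gradient of the log-likelihood
                  Q[q], D[d] = Q[q] + lr * g * D[d], D[d] + lr * g * Q[q]
          return Q, D

      # (query, document, relevance) triples from hypothetical feedback logs.
      judgements = [("q1", "d1", 1), ("q1", "d2", 0), ("q2", "d2", 1), ("q2", "d1", 0)]
      Q, D = fit_queries_and_docs(judgements)
      print(round(float(Q["q1"] @ D["d1"]), 3), round(float(Q["q1"] @ D["d2"]), 3))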
    Date
    29. 9.2001 17:52:51
  4. Effektive Information Retrieval Verfahren in Theorie und Praxis : ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005 (2006) 0.00
    0.0038130803 = product of:
      0.030504642 = sum of:
        0.030504642 = weight(_text_:maschine in 5973) [ClassicSimilarity], result of:
          0.030504642 = score(doc=5973,freq=2.0), product of:
            0.21420717 = queryWeight, product of:
              6.444614 = idf(docFreq=190, maxDocs=44218)
              0.03323817 = queryNorm
            0.1424072 = fieldWeight in 5973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.444614 = idf(docFreq=190, maxDocs=44218)
              0.015625 = fieldNorm(doc=5973)
      0.125 = coord(1/8)
    
    Footnote
    "Evaluierung", das Thema des dritten Kapitels, ist in seiner Breite nicht auf das Information Retrieval beschränkt sondern beinhaltet ebenso einzelne Aspekte der Bereiche Mensch-Maschine-Interaktion sowie des E-Learning. Michael Muck und Marco Winter von der Stiftung Wissenschaft und Politik sowie dem Informationszentrum Sozialwissenschaften thematisieren in ihrem Beitrag den Einfluss der Fragestellung (Topic) auf die Bewertung von Relevanz und zeigen Verfahrensweisen für die Topic-Erstellung auf, die beim Cross Language Evaluation Forum (CLEF) Anwendung finden. Im darauf folgenden Aufsatz stellt Thomas Mandl verschiedene Evaluierungsinitiativen im Information Retrieval und aktuelle Entwicklungen dar. Joachim Pfister erläutert in seinem Beitrag das automatisierte Gruppieren, das sogenannte Clustering, von Patent-Dokumenten in den Datenbanken des Fachinformationszentrums Karlsruhe und evaluiert unterschiedliche Clusterverfahren auf Basis von Nutzerbewertungen. Ralph Kölle, Glenn Langemeier und Wolfgang Semar widmen sich dem kollaborativen Lernen unter den speziellen Bedingungen des Programmierens. Dabei werden das System VitaminL zur synchronen Bearbeitung von Programmieraufgaben und das Kennzahlensystem K-3 für die Bewertung kollaborativer Zusammenarbeit in einer Lehrveranstaltung angewendet. Der aktuelle Forschungsschwerpunkt der Hildesheimer Informationswissenschaft zeichnet sich im vierten Kapitel unter dem Thema "Multilinguale Systeme" ab. Hier finden sich die meisten Beiträge des Tagungsbandes wieder. Olga Tartakovski und Margaryta Shramko beschreiben und prüfen das System Langldent, das die Sprache von mono- und multilingualen Texten identifiziert. Die Eigenheiten der japanischen Schriftzeichen stellt Nina Kummer dar und vergleicht experimentell die unterschiedlichen Techniken der Indexierung. Suriya Na Nhongkai und Hans-Joachim Bentz präsentieren und prüfen eine bilinguale Suche auf Basis von Konzeptnetzen, wobei die Konzeptstruktur das verbindende Elemente der beiden Textsammlungen darstellt. Das Entwickeln und Evaluieren eines mehrsprachigen Question-Answering-Systems im Rahmen des Cross Language Evaluation Forum (CLEF), das die alltagssprachliche Formulierung von konkreten Fragestellungen ermöglicht, wird im Beitrag von Robert Strötgen, Thomas Mandl und Rene Schneider thematisiert. Den Schluss bildet der Aufsatz von Niels Jensen, der ein mehrsprachiges Web-Retrieval-System ebenfalls im Zusammenhang mit dem CLEF anhand des multilingualen EuroGOVKorpus evaluiert.
  5. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.00
    0.0035303675 = product of:
      0.01412147 = sum of:
        0.006615948 = product of:
          0.03307974 = sum of:
            0.03307974 = weight(_text_:problem in 664) [ClassicSimilarity], result of:
              0.03307974 = score(doc=664,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.23447686 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.2 = coord(1/5)
        0.007505522 = product of:
          0.022516565 = sum of:
            0.022516565 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.022516565 = score(doc=664,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.33333334 = coord(1/3)
      0.25 = coord(2/8)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language-model-based score on the one hand; on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the automatic generation and evaluation of topical queries. We show that a statistically significant improvement over closely related ranking models is achieved.
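    The toy sketch below illustrates score propagation over a bi-type document/author graph in the spirit described above. The damping parameter, content scores and update rule are invented for the example; this is not the BibRank model itself.

      def propagate(doc_scores, authorship, iterations=20, damping=0.5):
          # Authors receive the mean score of their documents; documents mix their
          # content-based score with the mean score of their authors.
          docs = dict(doc_scores)                      # content-based (e.g. language-model) scores
          authors = {a: 0.0 for auths in authorship.values() for a in auths}
          wrote = {a: [d for d, auths in authorship.items() if a in auths] for a in authors}
          for _ in range(iterations):
              for a in authors:
                  authors[a] = sum(docs[d] for d in wrote[a]) / len(wrote[a])
              for d in docs:
                  author_part = sum(authors[a] for a in authorship[d]) / len(authorship[d])
                  docs[d] = (1 - damping) * doc_scores[d] + damping * author_part
          return docs, authors

      content = {"p1": 0.9, "p2": 0.2, "p3": 0.1}      # query/document similarity scores
      authorship = {"p1": ["alice"], "p2": ["alice", "bob"], "p3": ["bob"]}
      print(propagate(content, authorship))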
    Date
    22. 3.2013 19:34:49
  6. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    0.0030022087 = product of:
      0.02401767 = sum of:
        0.02401767 = product of:
          0.07205301 = sum of:
            0.07205301 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.07205301 = score(doc=402,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  7. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Xiangji Huang, J.; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval (2018) 0.00
    0.00283792 = product of:
      0.01135168 = sum of:
        0.0052927583 = product of:
          0.026463792 = sum of:
            0.026463792 = weight(_text_:problem in 4459) [ClassicSimilarity], result of:
              0.026463792 = score(doc=4459,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.1875815 = fieldWeight in 4459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.2 = coord(1/5)
        0.0060589216 = product of:
          0.018176764 = sum of:
            0.018176764 = weight(_text_:29 in 4459) [ClassicSimilarity], result of:
              0.018176764 = score(doc=4459,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.15546128 = fieldWeight in 4459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.33333334 = coord(1/3)
      0.25 = coord(2/8)
    
    Abstract
    One of the main challenges in medical image retrieval is the increasing volume of image data, which renders it difficult for domain experts to find relevant information in large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) has been very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval and the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel re-ranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new re-ranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that, compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that it performs comparably to the top three runs in the official ImageCLEF competition.
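    A minimal sketch of modality-based re-ranking, assuming a single invented modality-match feature combined with a normalised baseline text score; the MF-Re-Rank model in the article uses expert-selected medical image features instead.

      def rerank(baseline, query_modality, image_modality, alpha=0.7):
          # final = alpha * normalised text score + (1 - alpha) * modality match
          top = max(score for _, score in baseline) or 1.0
          rescored = []
          for image_id, score in baseline:
              match = 1.0 if image_modality.get(image_id) == query_modality else 0.0
              rescored.append((image_id, alpha * score / top + (1 - alpha) * match))
          return sorted(rescored, key=lambda pair: pair[1], reverse=True)

      baseline = [("img1", 12.3), ("img2", 11.9), ("img3", 7.4)]   # e.g. BM25 scores
      modalities = {"img1": "MRI", "img2": "CT", "img3": "CT"}
      print(rerank(baseline, query_modality="CT", image_modality=modalities))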
    Date
    29. 9.2018 11:43:31
  8. Archuby, C.G.: Interfaces se recuperacion para catalogos en linea con salidas ordenadas por probable relevancia (2000) 0.00
    0.0026776902 = product of:
      0.021421522 = sum of:
        0.021421522 = product of:
          0.064264566 = sum of:
            0.064264566 = weight(_text_:29 in 5727) [ClassicSimilarity], result of:
              0.064264566 = score(doc=5727,freq=4.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.5496386 = fieldWeight in 5727, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5727)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    29. 1.1996 18:23:13
    Source
    Ciencia da informacao. 29(2000) no.3, S.5-13
  9. Crestani, F.: Combination of similarity measures for effective spoken document retrieval (2003) 0.00
    0.0026507783 = product of:
      0.021206226 = sum of:
        0.021206226 = product of:
          0.063618675 = sum of:
            0.063618675 = weight(_text_:29 in 4690) [ClassicSimilarity], result of:
              0.063618675 = score(doc=4690,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.5441145 = fieldWeight in 4690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4690)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Source
    Journal of information science. 29(2003) no.2, S.87-96
  10. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    0.0026269327 = product of:
      0.021015462 = sum of:
        0.021015462 = product of:
          0.06304638 = sum of:
            0.06304638 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.06304638 = score(doc=2134,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    30. 3.2001 13:32:22
  11. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    0.0026269327 = product of:
      0.021015462 = sum of:
        0.021015462 = product of:
          0.06304638 = sum of:
            0.06304638 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.06304638 = score(doc=3445,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    25. 8.2005 17:42:22
  12. Okada, M.; Ando, K.; Lee, S.S.; Hayashi, Y.; Aoe, J.I.: ¬An efficient substring search method by using delayed keyword extraction (2001) 0.00
    0.0022720955 = product of:
      0.018176764 = sum of:
        0.018176764 = product of:
          0.054530293 = sum of:
            0.054530293 = weight(_text_:29 in 6415) [ClassicSimilarity], result of:
              0.054530293 = score(doc=6415,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46638384 = fieldWeight in 6415, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6415)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    29. 3.2002 17:24:03
  13. Cole, C.: Intelligent information retrieval: diagnosing information need : Part II: uncertainty expansion in a prototype of a diagnostic IR tool (1998) 0.00
    0.0022720955 = product of:
      0.018176764 = sum of:
        0.018176764 = product of:
          0.054530293 = sum of:
            0.054530293 = weight(_text_:29 in 6432) [ClassicSimilarity], result of:
              0.054530293 = score(doc=6432,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46638384 = fieldWeight in 6432, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6432)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    11. 8.2001 14:48:29
  14. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.00
    0.0022516565 = product of:
      0.018013252 = sum of:
        0.018013252 = product of:
          0.054039754 = sum of:
            0.054039754 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.054039754 = score(doc=58,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    14. 6.2015 22:12:44
  15. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.00
    0.0022516565 = product of:
      0.018013252 = sum of:
        0.018013252 = product of:
          0.054039754 = sum of:
            0.054039754 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.054039754 = score(doc=2051,freq=2.0), product of:
                0.1163944 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    14. 6.2015 22:12:56
  16. Zhang, W.; Korf, R.E.: Performance of linear-space search algorithms (1995) 0.00
    0.001893413 = product of:
      0.015147304 = sum of:
        0.015147304 = product of:
          0.04544191 = sum of:
            0.04544191 = weight(_text_:29 in 4744) [ClassicSimilarity], result of:
              0.04544191 = score(doc=4744,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.38865322 = fieldWeight in 4744, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4744)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    2. 8.1996 10:29:15
  17. Hüther, H.: Selix im DFG-Projekt Kascade (1998) 0.00
    0.001893413 = product of:
      0.015147304 = sum of:
        0.015147304 = product of:
          0.04544191 = sum of:
            0.04544191 = weight(_text_:29 in 5151) [ClassicSimilarity], result of:
              0.04544191 = score(doc=5151,freq=2.0), product of:
                0.116921484 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03323817 = queryNorm
                0.38865322 = fieldWeight in 5151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5151)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
    
    Date
    25. 8.2000 19:55:29
  18. Maron, M.E.: ¬An historical note on the origins of probabilistic indexing (2008) 0.00
    0.0018712728 = product of:
      0.014970182 = sum of:
        0.014970182 = product of:
          0.07485091 = sum of:
            0.07485091 = weight(_text_:problem in 2047) [ClassicSimilarity], result of:
              0.07485091 = score(doc=2047,freq=4.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.5305606 = fieldWeight in 2047, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2047)
          0.2 = coord(1/5)
      0.125 = coord(1/8)
    
    Abstract
    The motivation behind "Probabilistic Indexing" was to replace two-valued thinking about information retrieval with probabilistic notions. This involved a new view of the information retrieval problem - viewing it as a problem of inference and prediction, and introducing probabilistically weighted indexes and probabilistically ranked output. These ideas were first formulated and written up in August 1958.
  19. Sachs, W.M.: ¬An approach to associative retrieval through the theory of fuzzy sets (1976) 0.00
    0.001653987 = product of:
      0.013231896 = sum of:
        0.013231896 = product of:
          0.06615948 = sum of:
            0.06615948 = weight(_text_:problem in 7) [ClassicSimilarity], result of:
              0.06615948 = score(doc=7,freq=2.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46895373 = fieldWeight in 7, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.078125 = fieldNorm(doc=7)
          0.2 = coord(1/5)
      0.125 = coord(1/8)
    
    Abstract
    The theory of fuzzy sets is used to provide a rigorous formulation of the problem of associative retrieval. This formulation suggests the idea of using fuzzy clustering to organize data for retrieval.
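    A generic fuzzy-retrieval sketch of the kind of formulation the abstract refers to, using the classic min/max operators over term membership degrees; it is not the paper's specific model.

      def retrieve(docs, query_terms, combine=min):
          # Documents are fuzzy sets of terms (membership degrees in [0, 1]);
          # term memberships are combined with a fuzzy operator (min = AND, max = OR).
          scores = {
              doc_id: combine(memberships.get(t, 0.0) for t in query_terms)
              for doc_id, memberships in docs.items()
          }
          return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

      docs = {
          "d1": {"fuzzy": 0.9, "retrieval": 0.6},
          "d2": {"fuzzy": 0.3, "clustering": 0.8},
      }
      print(retrieve(docs, ["fuzzy", "retrieval"]))          # conjunctive query (min)
      print(retrieve(docs, ["fuzzy", "retrieval"], max))     # disjunctive query (max)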
  20. Cheng, C.-S.; Chung, C.-P.; Shann, J.J.-J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems (2006) 0.00
    0.001653987 = product of:
      0.013231896 = sum of:
        0.013231896 = product of:
          0.06615948 = sum of:
            0.06615948 = weight(_text_:problem in 979) [ClassicSimilarity], result of:
              0.06615948 = score(doc=979,freq=8.0), product of:
                0.1410789 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03323817 = queryNorm
                0.46895373 = fieldWeight in 979, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=979)
          0.2 = coord(1/5)
      0.125 = coord(1/8)
    
    Abstract
    Compressing an inverted file can greatly improve query performance of an information retrieval system (IRS) by reducing disk I/Os. We observe that a good document identifier assignment (DIA) can make the document identifiers in the posting lists more clustered, and result in better compression as well as shorter query processing time. In this paper, we tackle the NP-complete problem of finding an optimal DIA to minimize the average query processing time in an IRS when the probability distribution of query terms is given. We indicate that the greedy nearest neighbor (Greedy-NN) algorithm can provide excellent performance for this problem. However, the Greedy-NN algorithm is inappropriate if used in large-scale IRSs, due to its high complexity O(N² × n), where N denotes the number of documents and n denotes the number of distinct terms. In real-world IRSs, the distribution of query terms is skewed. Based on this fact, we propose a fast O(N × n) heuristic, called the partition-based document identifier assignment (PBDIA) algorithm, which can efficiently assign consecutive document identifiers to documents containing frequently used query terms and thereby improve the compression efficiency of the posting lists for those terms. This can result in reduced query processing time. The experimental results show that the PBDIA algorithm yields a competitive performance versus Greedy-NN for the DIA problem, and that this optimization provides significant advantages for both long queries and parallel information retrieval (IR).
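    The compression argument in the abstract can be illustrated with a small sketch: under delta plus variable-byte coding, clustered document identifiers produce small gaps and therefore shorter posting lists. The encoding below is a standard v-byte scheme, not the PBDIA algorithm itself.

      def vbyte_encode(gaps):
          # Variable-byte encode a list of positive integers (7 payload bits per byte).
          out = bytearray()
          for g in gaps:
              while g >= 128:
                  out.append(g & 0x7F)
                  g >>= 7
              out.append(g | 0x80)              # high bit marks the final byte
          return bytes(out)

      def posting_list_size(doc_ids):
          # Bytes needed for a sorted posting list under delta + v-byte coding.
          gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
          return len(vbyte_encode(gaps))

      # Same number of postings, different identifier assignments:
      clustered = [1000, 1001, 1003, 1004, 1007]        # consecutive IDs for docs sharing a term
      scattered = [17, 40212, 181337, 702114, 954001]
      print(posting_list_size(clustered), posting_list_size(scattered))   # clustered list is smaller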

Languages

  • e 78
  • d 12
  • m 1
  • pt 1

Types

  • a 83
  • m 5
  • el 3
  • s 2
  • r 1
  • x 1