Search (345 results, page 1 of 18)

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.08

0.08153041 = product of:
  0.2608973 = sum of:
    0.057791423 = weight(_text_:wide in 5777) [ClassicSimilarity], result of:
      0.057791423 = score(doc=5777,freq=4.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.4153836 = fieldWeight in 5777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.044339646 = weight(_text_:web in 5777) [ClassicSimilarity], result of:
      0.044339646 = score(doc=5777,freq=8.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.43268442 = fieldWeight in 5777, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.015712982 = weight(_text_:information in 5777) [ClassicSimilarity], result of:
      0.015712982 = score(doc=5777,freq=12.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.2850541 = fieldWeight in 5777, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.0503927 = weight(_text_:retrieval in 5777) [ClassicSimilarity], result of:
      0.0503927 = score(doc=5777,freq=14.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.5305404 = fieldWeight in 5777, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.09266056 = weight(_text_:software in 5777) [ClassicSimilarity], result of:
      0.09266056 = score(doc=5777,freq=16.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.743841 = fieldWeight in 5777, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
  0.3125 = coord(5/16)
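
The explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are illustrative and not part of the search system's output; the input numbers are copied from the tree for doc 5777.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single query term."""
    idf_value = idf(doc_freq, max_docs)
    query_weight = idf_value * query_norm                   # idf * queryNorm
    field_weight = math.sqrt(freq) * idf_value * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values read from the explain tree of doc 5777.
QUERY_NORM = 0.031400457
MAX_DOCS = 44218
FIELD_NORM = 0.046875

# term -> (freq in document, docFreq in index)
terms = {
    "wide": (4.0, 1430),
    "web": (8.0, 4597),
    "information": (12.0, 20772),
    "retrieval": (14.0, 5836),
    "software": (16.0, 2274),
}

term_scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for term, (freq, doc_freq) in terms.items()
}

# coord(5/16): 5 of the 16 query clauses matched this document.
total = sum(term_scores.values()) * (5 / 16)

for term, value in term_scores.items():
    print(f"{term:12s} {value:.9f}")
print(f"total        {total:.9f}")
```

Lucene computes in single precision, so the printed values agree with the listing only up to float rounding.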

Abstract: This book discusses many of the key design issues for building search engines and emphasizes the important role that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms, and software but also user-centered issues such as interfaces, manual indexing, and document preparation. They also present some of the current problems in information retrieval that may not be familiar to applied mathematicians and computer scientists and some of the driving computational methods (SVD, SDD) for automated conceptual indexing
Classification: ST 230 [Informatik # Monographien # Software und -entwicklung # Software allgemein, (Einführung, Lehrbücher, Methoden der Programmierung) Software engineering, Programmentwicklungssysteme, Softwarewerkzeuge]
LCSH: Web search engines
RSWK: Suchmaschine / Information Retrieval
World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
RVK: ST 230 [Informatik # Monographien # Software und -entwicklung # Software allgemein, (Einführung, Lehrbücher, Methoden der Programmierung) Software engineering, Programmentwicklungssysteme, Softwarewerkzeuge]
Series: Software, environments, tools; 8
Subject: Suchmaschine / Information Retrieval
World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
Web search engines

Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.05

0.048049383 = product of:
  0.15375802 = sum of:
    0.04767549 = weight(_text_:wide in 1319) [ClassicSimilarity], result of:
      0.04767549 = score(doc=1319,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.342674 = fieldWeight in 1319, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.04479914 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
      0.04479914 = score(doc=1319,freq=6.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.43716836 = fieldWeight in 1319, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.0149678625 = weight(_text_:information in 1319) [ClassicSimilarity], result of:
      0.0149678625 = score(doc=1319,freq=8.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.27153665 = fieldWeight in 1319, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.031425368 = weight(_text_:retrieval in 1319) [ClassicSimilarity], result of:
      0.031425368 = score(doc=1319,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33085006 = fieldWeight in 1319, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.014890149 = product of:
      0.029780298 = sum of:
        0.029780298 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
          0.029780298 = score(doc=1319,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.2708308 = fieldWeight in 1319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
      0.5 = coord(1/2)
  0.3125 = coord(5/16)
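
In the tree above, the term 22 sits in a nested sub-clause with its own coord(1/2) factor, inside the outer coord(5/16). The following is a minimal sketch of how such a tree collapses to the final score, using the leaf values printed for doc 1319. The tuple-based tree encoding and the evaluate helper are hypothetical, chosen only for illustration.

```python
from math import prod

def evaluate(node):
    """Recursively evaluate a ("sum" | "product", [children]) tree of floats."""
    if isinstance(node, (int, float)):
        return float(node)
    op, children = node
    values = [evaluate(child) for child in children]
    if op == "sum":
        return sum(values)
    if op == "product":
        return prod(values)
    raise ValueError(f"unknown operator: {op}")

# Nested clause for the term "22": one of two sub-clauses matched, so coord(1/2).
nested_clause = ("product", [("sum", [0.029780298]), 0.5])

# Outer query for doc 1319: four direct term weights plus the nested clause,
# scaled by coord(5/16).
doc_1319 = ("product", [
    ("sum", [
        0.04767549,    # wide
        0.04479914,    # web
        0.0149678625,  # information
        0.031425368,   # retrieval
        nested_clause,
    ]),
    0.3125,
])

score = evaluate(doc_1319)
print(f"{score:.9f}")
```

The result agrees with the 0.048049383 shown at the top of the tree up to float rounding.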

Abstract: Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user asks for. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Joss, M.W.; Wszola, S.: The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.04

0.039773364 = product of:
  0.12727477 = sum of:
    0.009586309 = product of:
      0.019172618 = sum of:
        0.019172618 = weight(_text_:online in 5123) [ClassicSimilarity], result of:
          0.019172618 = score(doc=5123,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.20118743 = fieldWeight in 5123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
      0.5 = coord(1/2)
    0.006414798 = weight(_text_:information in 5123) [ClassicSimilarity], result of:
      0.006414798 = score(doc=5123,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.116372846 = fieldWeight in 5123, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5123)
    0.032989766 = weight(_text_:retrieval in 5123) [ClassicSimilarity], result of:
      0.032989766 = score(doc=5123,freq=6.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.34732026 = fieldWeight in 5123, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=5123)
    0.065520905 = weight(_text_:software in 5123) [ClassicSimilarity], result of:
      0.065520905 = score(doc=5123,freq=8.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.525975 = fieldWeight in 5123, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=5123)
    0.012762985 = product of:
      0.02552597 = sum of:
        0.02552597 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
          0.02552597 = score(doc=5123,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.23214069 = fieldWeight in 5123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
      0.5 = coord(1/2)
  0.3125 = coord(5/16)

Abstract: Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high data storage media, from CD-ROM to multi gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of Boolean searching; fuzzy searching and matching; relevance ranking; proximity searching; and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slower random seek times than hard discs and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching
Date: 12. 9.1996 13:56:22

Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.04

0.037287936 = product of:
  0.14915174 = sum of:
    0.029559765 = weight(_text_:web in 1734) [ClassicSimilarity], result of:
      0.029559765 = score(doc=1734,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.2884563 = fieldWeight in 1734, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=1734)
    0.019125232 = weight(_text_:information in 1734) [ClassicSimilarity], result of:
      0.019125232 = score(doc=1734,freq=10.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.3469568 = fieldWeight in 1734, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=1734)
    0.05678614 = weight(_text_:retrieval in 1734) [ClassicSimilarity], result of:
      0.05678614 = score(doc=1734,freq=10.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.59785134 = fieldWeight in 1734, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=1734)
    0.043680605 = weight(_text_:software in 1734) [ClassicSimilarity], result of:
      0.043680605 = score(doc=1734,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.35064998 = fieldWeight in 1734, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=1734)
  0.25 = coord(4/16)

Abstract: The volume of data on the Internet continues to grow rapidly. With it grows the need for high-quality information retrieval services for orientation and problem-oriented searching. Deciding whether to use or procure information retrieval software requires meaningful evaluation results. This article presents recent developments in the evaluation of information retrieval systems and shows the trend toward specialization and diversification of evaluation studies, which make the results more realistic. The focus is on the retrieval of specialized texts, Internet pages, and multimedia objects.
Source: Information - Wissenschaft und Praxis. 54(2003) no.4, pp.203-210

Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.03

0.029887814 = product of:
  0.11955126 = sum of:
    0.040864702 = weight(_text_:wide in 105) [ClassicSimilarity], result of:
      0.040864702 = score(doc=105,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.29372054 = fieldWeight in 105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
    0.031352866 = weight(_text_:web in 105) [ClassicSimilarity], result of:
      0.031352866 = score(doc=105,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.3059541 = fieldWeight in 105, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
    0.014343925 = weight(_text_:information in 105) [ClassicSimilarity], result of:
      0.014343925 = score(doc=105,freq=10.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.2602176 = fieldWeight in 105, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
    0.032989766 = weight(_text_:retrieval in 105) [ClassicSimilarity], result of:
      0.032989766 = score(doc=105,freq=6.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.34732026 = fieldWeight in 105, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
  0.25 = coord(4/16)

Abstract: The rapid development of the Internet and World Wide Web has caused some critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists as traditional information retrieval tools have been criticised for their inefficiency in tackling these newly emerging problems. This paper proposes an information retrieval tool generated by co-word analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion

Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.03

0.029509231 = product of:
  0.118036926 = sum of:
    0.040864702 = weight(_text_:wide in 101) [ClassicSimilarity], result of:
      0.040864702 = score(doc=101,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.29372054 = fieldWeight in 101, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.031352866 = weight(_text_:web in 101) [ClassicSimilarity], result of:
      0.031352866 = score(doc=101,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.3059541 = fieldWeight in 101, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.012829596 = weight(_text_:information in 101) [ClassicSimilarity], result of:
      0.012829596 = score(doc=101,freq=8.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.23274569 = fieldWeight in 101, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.032989766 = weight(_text_:retrieval in 101) [ClassicSimilarity], result of:
      0.032989766 = score(doc=101,freq=6.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.34732026 = fieldWeight in 101, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
  0.25 = coord(4/16)

Abstract: Question Answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers state-of-the-art question answering, focusing on an overview of systems, techniques and approaches that are likely to be employed in the next generations of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering exploiting the possibilities of the Semantic Web. Considerations about the current issues and prospects for promising future research are also provided.
Source: Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.03
```
0.026489941 = product of:
  0.105959766 = sum of:
    0.020901911 = weight(_text_:web in 7) [ClassicSimilarity], result of:
      0.020901911 = score(doc=7,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.2039694 = fieldWeight in 7, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
    0.01131464 = weight(_text_:information in 7) [ClassicSimilarity], result of:
      0.01131464 = score(doc=7,freq=14.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.20526241 = fieldWeight in 7, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
    0.035914708 = weight(_text_:retrieval in 7) [ClassicSimilarity], result of:
      0.035914708 = score(doc=7,freq=16.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.37811437 = fieldWeight in 7, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
    0.037828512 = weight(_text_:software in 7) [ClassicSimilarity], result of:
      0.037828512 = score(doc=7,freq=6.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.3036718 = fieldWeight in 7, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
  0.25 = coord(4/16)
```
Abstract: The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
Content: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
LCSH: Web search engines
RSWK: Suchmaschine / Information Retrieval
Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
Series: Software, environments, tools; 17
Subject: Web search engines
Suchmaschine / Information Retrieval
Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)

Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.03

0.026232902 = product of:
  0.10493161 = sum of:
    0.03405392 = weight(_text_:wide in 1427) [ClassicSimilarity], result of:
      0.03405392 = score(doc=1427,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.24476713 = fieldWeight in 1427, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
    0.026127389 = weight(_text_:web in 1427) [ClassicSimilarity], result of:
      0.026127389 = score(doc=1427,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.25496176 = fieldWeight in 1427, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
    0.009258964 = weight(_text_:information in 1427) [ClassicSimilarity], result of:
      0.009258964 = score(doc=1427,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.16796975 = fieldWeight in 1427, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
    0.035491336 = weight(_text_:retrieval in 1427) [ClassicSimilarity], result of:
      0.035491336 = score(doc=1427,freq=10.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.37365708 = fieldWeight in 1427, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
  0.25 = coord(4/16)

Abstract: Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document score, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason on the existing techniques. This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in the mentioned techniques, and to take a higher-level and systematic approach to using hyperlinks for retrieval.
Footnote: Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
Source: Journal of the American Society for Information Science and Technology. 54(2003) no.4, pp.347-355

Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.03

0.025840215 = product of:
  0.10336086 = sum of:
    0.03405392 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
      0.03405392 = score(doc=1338,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.24476713 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.018474855 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
      0.018474855 = score(doc=1338,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.18028519 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.01195327 = weight(_text_:information in 1338) [ClassicSimilarity], result of:
      0.01195327 = score(doc=1338,freq=10.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.21684799 = fieldWeight in 1338, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.038878813 = weight(_text_:retrieval in 1338) [ClassicSimilarity], result of:
      0.038878813 = score(doc=1338,freq=12.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.40932083 = fieldWeight in 1338, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
  0.25 = coord(4/16)

Abstract: A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
Source: Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1577-1596
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Fuhr, N.: Zur Überwindung der Diskrepanz zwischen Retrievalforschung und -praxis (1990) 0.03

0.025262669 = product of:
  0.13473423 = sum of:
    0.09026646 = weight(_text_:benutzer in 6625) [ClassicSimilarity], result of:
      0.09026646 = score(doc=6625,freq=2.0), product of:
        0.17907447 = queryWeight, product of:
          5.7029257 = idf(docFreq=400, maxDocs=44218)
          0.031400457 = queryNorm
        0.5040722 = fieldWeight in 6625, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.7029257 = idf(docFreq=400, maxDocs=44218)
          0.0625 = fieldNorm(doc=6625)
    0.008553064 = weight(_text_:information in 6625) [ClassicSimilarity], result of:
      0.008553064 = score(doc=6625,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.1551638 = fieldWeight in 6625, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=6625)
    0.035914708 = weight(_text_:retrieval in 6625) [ClassicSimilarity], result of:
      0.035914708 = score(doc=6625,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.37811437 = fieldWeight in 6625, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=6625)
  0.1875 = coord(3/16)
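
The explanation tree above can be checked by hand. The following is a minimal Python sketch, assuming only the arithmetic that the tree itself spells out: tf is the square root of the term frequency, queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the summed term scores are scaled by the coord factor. The function and variable names are illustrative and not part of the search system. Small deviations in the last digits are expected, since the listed values appear to be single-precision.

```python
import math


def term_score(freq, idf, query_norm, field_norm):
    """Score of one matching term, following the nesting of the explain tree."""
    tf = math.sqrt(freq)                   # e.g. 1.4142135 = tf(freq=2.0)
    query_weight = idf * query_norm        # "queryWeight, product of"
    field_weight = tf * idf * field_norm   # "fieldWeight in <doc>, product of"
    return query_weight * field_weight


# Values copied from the tree for doc 6625.
QUERY_NORM = 0.031400457
FIELD_NORM = 0.0625
matched_terms = {
    "benutzer": (2.0, 5.7029257),     # (freq, idf)
    "information": (2.0, 1.7554779),
    "retrieval": (4.0, 3.024915),
}

summed = sum(
    term_score(freq, idf, QUERY_NORM, FIELD_NORM)
    for freq, idf in matched_terms.values()
)
coord = 3 / 16                             # 3 of 16 query clauses matched
total = coord * summed
print(f"sum={summed:.8f} total={total:.8f}")
```

The same function can be applied to any other entry in the listing by substituting its freq, idf and fieldNorm values and its coord fraction.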

Abstract: This article presents several research results from information retrieval that can be applied directly to improve retrieval quality for existing databases. Linguistic algorithms for reduction to base and stem forms support searching for inflected and derived forms of search terms. Ranking algorithms that weight query and document terms yield significantly better retrieval results than Boolean retrieval. Relevance feedback can improve retrieval quality further and also supports the user in successively modifying the query formulation. A user-friendly interface is presented for a system based on these concepts.

Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.02

0.024259038 = product of:
  0.077628925 = sum of:
    0.018474855 = weight(_text_:web in 56) [ClassicSimilarity], result of:
      0.018474855 = score(doc=56,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.18028519 = fieldWeight in 56, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=56)
    0.005345665 = weight(_text_:information in 56) [ClassicSimilarity], result of:
      0.005345665 = score(doc=56,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.09697737 = fieldWeight in 56, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=56)
    0.015872208 = weight(_text_:retrieval in 56) [ClassicSimilarity], result of:
      0.015872208 = score(doc=56,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.16710453 = fieldWeight in 56, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=56)
    0.027300376 = weight(_text_:software in 56) [ClassicSimilarity], result of:
      0.027300376 = score(doc=56,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.21915624 = fieldWeight in 56, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=56)
    0.010635821 = product of:
      0.021271642 = sum of:
        0.021271642 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
          0.021271642 = score(doc=56,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.19345059 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
      0.5 = coord(1/2)
  0.3125 = coord(5/16)

Abstract: The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
Date: 22. 7.2006 16:32:43
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.462-478
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02

0.023787355 = product of:
  0.09514942 = sum of:
    0.044339646 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
      0.044339646 = score(doc=2239,freq=8.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.43268442 = fieldWeight in 2239, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2239)
    0.011110757 = weight(_text_:information in 2239) [ClassicSimilarity], result of:
      0.011110757 = score(doc=2239,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.20156369 = fieldWeight in 2239, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2239)
    0.02693603 = weight(_text_:retrieval in 2239) [ClassicSimilarity], result of:
      0.02693603 = score(doc=2239,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.2835858 = fieldWeight in 2239, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=2239)
    0.012762985 = product of:
      0.02552597 = sum of:
        0.02552597 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
          0.02552597 = score(doc=2239,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.23214069 = fieldWeight in 2239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
      0.5 = coord(1/2)
  0.25 = coord(4/16)

Abstract: Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
Date: 31. 5.2004 19:22:06
Source: Journal of the American Society for Information Science and Technology. 55(2004) no.7, S.628-636

Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.02
```
0.021501906 = product of:
  0.086007625 = sum of:
    0.023837745 = weight(_text_:wide in 93) [ClassicSimilarity], result of:
      0.023837745 = score(doc=93,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.171337 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
    0.036578346 = weight(_text_:web in 93) [ClassicSimilarity], result of:
      0.036578346 = score(doc=93,freq=16.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.35694647 = fieldWeight in 93, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
    0.0064812745 = weight(_text_:information in 93) [ClassicSimilarity], result of:
      0.0064812745 = score(doc=93,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.11757882 = fieldWeight in 93, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
    0.019110262 = weight(_text_:software in 93) [ClassicSimilarity], result of:
      0.019110262 = score(doc=93,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.15340936 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
  0.25 = coord(4/16)
```
Abstract

Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
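
The idea of letting the web itself rank pages can be illustrated with a small power-iteration sketch in Python. This is the commonly taught formulation of PageRank, not a description of Google's production system, and the abstract above gives no formula. The damping factor of 0.85, the even redistribution of rank from pages without outgoing links, the iteration count and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute rank along hyperlinks.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 4))
```

In this toy graph the page with the most incoming links ends up ranked highest, and the page nobody links to keeps only the baseline share.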

Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.02

0.0214167 = product of:
  0.0856668 = sum of:
    0.03405392 = weight(_text_:wide in 6690) [ClassicSimilarity], result of:
      0.03405392 = score(doc=6690,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.24476713 = fieldWeight in 6690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
    0.018474855 = weight(_text_:web in 6690) [ClassicSimilarity], result of:
      0.018474855 = score(doc=6690,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.18028519 = fieldWeight in 6690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
    0.01069133 = weight(_text_:information in 6690) [ClassicSimilarity], result of:
      0.01069133 = score(doc=6690,freq=8.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.19395474 = fieldWeight in 6690, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
    0.022446692 = weight(_text_:retrieval in 6690) [ClassicSimilarity], result of:
      0.022446692 = score(doc=6690,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23632148 = fieldWeight in 6690, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
  0.25 = coord(4/16)

Abstract: In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, provided that the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a relevant document and the chance of retrieving a given non-relevant document), we show that if there are three or more independent systems available it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model, and determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented.
Imprint: Medford, NJ : Information Today
Series: Proceedings of the American Society for Information Science; vol.36
Source: Knowledge: creation, organization and use. Proceedings of the 62nd Annual Meeting of the American Society for Information Science, 31.10.-4.11.1999. Ed.: L. Woods
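
The capture-recapture idea mentioned in the abstract above can be sketched in its simplest textbook form, the two-sample Lincoln-Petersen estimator. This is not the authors' variant or their three-system method, only the basic principle under the same independence assumption: if two independent systems retrieve n1 and n2 relevant documents with m in common, the total number of relevant documents is estimated as n1 * n2 / m. The function name and the document identifiers are hypothetical.

```python
def lincoln_petersen(relevant_a, relevant_b):
    """Estimate the total number of relevant documents from two retrieved sets.

    relevant_a and relevant_b are the sets of relevant documents found by
    two systems that are assumed to retrieve independently of each other.
    """
    overlap = len(relevant_a & relevant_b)
    if overlap == 0:
        raise ValueError("no overlap between the two systems: estimate is undefined")
    return len(relevant_a) * len(relevant_b) / overlap


# Hypothetical judged results from two systems.
system_a = {"d1", "d2", "d3", "d4", "d5", "d6"}
system_b = {"d4", "d5", "d6", "d7", "d8"}
estimate = lincoln_petersen(system_a, system_b)
print(estimate)  # 6 * 5 / 3
```

If the systems tend to retrieve the same documents, the overlap is inflated and the estimate is too low, which is why the independence assumption matters.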

Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.02

0.02100645 = product of:
  0.1120344 = sum of:
    0.08860228 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
      0.08860228 = score(doc=6028,freq=46.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.86461735 = fieldWeight in 6028, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6028)
    0.007559912 = weight(_text_:information in 6028) [ClassicSimilarity], result of:
      0.007559912 = score(doc=6028,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.13714671 = fieldWeight in 6028, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6028)
    0.015872208 = weight(_text_:retrieval in 6028) [ClassicSimilarity], result of:
      0.015872208 = score(doc=6028,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.16710453 = fieldWeight in 6028, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6028)
  0.1875 = coord(3/16)

Abstract: This research is part of the ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree coordinate id is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) on ranking web pages is applied in this new coordinate space. The results of the algorithm have been modified to fit different topological web graph structures. Also, the algorithm was not successful in the case of general web graphs, and new ranking web algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source to web page ranking. The author believes that understanding the underlying web page as a graph will help design better ranking web algorithms, enhance retrieval and web performance, and recommends using graphs as a part of visual aid for browsing engine designers.
Source: Journal of the American Society for Information Science and Technology. 52(2001) no.9, S.736-747
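
The (out-degree, in-degree) coordinates defined in the abstract above can be computed directly from a list of links. The following Python sketch is only an illustration of those two definitions. The edge list and function name are hypothetical, and the metric the paper builds on top of these coordinates is not reproduced here.

```python
from collections import Counter


def degree_coordinates(edges):
    """Map each page to its (out-degree, in-degree) pair.

    edges is a list of (source, target) links between pages.
    """
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    pages = sorted({page for edge in edges for page in edge})
    # Counter returns 0 for pages that never appear as source or target.
    return {page: (out_degree[page], in_degree[page]) for page in pages}


# Hypothetical web graph with three pages.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
coordinates = degree_coordinates(toy_edges)
print(coordinates)
```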

Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.02

0.020422183 = product of:
  0.08168873 = sum of:
    0.03694971 = weight(_text_:web in 1615) [ClassicSimilarity], result of:
      0.03694971 = score(doc=1615,freq=8.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.36057037 = fieldWeight in 1615, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1615)
    0.00798859 = product of:
      0.01597718 = sum of:
        0.01597718 = weight(_text_:online in 1615) [ClassicSimilarity], result of:
          0.01597718 = score(doc=1615,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.16765618 = fieldWeight in 1615, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1615)
      0.5 = coord(1/2)
    0.009258964 = weight(_text_:information in 1615) [ClassicSimilarity], result of:
      0.009258964 = score(doc=1615,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.16796975 = fieldWeight in 1615, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1615)
    0.027491473 = weight(_text_:retrieval in 1615) [ClassicSimilarity], result of:
      0.027491473 = score(doc=1615,freq=6.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.28943354 = fieldWeight in 1615, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1615)
  0.25 = coord(4/16)

Abstract: Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
Footnote: Part of a special issue: "Web retrieval and mining: A machine learning perspective"
Source: Journal of the American Society for Information Science and Technology. 54(2003) no.7, S.683-694
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02

0.019112218 = product of:
  0.101931825 = sum of:
    0.017106129 = weight(_text_:information in 402) [ClassicSimilarity], result of:
      0.017106129 = score(doc=402,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.3103276 = fieldWeight in 402, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.125 = fieldNorm(doc=402)
    0.050791066 = weight(_text_:retrieval in 402) [ClassicSimilarity], result of:
      0.050791066 = score(doc=402,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.5347345 = fieldWeight in 402, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.125 = fieldNorm(doc=402)
    0.03403463 = product of:
      0.06806926 = sum of:
        0.06806926 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.06806926 = score(doc=402,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.1875 = coord(3/16)
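
The idf values in these trees can be recovered from docFreq and maxDocs. The short Python sketch below assumes the formula idf = 1 + ln(maxDocs / (docFreq + 1)). That form is inferred from the numbers in the listing, which it reproduces to the displayed precision, and is not taken from the search system's configuration. The function name is illustrative.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency in the form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 44218
# (docFreq, idf as shown in the explain trees)
listed = {
    "information": (20772, 1.7554779),
    "retrieval": (5836, 3.024915),
    "22": (3622, 3.5018296),
}
for term, (doc_freq, shown) in listed.items():
    print(term, round(classic_idf(doc_freq, MAX_DOCS), 6), shown)
```

Rare terms such as "benutzer" (docFreq=400) therefore receive a much larger idf than frequent terms such as "information".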

Source: Information processing and management. 22(1986) no.6, S.465-476

Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.02
```
0.018558865 = product of:
  0.09898061 = sum of:
    0.053289626 = weight(_text_:web in 8) [ClassicSimilarity], result of:
      0.053289626 = score(doc=8,freq=26.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.520022 = fieldWeight in 8, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=8)
    0.012095859 = weight(_text_:information in 8) [ClassicSimilarity], result of:
      0.012095859 = score(doc=8,freq=16.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.21943474 = fieldWeight in 8, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=8)
    0.03359513 = weight(_text_:retrieval in 8) [ClassicSimilarity], result of:
      0.03359513 = score(doc=8,freq=14.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.3536936 = fieldWeight in 8, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=8)
  0.1875 = coord(3/16)
```
Abstract

Hyperlink analysis algorithms allow search engines to deliver focused results to user queries. This article surveys ranking algorithms used to retrieve information on the Web.

Content

Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone (that is, the hyperlink to Web page B that is contained in Web page A) is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.

Computational information retrieval (2001) 0.02

0.018280702 = product of:
  0.097497076 = sum of:
    0.014343925 = weight(_text_:information in 4167) [ClassicSimilarity], result of:
      0.014343925 = score(doc=4167,freq=10.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.2602176 = fieldWeight in 4167, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4167)
    0.0503927 = weight(_text_:retrieval in 4167) [ClassicSimilarity], result of:
      0.0503927 = score(doc=4167,freq=14.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.5305404 = fieldWeight in 4167, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4167)
    0.032760452 = weight(_text_:software in 4167) [ClassicSimilarity], result of:
      0.032760452 = score(doc=4167,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.2629875 = fieldWeight in 4167, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=4167)
  0.1875 = coord(3/16)
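
The score breakdown above can be reproduced by hand. The following is a minimal Python sketch, not the search engine's own code: it assumes the classic TF-IDF formulas that the listing itself suggests (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and a coordination factor equal to matched terms over query terms). The function names are hypothetical; the numeric inputs are copied from the listing for doc 4167.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(termFreq)
    query_weight = idf * query_norm        # idf * queryNorm
    field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explanation for doc 4167.
MAX_DOCS = 44218
QUERY_NORM = 0.031400457
FIELD_NORM = 0.046875
matched_terms = {
    # term: (termFreq, docFreq)
    "information": (10.0, 20772),
    "retrieval": (14.0, 5836),
    "software": (2.0, 2274),
}

term_scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for term, (freq, doc_freq) in matched_terms.items()
}
coord = 3 / 16  # 3 of the 16 query terms matched
total_score = sum(term_scores.values()) * coord

for term, value in term_scores.items():
    print(f"{term}: {value:.9f}")
print(f"total: {total_score:.9f}")
```

Small differences in the last digits are to be expected, since the listing appears to use single-precision arithmetic while Python uses double precision.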

Abstract: This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.

Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005) 0.02

0.018145664 = product of:
  0.09677687 = sum of:
    0.044339646 = weight(_text_:web in 3268) [ClassicSimilarity], result of:
      0.044339646 = score(doc=3268,freq=8.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.43268442 = fieldWeight in 3268, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3268)
    0.014343925 = weight(_text_:information in 3268) [ClassicSimilarity], result of:
      0.014343925 = score(doc=3268,freq=10.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.2602176 = fieldWeight in 3268, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3268)
    0.0380933 = weight(_text_:retrieval in 3268) [ClassicSimilarity], result of:
      0.0380933 = score(doc=3268,freq=8.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.40105087 = fieldWeight in 3268, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3268)
  0.1875 = coord(3/16)

Abstract: The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (I²R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based on dynamic systems. In the present paper, a different interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; and thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
Source: Journal of the American Society for Information Science and Technology. 56(2005) no.1, pp. 63-69
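
The stochastic and algebraic views mentioned in the abstract can be illustrated with a short power iteration. This is a minimal sketch under stated assumptions, not the method of the paper or of any search engine: the four-page graph is hypothetical, the damping factor of 0.85 is a commonly quoted value that the abstract does not mention, and every link target is assumed to appear as a key in the graph. Repeatedly applying the random-surfer transition converges to the steady-state distribution, which is also the eigenvector for eigenvalue 1 of the damped link matrix.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict mapping page -> list of outlinks."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Random jump: with probability (1 - damping) the surfer goes anywhere.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Follow one of the page's outlinks with equal probability.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        delta = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical example graph: A links to B and C, B to C, C to A, D to C.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this example, page D has no incoming links, so its rank comes from the random jump alone, while page C collects rank from three other pages and ends up ranked highest.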
