Search (150 results, page 1 of 8)

  • Active filter: theme_ss:"Retrievalalgorithmen"
  1. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.04
    0.043571062 = product of:
      0.1161895 = sum of:
        0.054616455 = weight(_text_:wide in 5777) [ClassicSimilarity], result of:
          0.054616455 = score(doc=5777,freq=4.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.4153836 = fieldWeight in 5777, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
        0.041903697 = weight(_text_:web in 5777) [ClassicSimilarity], result of:
          0.041903697 = score(doc=5777,freq=8.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.43268442 = fieldWeight in 5777, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
        0.019669347 = weight(_text_:data in 5777) [ClassicSimilarity], result of:
          0.019669347 = score(doc=5777,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.2096163 = fieldWeight in 5777, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
      0.375 = coord(3/8)
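
    The breakdown above is Lucene's ClassicSimilarity explain output: for each matching query term, weight = queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm with tf = sqrt(termFreq); the document score is the coordination factor coord(q,d) times the sum of the term weights. A minimal Python sketch (function name and structure are our own) that reproduces the score of entry 1 from the numbers shown:

      import math

      def classic_similarity(terms, coord, query_norm):
          """Recompute a Lucene ClassicSimilarity score from an explain tree.
          terms: one (freq, idf, field_norm) tuple per matching query term."""
          total = 0.0
          for freq, idf, field_norm in terms:
              tf = math.sqrt(freq)                  # tf(freq) = sqrt(termFreq)
              query_weight = idf * query_norm       # queryWeight = idf * queryNorm
              field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
              total += query_weight * field_weight  # term weight = queryWeight * fieldWeight
          return coord * total                      # score = coord(q,d) * sum of weights

      # Entry 1 (doc 5777): terms "wide", "web", "data"; coord(3/8) = 0.375
      score = classic_similarity(
          [(4.0, 4.4307585, 0.046875),   # wide
           (8.0, 3.2635105, 0.046875),   # web
           (2.0, 3.1620505, 0.046875)],  # data
          coord=0.375,
          query_norm=0.029675366,
      )
      print(score)  # ~0.043571062, matching the listed score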
    
    Abstract
    This book discusses many of the key design issues for building search engines and emphasizes the important role that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms, and software but also user-centered issues such as interfaces, manual indexing, and document preparation. They also present some of the current problems in information retrieval that may not be familiar to applied mathematicians and computer scientists, as well as some of the driving computational methods (SVD, SDD) for automated conceptual indexing.
    LCSH
    Web search engines
    RSWK
    World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
    Subject
    World Wide Web / Suchmaschine / Mathematisches Modell (BVB)
    Web search engines
  2. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.04
    0.038049873 = product of:
      0.10146633 = sum of:
        0.045056276 = weight(_text_:wide in 1319) [ClassicSimilarity], result of:
          0.045056276 = score(doc=1319,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.342674 = fieldWeight in 1319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.042337947 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
          0.042337947 = score(doc=1319,freq=6.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.43716836 = fieldWeight in 1319, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.014072108 = product of:
          0.028144216 = sum of:
            0.028144216 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.028144216 = score(doc=1319,freq=2.0), product of:
                0.103918076 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029675366 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
      0.375 = coord(3/8)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user inquires about. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
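
    Relevance feedback of the kind being integrated here is classically implemented with Rocchio's formula, which moves the query vector toward relevant documents and away from non-relevant ones. The sketch below is that generic formulation, given for orientation only; the paper's concept-based feedback goes beyond it (vectors are term-to-weight Counters; parameter defaults are conventional, not the authors'):

      from collections import Counter

      def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
          """Classic Rocchio feedback: new query = alpha*q + beta*centroid(relevant)
          - gamma*centroid(nonrelevant), keeping only positive weights."""
          new_q = Counter({t: alpha * w for t, w in query_vec.items()})
          for docs, weight in ((relevant, beta), (nonrelevant, -gamma)):
              for doc in docs:
                  for t, w in doc.items():
                      new_q[t] += weight * w / len(docs)
          return Counter({t: w for t, w in new_q.items() if w > 0})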
  3. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.03
    0.032969773 = product of:
      0.0879194 = sum of:
        0.038619664 = weight(_text_:wide in 101) [ClassicSimilarity], result of:
          0.038619664 = score(doc=101,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.29372054 = fieldWeight in 101, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
        0.029630389 = weight(_text_:web in 101) [ClassicSimilarity], result of:
          0.029630389 = score(doc=101,freq=4.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.3059541 = fieldWeight in 101, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
        0.019669347 = weight(_text_:data in 101) [ClassicSimilarity], result of:
          0.019669347 = score(doc=101,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.2096163 = fieldWeight in 101, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
      0.375 = coord(3/8)
    
    Abstract
    Question answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text, in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers the state of the art in question answering, focusing on an overview of the systems, techniques, and approaches likely to be employed in the next generation of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering exploiting the possibilities of the Semantic Web. Considerations about current issues and prospects for promising future research are also provided.
  4. Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.02
    0.024762768 = product of:
      0.06603405 = sum of:
        0.03218305 = weight(_text_:wide in 6690) [ClassicSimilarity], result of:
          0.03218305 = score(doc=6690,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.24476713 = fieldWeight in 6690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6690)
        0.017459875 = weight(_text_:web in 6690) [ClassicSimilarity], result of:
          0.017459875 = score(doc=6690,freq=2.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.18028519 = fieldWeight in 6690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6690)
        0.016391123 = weight(_text_:data in 6690) [ClassicSimilarity], result of:
          0.016391123 = score(doc=6690,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.17468026 = fieldWeight in 6690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6690)
      0.375 = coord(3/8)
    
    Abstract
    In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, provided the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a relevant, and retrieving a given non-relevant document), we show that if there are three or more independent systems available it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model, and determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented.
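
    The two-system core of the "capture-recapture" idea is the Lincoln-Petersen estimator: if sufficiently independent systems retrieve n1 and n2 relevant documents with m in common, the total number of relevant documents is estimated as n1*n2/m. A sketch with hypothetical counts:

      def lincoln_petersen(n1, n2, m):
          """Estimate the total number of relevant documents from two
          sufficiently independent retrieval systems; m = size of the overlap."""
          if m == 0:
              raise ValueError("no overlap between systems, estimate undefined")
          return n1 * n2 / m

      # Hypothetical: system A finds 120 relevant docs, system B finds 90, 40 shared
      print(lincoln_petersen(120, 90, 40))  # estimates 270 relevant documents in total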
  5. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.02
    0.024762768 = product of:
      0.06603405 = sum of:
        0.03218305 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
          0.03218305 = score(doc=1338,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.24476713 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.017459875 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.017459875 = score(doc=1338,freq=2.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.18028519 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.016391123 = weight(_text_:data in 1338) [ClassicSimilarity], result of:
          0.016391123 = score(doc=1338,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.17468026 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.375 = coord(3/8)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, that is, the tendency of two terms to co-occur more often than chance would predict in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that captures syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
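
    As a concrete reduction of the syntagmatic side of this modeling, candidate expansion terms can be scored by how much more often they co-occur with a query term than chance predicts (pointwise mutual information). This sketch is illustrative only and is far simpler than the paper's corpus-based model of word meaning:

      import math
      from collections import Counter
      from itertools import combinations

      def syntagmatic_expansion(query_term, docs, top_k=5):
          """Rank candidate expansion terms for query_term by PMI over
          document-level co-occurrence. docs: list of token lists."""
          term_counts = Counter()
          pair_counts = Counter()
          for doc in docs:
              terms = set(doc)
              term_counts.update(terms)
              pair_counts.update(frozenset(p) for p in combinations(terms, 2))
          if not term_counts[query_term]:
              return []
          n = len(docs)
          scores = {}
          for t in term_counts:
              co = pair_counts[frozenset((query_term, t))]
              if t != query_term and co:
                  # PMI = log( P(q,t) / (P(q) * P(t)) )
                  scores[t] = math.log(co * n / (term_counts[query_term] * term_counts[t]))
          return sorted(scores, key=scores.get, reverse=True)[:top_k]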
  6. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.02
    0.02198463 = product of:
      0.05862568 = sum of:
        0.03218305 = weight(_text_:wide in 241) [ClassicSimilarity], result of:
          0.03218305 = score(doc=241,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.24476713 = fieldWeight in 241, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.016391123 = weight(_text_:data in 241) [ClassicSimilarity], result of:
          0.016391123 = score(doc=241,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.17468026 = fieldWeight in 241, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.010051507 = product of:
          0.020103013 = sum of:
            0.020103013 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
              0.020103013 = score(doc=241,freq=2.0), product of:
                0.103918076 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029675366 = queryNorm
                0.19345059 = fieldWeight in 241, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=241)
          0.5 = coord(1/2)
      0.375 = coord(3/8)
    
    Abstract
    We address how individuals' (workers') knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative of successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary interventions that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach, and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent with people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34
  7. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.02
    0.019588731 = product of:
      0.078354925 = sum of:
        0.03406831 = weight(_text_:data in 92) [ClassicSimilarity], result of:
          0.03406831 = score(doc=92,freq=6.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.3630661 = fieldWeight in 92, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
        0.044286616 = product of:
          0.08857323 = sum of:
            0.08857323 = weight(_text_:mining in 92) [ClassicSimilarity], result of:
              0.08857323 = score(doc=92,freq=4.0), product of:
                0.16744171 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.029675366 = queryNorm
                0.5289795 = fieldWeight in 92, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.046875 = fieldNorm(doc=92)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    In this paper the authors present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules yields a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors show how this combination can improve the process of information retrieval.
    Theme
    Data Mining
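
    For flavor, a toy pass at mining word-pair association rules from co-occurrence counts is sketched below. True maximal association rules, as used in the paper, additionally constrain the antecedent to co-occur with nothing outside the consequent's category; the function name and thresholds here are illustrative:

      from collections import Counter
      from itertools import combinations

      def association_rules(docs, min_support=2, min_conf=0.6):
          """Emit rules x -> y where the pair co-occurs in at least min_support
          documents and the confidence P(y | x) reaches min_conf."""
          item_counts = Counter()
          pair_counts = Counter()
          for doc in docs:
              terms = set(doc)
              item_counts.update(terms)
              pair_counts.update(combinations(sorted(terms), 2))
          rules = []
          for (x, y), n_xy in pair_counts.items():
              if n_xy < min_support:
                  continue
              for a, b in ((x, y), (y, x)):   # test the rule in both directions
                  conf = n_xy / item_counts[a]
                  if conf >= min_conf:
                      rules.append((a, b, conf))
          return rules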
  8. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.02
    0.019009473 = product of:
      0.05069193 = sum of:
        0.017459875 = weight(_text_:web in 56) [ClassicSimilarity], result of:
          0.017459875 = score(doc=56,freq=2.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.18028519 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.023180548 = weight(_text_:data in 56) [ClassicSimilarity], result of:
          0.023180548 = score(doc=56,freq=4.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.24703519 = fieldWeight in 56, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.010051507 = product of:
          0.020103013 = sum of:
            0.020103013 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.020103013 = score(doc=56,freq=2.0), product of:
                0.103918076 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029675366 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.375 = coord(3/8)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  9. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.02
    0.018727332 = product of:
      0.07490933 = sum of:
        0.038619664 = weight(_text_:wide in 5771) [ClassicSimilarity], result of:
          0.038619664 = score(doc=5771,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.29372054 = fieldWeight in 5771, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
        0.03628967 = weight(_text_:web in 5771) [ClassicSimilarity], result of:
          0.03628967 = score(doc=5771,freq=6.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.37471575 = fieldWeight in 5771, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
      0.25 = coord(2/8)
    
    Abstract
    In our retrieval test of search tools on the World Wide Web (Password 11/2000), the search engine Google performed best. Compared with other search engines, Google relies hardly at all on information linguistics, but rather on algorithms that can be derived from the particular characteristics of web documents. The core of its information-statistical technique is the "PageRank" method (named after its developer Larry Page), which computes the "popularity" of pages from the hypertext structure of the web, based on their incoming and outgoing links. Google stands out for its intuitively understandable search screens and for several very useful "little extras" such as displaying a page's rank, highlighting, search within a page, search within a result set, and so on, all packed into a dedicated toolbar inside the browser. Much like RealNames, Google offers the purchase of search terms through its "AdWords" product. After a series of now four Password articles comparing Internet search tools, we conclude with an overall assessment. How should the state of the art of directories and search engines be judged from an information science point of view? Are "typical" Internet users, who as a rule are not information professionals, adequately served? And can information professionals, too, profit from these search tools?
  10. Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.02
    0.017062513 = product of:
      0.06825005 = sum of:
        0.038619664 = weight(_text_:wide in 105) [ClassicSimilarity], result of:
          0.038619664 = score(doc=105,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.29372054 = fieldWeight in 105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
        0.029630389 = weight(_text_:web in 105) [ClassicSimilarity], result of:
          0.029630389 = score(doc=105,freq=4.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.3059541 = fieldWeight in 105, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
      0.25 = coord(2/8)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists, as traditional information retrieval tools, have been criticised for their limited ability to tackle these newly emerging problems. This paper proposes an information retrieval tool generated by co-word analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion.
  11. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.02
    0.016443776 = product of:
      0.065775104 = sum of:
        0.049383983 = weight(_text_:web in 2069) [ClassicSimilarity], result of:
          0.049383983 = score(doc=2069,freq=16.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.5099235 = fieldWeight in 2069, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
        0.016391123 = weight(_text_:data in 2069) [ClassicSimilarity], result of:
          0.016391123 = score(doc=2069,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.17468026 = fieldWeight in 2069, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
      0.25 = coord(2/8)
    
    Abstract
    The semantic web contains not only resources but also the heterogeneous relationships among them, which sharply distinguishes it from the current web. As the semantic web grows, specialized search techniques become increasingly significant. In this paper, we present RSS, a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with the entities most semantically related to the query, making it possible to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. Moreover, RSS is also distinguished from many current methods of accessing semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder. The experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard PageRank algorithm to the semantic web.
    Theme
    Semantic Web
  12. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.02
    0.015253972 = product of:
      0.06101589 = sum of:
        0.03491975 = weight(_text_:web in 1615) [ClassicSimilarity], result of:
          0.03491975 = score(doc=1615,freq=8.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.36057037 = fieldWeight in 1615, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1615)
        0.02609614 = product of:
          0.05219228 = sum of:
            0.05219228 = weight(_text_:mining in 1615) [ClassicSimilarity], result of:
              0.05219228 = score(doc=1615,freq=2.0), product of:
                0.16744171 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.029675366 = queryNorm
                0.31170416 = fieldWeight in 1615, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1615)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even documents that purport to be "medically related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and to browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation to stay current. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better in perceived cluster recall.
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
  13. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment (1998) 0.01
    0.014892878 = product of:
      0.059571512 = sum of:
        0.038619664 = weight(_text_:wide in 5) [ClassicSimilarity], result of:
          0.038619664 = score(doc=5,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.29372054 = fieldWeight in 5, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5)
        0.020951848 = weight(_text_:web in 5) [ClassicSimilarity], result of:
          0.020951848 = score(doc=5,freq=2.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.21634221 = fieldWeight in 5, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5)
      0.25 = coord(2/8)
    
    Abstract
    The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
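
    The mutual reinforcement between "authorities" and "hub pages" described here reduces to two alternating updates iterated to a fixed point: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. A compact sketch of that iteration (a simplified rendering, not Kleinberg's exact formulation):

      def hits(adj, iters=50):
          """Hub/authority iteration on a link graph.
          adj: dict mapping page -> set of pages it links to."""
          pages = set(adj) | {q for targets in adj.values() for q in targets}
          hub = {p: 1.0 for p in pages}
          auth = {p: 1.0 for p in pages}
          for _ in range(iters):
              # authority: sum of hub scores of pages that link to p
              auth = {p: sum(hub[q] for q in adj if p in adj[q]) for p in pages}
              # hub: sum of authority scores of the pages p links to
              hub = {p: sum(auth[q] for q in adj.get(p, ())) for p in pages}
              for d in (auth, hub):  # normalize so scores stay bounded
                  norm = sum(v * v for v in d.values()) ** 0.5 or 1.0
                  for p in d:
                      d[p] /= norm
          return hub, auth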
  14. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.01
    0.01440244 = product of:
      0.05760976 = sum of:
        0.017459875 = weight(_text_:web in 4218) [ClassicSimilarity], result of:
          0.017459875 = score(doc=4218,freq=2.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.18028519 = fieldWeight in 4218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.040149886 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
          0.040149886 = score(doc=4218,freq=12.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.4278775 = fieldWeight in 4218, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
      0.25 = coord(2/8)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, referred to as SSRank, aims to combine the advantages of traditional Information Retrieval (IR) methods and the recently proposed supervised learning methods for IR. The advantages include the use of a limited amount of labeled data and a rich model representation. To do so, the method adopts a semi-supervised learning framework for ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
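
    The "traditional IR model" used in the labeling step is BM25. A generic Okapi BM25 scorer looks like the sketch below (k1 and b are common defaults; the paper's exact settings are not given in this abstract):

      import math
      from collections import Counter

      def bm25(query, doc, doc_freq, n_docs, avgdl, k1=1.2, b=0.75):
          """Okapi BM25 score of a tokenized doc for a tokenized query.
          doc_freq: term -> number of documents containing it."""
          tf = Counter(doc)
          score = 0.0
          for t in query:
              if t not in tf:
                  continue
              df = doc_freq.get(t, 0)
              idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
              # term-frequency saturation with document-length normalization
              norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
              score += idf * norm
          return score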
  15. Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.01
    0.014379091 = product of:
      0.057516363 = sum of:
        0.03456879 = weight(_text_:web in 4213) [ClassicSimilarity], result of:
          0.03456879 = score(doc=4213,freq=4.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.35694647 = fieldWeight in 4213, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
        0.022947572 = weight(_text_:data in 4213) [ClassicSimilarity], result of:
          0.022947572 = score(doc=4213,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.24455236 = fieldWeight in 4213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
      0.25 = coord(2/8)
    
    Abstract
    This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because no proper subgraph of it can be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing possible solutions with respect to effectiveness and efficiency.
  16. Henzinger, M.R.: Link analysis in Web information retrieval (2000) 0.01
    0.01432082 = product of:
      0.05728328 = sum of:
        0.04417038 = weight(_text_:web in 801) [ClassicSimilarity], result of:
          0.04417038 = score(doc=801,freq=20.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.45608947 = fieldWeight in 801, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=801)
        0.013112898 = weight(_text_:data in 801) [ClassicSimilarity], result of:
          0.013112898 = score(doc=801,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.1397442 = fieldWeight in 801, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=801)
      0.25 = coord(2/8)
    
    Abstract
    The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the state of the art of the field.
    Content
    The goal of information retrieval is to find all documents relevant for a user query in a collection of documents. Decades of research in information retrieval were successful in developing and refining techniques that are solely word-based (see e.g., [2]). With the advent of the web, new sources of information became available, one of them being the hyperlinks between documents and records of user behavior. To be precise, hypertexts (i.e., collections of documents connected by hyperlinks) have existed and have been studied for a long time. What was new was the large number of hyperlinks created by independent individuals. Hyperlinks provide a valuable source of information for web information retrieval, as we will show in this article. This area of information retrieval is commonly called link analysis. Why would one expect hyperlinks to be useful? A hyperlink is a reference to a web page B that is contained in a web page A. When the hyperlink is clicked on in a web browser, the browser displays page B. This functionality alone is not helpful for web information retrieval. However, the way hyperlinks are typically used by authors of web pages can give them valuable information content. Typically, authors create links because they think they will be useful for the readers of the pages. Thus, links are usually either navigational aids that, for example, bring the reader back to the homepage of the site, or links that point to pages whose content augments the content of the current page. The second kind of links tends to point to high-quality pages that might be on the same topic as the page containing the link.
    Source
    IEEE data engineering bulletin. 23(2000) no.3, S.3-8
  17. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.01
    0.014274232 = product of:
      0.05709693 = sum of:
        0.022528138 = weight(_text_:wide in 93) [ClassicSimilarity], result of:
          0.022528138 = score(doc=93,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.171337 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.03456879 = weight(_text_:web in 93) [ClassicSimilarity], result of:
          0.03456879 = score(doc=93,freq=16.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.35694647 = fieldWeight in 93, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.25 = coord(2/8)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one), so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb.
    Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list.
    One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
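
    The computation the article goes on to describe reduces to a power iteration over the link graph: each page divides its current rank among the pages it links to, with a damping factor d modeling a surfer who occasionally jumps to a random page. A small self-contained sketch (toy graph and defaults are ours, not the article's):

      def pagerank(links, d=0.85, iters=50):
          """Power-iteration PageRank.
          links: dict mapping page -> list of pages it links to."""
          pages = set(links) | {q for out in links.values() for q in out}
          n = len(pages)
          rank = {p: 1.0 / n for p in pages}
          for _ in range(iters):
              new = {p: (1.0 - d) / n for p in pages}
              for p in pages:
                  out = links.get(p, [])
                  if out:
                      for q in out:  # split p's rank among its outgoing links
                          new[q] += d * rank[p] / len(out)
                  else:  # dangling page: spread its rank uniformly
                      for q in pages:
                          new[q] += d * rank[p] / n
              rank = new
          return rank

      # Toy graph: A and C both link to B, so B comes out most "important"
      print(pagerank({"A": ["B"], "B": ["C"], "C": ["B"]}))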
  18. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.01
    0.014218761 = product of:
      0.056875043 = sum of:
        0.03218305 = weight(_text_:wide in 1427) [ClassicSimilarity], result of:
          0.03218305 = score(doc=1427,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.24476713 = fieldWeight in 1427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
        0.024691992 = weight(_text_:web in 1427) [ClassicSimilarity], result of:
          0.024691992 = score(doc=1427,freq=4.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.25496176 = fieldWeight in 1427, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
      0.25 = coord(2/8)
    
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document scores, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model allows us to discover some inconsistencies in the mentioned techniques, and to take a higher-level and systematic approach to using hyperlinks for retrieval.
  19. Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.01
    0.013989754 = product of:
      0.055959016 = sum of:
        0.03628967 = weight(_text_:web in 3310) [ClassicSimilarity], result of:
          0.03628967 = score(doc=3310,freq=6.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.37471575 = fieldWeight in 3310, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3310)
        0.019669347 = weight(_text_:data in 3310) [ClassicSimilarity], result of:
          0.019669347 = score(doc=3310,freq=2.0), product of:
            0.093835 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.029675366 = queryNorm
            0.2096163 = fieldWeight in 3310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3310)
      0.25 = coord(2/8)
    
    Abstract
    This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic.
  20. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.01
    0.013491376 = product of:
      0.053965505 = sum of:
        0.041903697 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
          0.041903697 = score(doc=2239,freq=8.0), product of:
            0.096845865 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029675366 = queryNorm
            0.43268442 = fieldWeight in 2239, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.012061807 = product of:
          0.024123615 = sum of:
            0.024123615 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.024123615 = score(doc=2239,freq=2.0), product of:
                0.103918076 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029675366 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function had been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs in GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
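
    One widely used fitness design for this kind of ranking discovery scores a candidate ranking function by the average precision it achieves on training queries. The sketch below shows that single choice, offered as one plausible design among the several the article contrasts:

      def average_precision(ranked, relevant):
          """Average of the precision values at each rank where a relevant
          document appears; a common GP fitness for learned ranking functions."""
          hits, total = 0, 0.0
          for i, doc_id in enumerate(ranked, 1):
              if doc_id in relevant:
                  hits += 1
                  total += hits / i
          return total / len(relevant) if relevant else 0.0

      # Hypothetical: relevant docs {1, 3} ranked at positions 1 and 3
      print(average_precision([1, 2, 3, 4], {1, 3}))  # (1/1 + 2/3) / 2 ~ 0.833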

Languages

  • e 138
  • d 11
  • m 1

Types

  • a 134
  • m 10
  • el 4
  • s 4
  • r 1
  • x 1