Search (4 results, page 1 of 1)

  • × theme_ss:"Literaturübersicht"
  • × theme_ss:"Internet"
  1. Chowdhury, G.G.: The Internet and information retrieval research : a brief review (1999) 0.03
    0.025550429 = product of:
      0.102201715 = sum of:
        0.102201715 = weight(_text_:engines in 3424) [ClassicSimilarity], result of:
          0.102201715 = score(doc=3424,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.44908544 = fieldWeight in 3424, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0625 = fieldNorm(doc=3424)
      0.25 = coord(1/4)
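    The explain tree above is standard Lucene ClassicSimilarity (TF-IDF) output; the 0.03 shown next to the title is simply the final product rounded to two decimals. As a minimal sketch, assuming Lucene's usual definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and taking queryNorm, fieldNorm, and coord directly from the values shown, the score can be recomputed as below; the other three results follow the same arithmetic and differ only in freq and fieldNorm.
    ```python
    import math

    def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
        """Recompute a single-term Lucene ClassicSimilarity score from explain values.

        A sketch only: queryNorm, fieldNorm, and coord are taken as given from the
        explain output rather than derived from the index.
        """
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 5.080822 for docFreq=746, maxDocs=44218
        tf = math.sqrt(freq)                              # 1.4142135 for freq=2.0
        query_weight = idf * query_norm                   # 0.22757743
        field_weight = tf * idf * field_norm              # 0.44908544
        return coord * query_weight * field_weight        # coord(1/4): 1 of 4 query clauses matched

    # Values taken from the explain tree of result 1 (doc 3424)
    score = classic_similarity_score(
        freq=2.0, doc_freq=746, max_docs=44218,
        query_norm=0.04479146, field_norm=0.0625, coord=0.25,
    )
    print(score)  # ≈ 0.0255504, matching 0.025550429 up to float rounding; displayed as 0.03
    ```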
    
    Abstract
    The Internet and related information services attract increasing interest from information retrieval researchers. A survey of recent publications shows that frequent topics are the effectiveness of search engines, information validation and quality, user studies, design of user interfaces, data structures and metadata, classification and vocabulary-based aids, and indexing and search agents. Current research in these areas is briefly discussed. The important and changing balance between CD-ROM sources and traditional online searching is also noted.
  2. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.02
    0.019361407 = product of:
      0.077445626 = sum of:
        0.077445626 = weight(_text_:engines in 4285) [ClassicSimilarity], result of:
          0.077445626 = score(doc=4285,freq=6.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.34030452 = fieldWeight in 4285, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
      0.25 = coord(1/4)
    
    Abstract
    Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability in the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).
  3. Woodward, J.: Cataloging and classifying information resources on the Internet (1996) 0.02
    0.01916282 = product of:
      0.07665128 = sum of:
        0.07665128 = weight(_text_:engines in 7397) [ClassicSimilarity], result of:
          0.07665128 = score(doc=7397,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.33681408 = fieldWeight in 7397, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=7397)
      0.25 = coord(1/4)
    
    Abstract
    State-of-the-art review exploring the problem of bibliographic citations to resources that exist only in electronic form, where the cited items may no longer be locatable at the URL indicated. Notes that the Internet is currently in a state of near chaos in terms of access and organization, while searching, usually performed with word-based search engines, is generally not adequate for the needs of most users. Reviews strategies used by librarians for cataloguing and classifying information resources on the Internet. Techniques used include automatic classification projects and classified subject trees, such as the BUBL Subject Tree, CyberDewey, and the WWW Virtual Library. Considers OPAC-like library catalogues such as the UK's CATRIONA Project and OCLC's InterCat. Explores retrieval tools used with concept analysis and other non-traditional proposals, which include some library expertise, usually the use of one of the major library classifications. Pays particular attention to the UDC.
  4. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.02
    0.015969018 = product of:
      0.06387607 = sum of:
        0.06387607 = weight(_text_:engines in 4279) [ClassicSimilarity], result of:
          0.06387607 = score(doc=4279,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.2806784 = fieldWeight in 4279, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.25 = coord(1/4)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
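    The data-cleansing heuristics mentioned above are not spelled out in the abstract. Purely as a hypothetical illustration of the kind of preprocessing such link studies need, the sketch below normalizes hosts, drops within-site self-links, and collapses duplicate site pairs before links are counted; all names and URLs are invented for the example.
    ```python
    from urllib.parse import urlsplit

    def clean_links(links):
        """Hypothetical webometrics-style cleansing of (source_url, target_url) pairs."""
        cleaned = set()
        for source, target in links:
            src_host = urlsplit(source).netloc.lower().removeprefix("www.")
            tgt_host = urlsplit(target).netloc.lower().removeprefix("www.")
            if not src_host or not tgt_host or src_host == tgt_host:
                continue  # skip malformed URLs and intra-site navigation links
            cleaned.add((src_host, tgt_host))  # count at most one link per site pair
        return cleaned

    sample = [
        ("http://www.example.edu/a.html", "http://example.edu/b.html"),  # self-link: dropped
        ("http://www.example.edu/a.html", "http://journal.org/paper1"),
        ("http://example.edu/c.html", "http://journal.org/paper1"),      # same site pair: collapsed
    ]
    print(clean_links(sample))  # {('example.edu', 'journal.org')}
    ```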