Search (12 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Internet"
  1. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.03
    0.025941458 = product of:
      0.051882915 = sum of:
        0.051882915 = sum of:
          0.008202582 = weight(_text_:a in 2673) [ClassicSimilarity], result of:
            0.008202582 = score(doc=2673,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.1544581 = fieldWeight in 2673, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2673)
          0.043680333 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
            0.043680333 = score(doc=2673,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.2708308 = fieldWeight in 2673, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
    Type
    a
  2. McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.00
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = product of:
          0.009567685 = sum of:
            0.009567685 = weight(_text_:a in 2533) [ClassicSimilarity], result of:
              0.009567685 = score(doc=2533,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18016359 = fieldWeight in 2533, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2533)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  3. Hirawa, M.: Role of keywords in the network searching era (1998) 0.00
    0.0023435948 = product of:
      0.0046871896 = sum of:
        0.0046871896 = product of:
          0.009374379 = sum of:
            0.009374379 = weight(_text_:a in 3446) [ClassicSimilarity], result of:
              0.009374379 = score(doc=3446,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17652355 = fieldWeight in 3446, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3446)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A survey of Japanese OPACs available on the Internet was conducted relating to use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives from a merely title-based retrieval system. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed
    Type
    a
  4. Bloomfield, M.: Indexing : neglected and poorly understood (2001) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 5439) [ClassicSimilarity], result of:
              0.009076704 = score(doc=5439,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 5439, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5439)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.
    Type
    a
  5. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 7209) [ClassicSimilarity], result of:
              0.008202582 = score(doc=7209,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 7209, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7209)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
    Type
    a
  6. Cheng, K.-H.: Automatic identification for topics of electronic documents (1997) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 1811) [ClassicSimilarity], result of:
              0.008202582 = score(doc=1811,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 1811, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1811)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    With the rapid rise in numbers of electronic documents on the Internet, how to effectively assign topics to documents becomes an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being
    Type
    a
  7. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 4285) [ClassicSimilarity], result of:
              0.008202582 = score(doc=4285,freq=24.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 4285, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4285)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
    Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).
    Type
    a
  8. Thirion, B.; Leroy, J.P.; Baudic, F.; Douyère, M.; Piot, J.; Darmoni, S.J.: SDI selecting, describing, and indexing : did you mean automatically? (2001) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 6198) [ClassicSimilarity], result of:
              0.008118451 = score(doc=6198,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 6198, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6198)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  9. Pfeifer, U.; Fuhr, N.; Huynh, T.: Searching structured documents with the enhanced retrieval functionality of freeWAIS-sf and SFgate (1995) 0.00
    0.001674345 = product of:
      0.00334869 = sum of:
        0.00334869 = product of:
          0.00669738 = sum of:
            0.00669738 = weight(_text_:a in 2214) [ClassicSimilarity], result of:
              0.00669738 = score(doc=2214,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12611452 = fieldWeight in 2214, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2214)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The original WAIS implementation by Thinking Machines and others treats documents as uniform bags of terms. Since most documents exhibit some internal structure, it is desirable to provide the user means to exploit this structure in his queries. Presents extensions to the freeWAIS indexer and server, which allow access to document structures using the original WAIS protocol. Major extensions include: arbitrary document formats, search in individual structure elements, stemming and phonetic search, support of 8-bit character sets, numeric concepts and operators, combination of Boolean and linear retrieval. Presents a WWW-WAIS gateway specially tailored for usage with freeWAIS-sf which transforms filled out HTML forms to the new query syntax
    Type
    a
  10. MacDougall, S.: Rethinking indexing : the impact of the Internet (1996) 0.00
    0.0014351527 = product of:
      0.0028703054 = sum of:
        0.0028703054 = product of:
          0.005740611 = sum of:
            0.005740611 = weight(_text_:a in 704) [ClassicSimilarity], result of:
              0.005740611 = score(doc=704,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.10809815 = fieldWeight in 704, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=704)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Considers the challenge to professional indexers posed by the Internet. Indexing and searching on the Internet appear to have taken a retrograde step, as well developed and efficient information retrieval techniques have been replaced by cruder techniques, involving automatic keyword indexing and frequency ranking, leading to large retrieval sets and low precision. This is made worse by the apparent acceptance of this poor performance by Internet users and the feeling, on the part of indexers, that they are being bypassed by the producers of these hyperlinked menus and search engines. Key issues are: how far 'human' indexing will still be required in the Internet environment; how indexing techniques will have to change to stay relevant; and the future role of indexers. The challenge facing indexers is to adapt their skills to suit the online environment and to convince publishers of the need for efficient indexes on the Internet
    Type
    a
  11. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.00
    0.0011839407 = product of:
      0.0023678814 = sum of:
        0.0023678814 = product of:
          0.0047357627 = sum of:
            0.0047357627 = weight(_text_:a in 6750) [ClassicSimilarity], result of:
              0.0047357627 = score(doc=6750,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.089176424 = fieldWeight in 6750, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6750)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  12. Lepsky, K.: Im Heuhaufen suchen - und finden : Automatische Erschließung von Internetquellen: Möglichkeiten und Grenzen (1998) 0.00
    0.0010148063 = product of:
      0.0020296127 = sum of:
        0.0020296127 = product of:
          0.0040592253 = sum of:
            0.0040592253 = weight(_text_:a in 4655) [ClassicSimilarity], result of:
              0.0040592253 = score(doc=4655,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.07643694 = fieldWeight in 4655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4655)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
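
The score breakdowns printed under each result follow one arithmetic pattern, which can be checked by hand. The sketch below is a minimal Python reconstruction of that arithmetic, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values printed in the listing. The queryNorm, fieldNorm, and coord factors are copied from the listing rather than derived, and the names `idf` and `term_score` are illustrative only.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.046056706  # queryNorm, copied from the listing (not derived here)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form that matches the printed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 2673): two matching terms, summed, then coord(1/2).
score_a = term_score(freq=6.0, doc_freq=37942, field_norm=0.0546875)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.0546875)
result_1 = (score_a + score_22) * 0.5

# Result 2 (doc 2533): one matching term, with two nested coord(1/2) factors.
result_2 = term_score(freq=4.0, doc_freq=37942, field_norm=0.078125) * 0.5 * 0.5

print(f"result 1: {result_1:.9f}")
print(f"result 2: {result_2:.10f}")
```

The rare term "22" (docFreq=3622) contributes roughly five times as much as the very common term "a" (docFreq=37942), which is why result 1 scores about ten times higher than the other hits, all of which match on "a" alone.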