Search (10 results, page 1 of 1)

Galvez, C.; Moya-Anegón, F. de: An evaluation of conflation accuracy using finite-state transducers (2006) 0.02
```
0.016952079 = product of:
  0.050856233 = sum of:
    0.050856233 = weight(_text_:electronic in 5599) [ClassicSimilarity], result of:
      0.050856233 = score(doc=5599,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.259166 = fieldWeight in 5599, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.046875 = fieldNorm(doc=5599)
  0.33333334 = coord(1/3)
```
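The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The values for queryNorm and fieldNorm are copied from the output rather than derived, and the function names are illustrative, not part of Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight for a single term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                    # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the explain output for doc 5599 (term "electronic").
leaf = term_score(freq=2.0, doc_freq=2409, max_docs=44218,
                  query_norm=0.05019314, field_norm=0.046875)

# coord(1/3): one of three query clauses matched.
final_score = leaf * (1 / 3)
print(f"{leaf:.6f} {final_score:.6f}")
```

Small differences in the last digits are expected, because Lucene computes these values in single precision while this sketch uses Python floats.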
Abstract

Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

```
0.01586778 = product of:
  0.047603343 = sum of:
    0.047603343 = product of:
      0.095206685 = sum of:
        0.095206685 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.095206685 = score(doc=6265,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
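The nesting in this explanation (a product of a sum of a product of a sum) can be read as a small expression tree. The sketch below evaluates such a tree bottom-up. The tuple encoding and the `evaluate` helper are hypothetical, chosen only for illustration, and the leaf value is copied from the output for doc 6265.

```python
from math import prod


def evaluate(node):
    """Evaluate a nested ("product" | "sum", children) tree of numbers."""
    if isinstance(node, (int, float)):
        return float(node)
    op, children = node
    values = [evaluate(child) for child in children]
    if op == "product":
        return prod(values)
    if op == "sum":
        return sum(values)
    raise ValueError(f"unknown node type: {op}")


# Shape of the explanation for doc 6265 (term "22").
tree = ("product", [
    ("sum", [
        ("product", [
            ("sum", [0.095206685]),  # weight(_text_:22 in 6265)
            0.5,                      # coord(1/2)
        ]),
    ]),
    1 / 3,                            # coord(1/3)
])

total = evaluate(tree)
print(f"{total:.8f}")
```

Both coord factors scale the leaf score down: the inner one because one of two sub-clauses matched, the outer one because one of three top-level clauses matched.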

Source: Information outlook. 9(2005) no.8, S.22-23

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.01
```
0.013984751 = product of:
  0.041954253 = sum of:
    0.041954253 = weight(_text_:electronic in 4285) [ClassicSimilarity], result of:
      0.041954253 = score(doc=4285,freq=4.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.21380106 = fieldWeight in 4285, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
  0.33333334 = coord(1/3)
```
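Doc 4285 contains the term twice as often as doc 5599 (freq=4.0 versus freq=2.0) yet ranks lower. A short sketch, using only numbers copied from the two explain outputs, shows why: the smaller fieldNorm outweighs the larger tf. The `field_weight` helper is an illustrative name, and reading fieldNorm as a length normalization is an assumption about how ClassicSimilarity is configured in this index.

```python
import math


def field_weight(freq: float, idf: float, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm


IDF_ELECTRONIC = 3.9095051  # idf(docFreq=2409, maxDocs=44218)

fw_5599 = field_weight(2.0, IDF_ELECTRONIC, 0.046875)
fw_4285 = field_weight(4.0, IDF_ELECTRONIC, 0.02734375)

# tf rises by a factor of sqrt(2), but fieldNorm drops by a larger factor.
tf_ratio = math.sqrt(4.0) / math.sqrt(2.0)
norm_ratio = 0.02734375 / 0.046875
print(f"{fw_5599:.6f} {fw_4285:.6f} {tf_ratio * norm_ratio:.4f}")
```

Because queryWeight is identical for both documents, the ratio of the two field weights is also the ratio of their final scores.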
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Hauer, M.: Automatische Indexierung (2000) 0.01

```
0.013600955 = product of:
  0.040802862 = sum of:
    0.040802862 = product of:
      0.081605725 = sum of:
        0.081605725 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
          0.081605725 = score(doc=5887,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.46428138 = fieldWeight in 5887, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5887)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01

```
0.0090673035 = product of:
  0.02720191 = sum of:
    0.02720191 = product of:
      0.05440382 = sum of:
        0.05440382 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.05440382 = score(doc=3581,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 24. 3.2006 12:22:02

Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.01

```
0.0090673035 = product of:
  0.02720191 = sum of:
    0.02720191 = product of:
      0.05440382 = sum of:
        0.05440382 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
          0.05440382 = score(doc=1755,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.30952093 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 22. 3.2008 12:35:19

Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.01

```
0.00793389 = product of:
  0.023801671 = sum of:
    0.023801671 = product of:
      0.047603343 = sum of:
        0.047603343 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
          0.047603343 = score(doc=5671,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.2708308 = fieldWeight in 5671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5671)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 22. 3.2001 13:14:48

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

```
0.00793389 = product of:
  0.023801671 = sum of:
    0.023801671 = product of:
      0.047603343 = sum of:
        0.047603343 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.047603343 = score(doc=5291,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 22. 7.2006 17:32:00

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01

```
0.0068004774 = product of:
  0.020401431 = sum of:
    0.020401431 = product of:
      0.040802862 = sum of:
        0.040802862 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.040802862 = score(doc=1746,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 22. 3.2015 9:17:30

Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.00

```
0.0045336518 = product of:
  0.013600955 = sum of:
    0.013600955 = product of:
      0.02720191 = sum of:
        0.02720191 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
          0.02720191 = score(doc=1767,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.15476047 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 22. 6.2009 12:46:51
