Search (94 results, page 1 of 5)

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.14

0.14300323 = product of:
  0.21450482 = sum of:
    0.045744486 = weight(_text_:world in 2673) [ClassicSimilarity], result of:
      0.045744486 = score(doc=2673,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.29726875 = fieldWeight in 2673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.06078585 = weight(_text_:wide in 2673) [ClassicSimilarity], result of:
      0.06078585 = score(doc=2673,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 2673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.057118528 = weight(_text_:web in 2673) [ClassicSimilarity], result of:
      0.057118528 = score(doc=2673,freq=6.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.43716836 = fieldWeight in 2673, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.05085595 = product of:
      0.076283924 = sum of:
        0.038314294 = weight(_text_:29 in 2673) [ClassicSimilarity], result of:
          0.038314294 = score(doc=2673,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.27205724 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.03796963 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.03796963 = score(doc=2673,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.6666667 = coord(2/3)
  0.6666667 = coord(4/6)
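
The per-term numbers in the tree above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf is the square root of the term frequency and that idf is 1 + ln(maxDocs / (docFreq + 1)), which is the form that matches the figures shown. The function names `classic_idf` and `term_score` are made up for illustration; `queryNorm` and `fieldNorm` are copied from the output rather than derived.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed form, inferred from the explain output: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                     # tf(freq=2.0) = 1.4142135
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm          # "queryWeight, product of"
    field_weight = tf * idf * field_norm     # "fieldWeight, product of"
    return query_weight * field_weight       # "score, product of"

# Values copied from the explain tree for doc 2673
world = term_score(2.0, 2573, 44218, 0.04003532, 0.0546875)
web = term_score(6.0, 4597, 44218, 0.04003532, 0.0546875)
print(f"world: {world:.6f}  web: {web:.6f}")
```

The engine works in single-precision floats, so a double-precision recomputation agrees only to about six or seven significant digits.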

Abstract: Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
Source: Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.10

0.09908667 = product of:
  0.19817334 = sum of:
    0.07923178 = weight(_text_:world in 7209) [ClassicSimilarity], result of:
      0.07923178 = score(doc=7209,freq=6.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.5148846 = fieldWeight in 7209, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.08596417 = weight(_text_:wide in 7209) [ClassicSimilarity], result of:
      0.08596417 = score(doc=7209,freq=4.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.4846142 = fieldWeight in 7209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.0329774 = weight(_text_:web in 7209) [ClassicSimilarity], result of:
      0.0329774 = score(doc=7209,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.5 = coord(3/6)
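
The top of each tree combines the per-term scores in two steps: the matching clauses are summed, and the sum is scaled by a coordination factor, the fraction of query clauses that matched. A minimal sketch of that arithmetic, using only numbers shown in the listings; the helper names `coord` and `combine` are hypothetical and not part of any library API.

```python
def coord(matched: int, total: int) -> float:
    # Fraction of query clauses that matched, e.g. coord(3/6) = 0.5
    return matched / total

def combine(clause_scores, total_clauses):
    # "product of: sum of: ... coord(m/n)" as printed in the explain tree
    return sum(clause_scores) * coord(len(clause_scores), total_clauses)

# Doc 7209: three of six clauses matched
doc_7209 = combine([0.07923178, 0.08596417, 0.0329774], 6)

# Doc 2673 (first entry): the nested clause matched two of three sub-clauses,
# and its result counts as one of four matching top-level clauses
nested = combine([0.038314294, 0.03796963], 3)
doc_2673 = combine([0.045744486, 0.06078585, 0.057118528, nested], 6)

print(f"doc 7209: {doc_7209:.6f}  doc 2673: {doc_2673:.6f}")
```

This is why the first entry outranks this one even though its individual term scores are not larger: it matches four clauses rather than three, so its sum is scaled by 4/6 instead of 3/6.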

Abstract: The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
Source: Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.08
```
0.077413574 = product of:
  0.15482715 = sum of:
    0.032346237 = weight(_text_:world in 4285) [ClassicSimilarity], result of:
      0.032346237 = score(doc=4285,freq=4.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21020076 = fieldWeight in 4285, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
    0.06078585 = weight(_text_:wide in 4285) [ClassicSimilarity], result of:
      0.06078585 = score(doc=4285,freq=8.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 4285, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
    0.061695054 = weight(_text_:web in 4285) [ClassicSimilarity], result of:
      0.061695054 = score(doc=4285,freq=28.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.47219574 = fieldWeight in 4285, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
  0.5 = coord(3/6)
```
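
The tree above shows how field-length normalization offsets raw term frequency: "web" occurs 28 times in doc 4285, but its fieldNorm (0.02734375) is half that of doc 2673 in the first entry (0.0546875), where "web" occurs only 6 times. A small sketch comparing the two field weights, with every value copied from the listings; `field_weight` is a hypothetical helper name, and the square-root tf is an assumption consistent with the printed tf values.

```python
import math

def field_weight(freq: float, idf: float, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    return math.sqrt(freq) * idf * field_norm

IDF_WEB = 3.2635105  # idf(docFreq=4597, maxDocs=44218), copied from the output

long_doc = field_weight(28.0, IDF_WEB, 0.02734375)   # doc 4285
short_doc = field_weight(6.0, IDF_WEB, 0.0546875)    # doc 2673

print(f"doc 4285: {long_doc:.5f}  doc 2673: {short_doc:.5f}")
print(f"ratio: {long_doc / short_doc:.3f}")
```

Under these assumptions, more than four times as many occurrences raise the field weight by less than ten percent.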
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.07
```
0.07046205 = product of:
  0.1409241 = sum of:
    0.046208907 = weight(_text_:world in 1777) [ClassicSimilarity], result of:
      0.046208907 = score(doc=1777,freq=4.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.30028677 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.061402984 = weight(_text_:wide in 1777) [ClassicSimilarity], result of:
      0.061402984 = score(doc=1777,freq=4.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.34615302 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.0333122 = weight(_text_:web in 1777) [ClassicSimilarity], result of:
      0.0333122 = score(doc=1777,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25496176 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
  0.5 = coord(3/6)
```
Abstract

This thesis describes and evaluates the WWW search service GERHARD (German Harvest Automated Retrieval and Directory). GERHARD is a search and navigation system for the German World Wide Web that collects only scientifically relevant documents and classifies them automatically, using computational-linguistic and statistical methods, with the help of a library classification system. The DFG project GERHARD is an attempt to develop an alternative to conventional methods of indexing the Internet, in the form of a World Wide Web service based on an automatic classification procedure. In the German-speaking world, GERHARD is the only directory of Internet resources that is created and updated fully automatically (that is, by machine). GERHARD restricts itself to listing documents on academic WWW servers. The basic idea was to replace cost-intensive intellectual indexing and classification of Internet pages with computational-linguistic and statistical methods, and in this way to map the listed Internet resources automatically onto the vocabulary of a library classification system. GERHARD stands for German Harvest Automated Retrieval and Directory. The WWW address (URL) of GERHARD is: http://www.gerhard.de. This diploma thesis describes the service, with particular emphasis on the underlying indexing and classification system, and then examines the effectiveness of GERHARD by means of a small retrieval test.

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.07

0.07013522 = product of:
  0.14027044 = sum of:
    0.03920956 = weight(_text_:world in 2721) [ClassicSimilarity], result of:
      0.03920956 = score(doc=2721,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.25480178 = fieldWeight in 2721, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.052102152 = weight(_text_:wide in 2721) [ClassicSimilarity], result of:
      0.052102152 = score(doc=2721,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 2721, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.048958737 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
      0.048958737 = score(doc=2721,freq=6.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.37471575 = fieldWeight in 2721, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
  0.5 = coord(3/6)

Abstract: In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Daudaravicius, V.: A framework for keyphrase extraction from scientific journals (2016) 0.07

0.06975387 = product of:
  0.13950774 = sum of:
    0.045744486 = weight(_text_:world in 2930) [ClassicSimilarity], result of:
      0.045744486 = score(doc=2930,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.29726875 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.06078585 = weight(_text_:wide in 2930) [ClassicSimilarity], result of:
      0.06078585 = score(doc=2930,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.0329774 = weight(_text_:web in 2930) [ClassicSimilarity], result of:
      0.0329774 = score(doc=2930,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
  0.5 = coord(3/6)

Content: Presentation, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.

Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: A typology of semantic relations dedicated to scientific literature analysis (2016) 0.07

0.06975387 = product of:
  0.13950774 = sum of:
    0.045744486 = weight(_text_:world in 2933) [ClassicSimilarity], result of:
      0.045744486 = score(doc=2933,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.29726875 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.06078585 = weight(_text_:wide in 2933) [ClassicSimilarity], result of:
      0.06078585 = score(doc=2933,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
    0.0329774 = weight(_text_:web in 2933) [ClassicSimilarity], result of:
      0.0329774 = score(doc=2933,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 2933, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2933)
  0.5 = coord(3/6)

Content: Presentation, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.

Alexander, M.: Retrieving digital data with fuzzy matching (1996) 0.04

0.040582985 = product of:
  0.121748954 = sum of:
    0.052279413 = weight(_text_:world in 6961) [ClassicSimilarity], result of:
      0.052279413 = score(doc=6961,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.33973572 = fieldWeight in 6961, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=6961)
    0.06946954 = weight(_text_:wide in 6961) [ClassicSimilarity], result of:
      0.06946954 = score(doc=6961,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.3916274 = fieldWeight in 6961, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0625 = fieldNorm(doc=6961)
  0.33333334 = coord(2/6)

Abstract: Briefly describes the Excalibur EFS system which makes use of adaptive pattern recognition technology as an aid to automatic indexing and how it is being tested at the British Library for the indexing and retrieval of scanned images from the library's holdings. Notes how Excalibur EFS can support a wide degree of fuzzy searching, compensate for the errors produced by OCR conversion of scanned images, reduce the costs of indexing, and require far less storage space than more traditional indexes
Source: New library world. 97(1996) no.1131, S.28-31

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.04
```
0.03985935 = product of:
  0.0797187 = sum of:
    0.026139706 = weight(_text_:world in 2596) [ClassicSimilarity], result of:
      0.026139706 = score(doc=2596,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.16986786 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.03473477 = weight(_text_:wide in 2596) [ClassicSimilarity], result of:
      0.03473477 = score(doc=2596,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.1958137 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.018844226 = weight(_text_:web in 2596) [ClassicSimilarity], result of:
      0.018844226 = score(doc=2596,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.14422815 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.5 = coord(3/6)
```
Content

Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
Session One: Updates and a twelve month perspective
Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
Session Two: Today's search engines and beyond
Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
Session Three: Panel discussion: Human v automated categorization and editing
Ev Brenner (New York, NY), Chairman
James Callan (University of Massachusetts, MA)
Marc Krellenstein (Northern Light Technology, Cambridge, MA)
Dan Miller (Ask Jeeves, Berkeley, CA)
Session Four: Updates and a twelve month perspective
Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
Session Five: Search engines now and beyond
Intelligent Agents. John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
Text summarization. Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
Cross language searching. Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
Video search and retrieval. Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
Speech recognition. Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
Visualization. James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
Text mining. David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support

Groß, T.; Faden, M.: Automatische Indexierung elektronischer Dokumente an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften : Bericht über die Jahrestagung der Internationalen Buchwissenschaftlichen Gesellschaft (2010) 0.04
```
0.03985935 = product of:
  0.0797187 = sum of:
    0.026139706 = weight(_text_:world in 4051) [ClassicSimilarity], result of:
      0.026139706 = score(doc=4051,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.16986786 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
    0.03473477 = weight(_text_:wide in 4051) [ClassicSimilarity], result of:
      0.03473477 = score(doc=4051,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.1958137 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
    0.018844226 = weight(_text_:web in 4051) [ClassicSimilarity], result of:
      0.018844226 = score(doc=4051,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.14422815 = fieldWeight in 4051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
  0.5 = coord(3/6)
```
Abstract

The increasing availability of digital information in recent years, together with the prospect of a further rise in the so-called data flood, culminates in a fundamental and steadily intensifying problem of structuring information. The constant growth of digital information resources on the World Wide Web does secure access to a wide range of information at any time and from any place; what remains unresolved is structured access, especially to scholarly resources. Given the rising number of electronic content items, and against the background of stagnating or shrinking staff resources in subject indexing, no library or library network can manage, now or in the future, to record all digital data, structure it, and relate the items to one another. In the information society of the 21st century, however, it is becoming increasingly important to structure the scholarly information that has disappeared in the flood promptly, appropriately, and completely, and thus to make it usable again as a basis for generating knowledge. For the Deutsche Zentralbibliothek für Wirtschaftswissenschaften (ZBW), as an important information infrastructure institution in this field, standardized subject indexing of digital information resources is therefore a decisive and success-critical aspect of competing with other information service providers. Traditional intellectual subject indexing cannot be scaled at will: as the number of online documents rises, the need for subject specialists rises proportionally if a certain quality standard is to be maintained. Other subject indexing methods will therefore be needed in the future. Automated keyword assignment methods are regarded as the only way to make library subject indexing future-proof in the digital age. In addition, machine-based approaches can help to level out the heterogeneities (indexing inconsistencies) between individual indexers and thus contribute to more homogeneous indexing of the library's holdings.

Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.03

0.030793857 = product of:
  0.09238157 = sum of:
    0.045744486 = weight(_text_:world in 6750) [ClassicSimilarity], result of:
      0.045744486 = score(doc=6750,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.29726875 = fieldWeight in 6750, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6750)
    0.04663708 = weight(_text_:web in 6750) [ClassicSimilarity], result of:
      0.04663708 = score(doc=6750,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.35694647 = fieldWeight in 6750, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6750)
  0.33333334 = coord(2/6)
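
The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that match the printed values: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are illustrative, not part of any library, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs, taken from the listing
QUERY_NORM = 0.04003532   # queryNorm, taken from the listing (not derived here)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency in the form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one query term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values for doc 6750 (Shafer 1996), copied from the explain output.
world = term_score(freq=2.0, doc_freq=2573, field_norm=0.0546875)
web = term_score(freq=4.0, doc_freq=4597, field_norm=0.0546875)

# coord(2/6): two of the six query clauses matched this document.
total = (world + web) * (2 / 6)

print(f"world={world:.6f} web={web:.6f} total={total:.6f}")
```

Up to floating-point rounding (the listing appears to use single precision), the three values agree with the printed 0.045744486, 0.04663708, and 0.030793857.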

Abstract: As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, due to the increasing amount of time users will have to spend to find needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.

Souza, R.R.; Gil-Leiva, I.: Automatic indexing of scientific texts : a methodological comparison (2016) 0.02

0.02229178 = product of:
  0.06687534 = sum of:
    0.052279413 = weight(_text_:world in 4913) [ClassicSimilarity], result of:
      0.052279413 = score(doc=4913,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.33973572 = fieldWeight in 4913, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=4913)
    0.014595922 = product of:
      0.043787766 = sum of:
        0.043787766 = weight(_text_:29 in 4913) [ClassicSimilarity], result of:
          0.043787766 = score(doc=4913,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.31092256 = fieldWeight in 4913, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=4913)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Source: Knowledge organization for a sustainable world: challenges and perspectives for cultural, scientific, and technological sharing in a connected society : proceedings of the Fourteenth International ISKO Conference 27-29 September 2016, Rio de Janeiro, Brazil / organized by International Society for Knowledge Organization (ISKO), ISKO-Brazil, São Paulo State University ; edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, Vera Dodebei

Schulz, K.U.; Brunner, L.: Vollautomatische thematische Verschlagwortung großer Textkollektionen mittels semantischer Netze (2017) 0.02

0.01524961 = product of:
  0.04574883 = sum of:
    0.0329774 = weight(_text_:web in 3493) [ClassicSimilarity], result of:
      0.0329774 = score(doc=3493,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 3493, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3493)
    0.012771431 = product of:
      0.038314294 = sum of:
        0.038314294 = weight(_text_:29 in 3493) [ClassicSimilarity], result of:
          0.038314294 = score(doc=3493,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.27205724 = fieldWeight in 3493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3493)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Source: Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber

Böhm, A.; Seifert, C.; Schlötterer, J.; Granitzer, M.: Identifying tweets from the economic domain (2017) 0.02

0.01524961 = product of:
  0.04574883 = sum of:
    0.0329774 = weight(_text_:web in 3495) [ClassicSimilarity], result of:
      0.0329774 = score(doc=3495,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 3495, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3495)
    0.012771431 = product of:
      0.038314294 = sum of:
        0.038314294 = weight(_text_:29 in 3495) [ClassicSimilarity], result of:
          0.038314294 = score(doc=3495,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.27205724 = fieldWeight in 3495, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3495)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Source: Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber

Kempf, A.O.: Neue Verfahrenswege der Wissensorganisation : eine Evaluation automatischer Indexierung in der sozialwissenschaftlichen Fachinformation (2017) 0.02
```
0.01524961 = product of:
  0.04574883 = sum of:
    0.0329774 = weight(_text_:web in 3497) [ClassicSimilarity], result of:
      0.0329774 = score(doc=3497,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3497)
    0.012771431 = product of:
      0.038314294 = sum of:
        0.038314294 = weight(_text_:29 in 3497) [ClassicSimilarity], result of:
          0.038314294 = score(doc=3497,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.27205724 = fieldWeight in 3497, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3497)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Source: Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber

Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.01
```
0.013932362 = product of:
  0.041797087 = sum of:
    0.032674633 = weight(_text_:world in 5045) [ClassicSimilarity], result of:
      0.032674633 = score(doc=5045,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21233483 = fieldWeight in 5045, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5045)
    0.009122452 = product of:
      0.027367353 = sum of:
        0.027367353 = weight(_text_:29 in 5045) [ClassicSimilarity], result of:
          0.027367353 = score(doc=5045,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.19432661 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract: Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we aim to investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weight, we further suggest a combination form of EW (CEW) with two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on 8 real world data sets. Experimental results show that weighting words can effectively improve the topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.

Date: 15. 3.2019 18:55:29

Milstead, J.L.: Thesauri in a full-text world (1998) 0.01

0.013905007 = product of:
  0.041715022 = sum of:
    0.032674633 = weight(_text_:world in 2337) [ClassicSimilarity], result of:
      0.032674633 = score(doc=2337,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21233483 = fieldWeight in 2337, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2337)
    0.009040388 = product of:
      0.027121164 = sum of:
        0.027121164 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
          0.027121164 = score(doc=2337,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.19345059 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Date: 22. 9.1997 19:16:05

Chartron, G.; Dalbin, S.; Monteil, M.-G.; Verillon, M.: Indexation manuelle et indexation automatique : dépasser les oppositions (1989) 0.01
```
0.010130975 = product of:
  0.06078585 = sum of:
    0.06078585 = weight(_text_:wide in 3516) [ClassicSimilarity], result of:
      0.06078585 = score(doc=3516,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 3516, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3516)
  0.16666667 = coord(1/6)
```
Abstract: Report of a study comparing 2 methods of indexing: LEXINET, a computerised system for indexing titles and summaries only; and manual indexing of full texts, using the thesaurus developed by Électricité de France (EDF). Both systems were applied to a collection of approximately 2,000 documents on artificial intelligence from the EDF data base. The results were then analysed to compare quantitative performance (number and range of terms) and qualitative performance (ambiguity of terms, specificity, variability, consistency). Overall, neither system proved ideal: LEXINET was deficient as regards lack of accessibility and excessive ambiguity; while the manual system gave rise to an over-wide variation of terms. The ideal system would appear to be a combination of automatic and manual systems, on the evidence produced here.

Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.01
```
0.008683693 = product of:
  0.052102152 = sum of:
    0.052102152 = weight(_text_:wide in 5480) [ClassicSimilarity], result of:
      0.052102152 = score(doc=5480,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 5480, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5480)
  0.16666667 = coord(1/6)
```
Abstract: (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
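
To illustrate the "letter 5-gram" features mentioned in this abstract, here is a minimal sketch of character n-gram extraction. It is not the authors' implementation: the function name `char_ngrams`, the lowercasing, and the whitespace normalization are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams in a text.

    Assumed preprocessing: lowercase the text and collapse runs of
    whitespace to single spaces before sliding the window.
    """
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


features = char_ngrams("Bibliothek")
print(sorted(features))
```

Such counts can serve as the input vector of a classifier without any morphological analysis, which is the comparison the abstract draws.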

Franke-Maier, M.: Anforderungen an die Qualität der Inhaltserschließung im Spannungsfeld von intellektuell und automatisch erzeugten Metadaten (2018) 0.01

0.008475992 = product of:
  0.05085595 = sum of:
    0.05085595 = product of:
      0.076283924 = sum of:
        0.038314294 = weight(_text_:29 in 5344) [ClassicSimilarity], result of:
          0.038314294 = score(doc=5344,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.27205724 = fieldWeight in 5344, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5344)
        0.03796963 = weight(_text_:22 in 5344) [ClassicSimilarity], result of:
          0.03796963 = score(doc=5344,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.2708308 = fieldWeight in 5344, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5344)
      0.6666667 = coord(2/3)
  0.16666667 = coord(1/6)
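
The nesting in the tree above can be read as two coordination steps: the per-term weights are summed and scaled by the share of matching inner clauses, and that result is scaled again by the share of matching outer clauses. A minimal sketch, using the leaf values printed for doc 5344; the helper name `coord_sum` is hypothetical.

```python
def coord_sum(scores: list[float], matched: int, total: int) -> float:
    """Sum the clause scores and scale by the coordination factor matched/total."""
    return sum(scores) * (matched / total)


# Leaf weights for the terms "29" and "22", copied from the explain output.
inner = coord_sum([0.038314294, 0.03796963], matched=2, total=3)

# Only one of the six top-level query clauses matched this document.
outer = coord_sum([inner], matched=1, total=6)

print(f"inner={inner:.8f} outer={outer:.9f}")
```

Up to rounding, `inner` corresponds to the printed 0.05085595 and `outer` to the final score 0.008475992.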

Abstract: Since the Deutscher Bibliothekartag 2018 at the latest, the discussion about the Deutsche Nationalbibliothek's automatic subject indexing methods has shifted from a politically conducted debate to a debate about quality. This contribution addresses questions of subject indexing quality in digital times, when heterogeneous products of different methods meet, and attempts to define important quality requirements. This conference paper summarizes the ideas the author presented as discussion prompts at the workshop of the FAG "Erschließung und Informationsvermittlung" of the GBV on 29 August 2018 in Kiel. The workshop took place as part of the 22nd GBV Verbundkonferenz.
