Search (202 results, page 1 of 11)

Van der Veer Martens, B.: Do citation systems represent theories of truth? (2001) 0.05

0.047506485 = product of:
  0.09501297 = sum of:
    0.09501297 = sum of:
      0.006765375 = weight(_text_:a in 3925) [ClassicSimilarity], result of:
        0.006765375 = score(doc=3925,freq=2.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.12739488 = fieldWeight in 3925, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.078125 = fieldNorm(doc=3925)
      0.0882476 = weight(_text_:22 in 3925) [ClassicSimilarity], result of:
        0.0882476 = score(doc=3925,freq=4.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.54716086 = fieldWeight in 3925, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=3925)
  0.5 = coord(1/2)
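
The explain tree above can be recomputed by hand. The following is a minimal sketch, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values shown in this listing (other Lucene versions may compute idf slightly differently). The `queryNorm` and `fieldNorm` values are copied from the listing rather than derived, and the function names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it matches the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                  # tf(freq) line of the tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm       # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


QUERY_NORM = 0.046056706  # copied from the explain output, not derived here
MAX_DOCS = 44218
FIELD_NORM = 0.078125     # fieldNorm(doc=3925), copied from the explain output

score_a = term_score(2.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(4.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = 0.5 * (score_a + score_22)  # 0.5 is the coord(1/2) factor

print(f"a: {score_a:.9f}  22: {score_22:.9f}  total: {total:.9f}")
```

The printed values should agree with the tree for doc 3925 up to single-precision rounding, since Lucene computes these factors as 32-bit floats while the sketch uses Python doubles.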

Date: 22. 7.2006 15:22:28
Type: a

Qin, J.; Paling, S.: Converting a controlled vocabulary into an ontology : the case of GEM (2001) 0.04

0.043180898 = product of:
  0.086361796 = sum of:
    0.086361796 = sum of:
      0.011481222 = weight(_text_:a in 3895) [ClassicSimilarity], result of:
        0.011481222 = score(doc=3895,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.2161963 = fieldWeight in 3895, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.09375 = fieldNorm(doc=3895)
      0.07488057 = weight(_text_:22 in 3895) [ClassicSimilarity], result of:
        0.07488057 = score(doc=3895,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.46428138 = fieldWeight in 3895, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.09375 = fieldNorm(doc=3895)
  0.5 = coord(1/2)

Date: 24. 8.2005 19:20:22
Type: a

Heflin, J.; Hendler, J.: Semantic interoperability on the Web (2000) 0.03

0.026575929 = product of:
  0.053151857 = sum of:
    0.053151857 = sum of:
      0.009471525 = weight(_text_:a in 759) [ClassicSimilarity], result of:
        0.009471525 = score(doc=759,freq=8.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.17835285 = fieldWeight in 759, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0546875 = fieldNorm(doc=759)
      0.043680333 = weight(_text_:22 in 759) [ClassicSimilarity], result of:
        0.043680333 = score(doc=759,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.2708308 = fieldWeight in 759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=759)
  0.5 = coord(1/2)

Abstract: XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry and domain specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language.
Date: 11. 5.2013 19:22:18
Type: a

Decimal Classification Editorial Policy Committee (2002) 0.02
```
0.02445382 = product of:
  0.04890764 = sum of:
    0.04890764 = sum of:
      0.0047838427 = weight(_text_:a in 236) [ClassicSimilarity], result of:
        0.0047838427 = score(doc=236,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.090081796 = fieldWeight in 236, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=236)
      0.0441238 = weight(_text_:22 in 236) [ClassicSimilarity], result of:
        0.0441238 = score(doc=236,freq=4.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.27358043 = fieldWeight in 236, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=236)
  0.5 = coord(1/2)
```
Abstract

The Decimal Classification Editorial Policy Committee (EPC) held its Meeting 117 at the Library Dec. 3-5, 2001, with chair Andrea Stamm (Northwestern University) presiding. Through its actions at this meeting, significant progress was made toward publication of DDC unabridged Edition 22 in mid-2003 and Abridged Edition 14 in early 2004. For Edition 22, the committee approved the revisions to two major segments of the classification: Table 2 through 55 Iran (the first half of the geographic area table) and 900 History and geography. EPC approved updates to several parts of the classification it had already considered: 004-006 Data processing, Computer science; 340 Law; 370 Education; 510 Mathematics; 610 Medicine; Table 3 issues concerning treatment of scientific and technical themes, with folklore, arts, and printing ramifications at 398.2 - 398.3, 704.94, and 758; Table 5 and Table 6 Ethnic Groups and Languages (portions concerning American native peoples and languages); and tourism issues at 647.9 and 790. Reports on the results of testing the approved 200 Religion and 305-306 Social groups schedules were received, as was a progress report on revision work for the manual being done by Ross Trotter (British Library, retired). Revisions for Abridged Edition 14 that received committee approval included 010 Bibliography; 070 Journalism; 150 Psychology; 370 Education; 380 Commerce, communications, and transportation; 621 Applied physics; 624 Civil engineering; and 629.8 Automatic control engineering. At the meeting the committee received print versions of _DC&_ numbers 4 and 5. Primarily for the use of Dewey translators, these cumulations list changes, substantive and cosmetic, to DDC Edition 21 and Abridged Edition 13 for the period October 1999 - December 2001. EPC will hold its Meeting 118 at the Library May 15-17, 2002.

Type

a

Beppler, F.D.; Fonseca, F.T.; Pacheco, R.C.S.: Hermeneus: an architecture for an ontology-enabled information retrieval (2008) 0.02

0.022779368 = product of:
  0.045558736 = sum of:
    0.045558736 = sum of:
      0.008118451 = weight(_text_:a in 3261) [ClassicSimilarity], result of:
        0.008118451 = score(doc=3261,freq=8.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.15287387 = fieldWeight in 3261, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046875 = fieldNorm(doc=3261)
      0.037440285 = weight(_text_:22 in 3261) [ClassicSimilarity], result of:
        0.037440285 = score(doc=3261,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.23214069 = fieldWeight in 3261, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=3261)
  0.5 = coord(1/2)

Abstract: Ontologies improve IR systems with regard to their retrieval and presentation of information, which makes the task of finding information more effective, efficient, and interactive. In this paper we argue that ontologies also greatly improve the engineering of such systems. We created a framework that uses ontology to drive the process of engineering an IR system. We developed a prototype that shows how a domain specialist without knowledge in the IR field can build an IR system with interactive components. The resulting system provides support for users not only to meet their information needs but also to extend their state of knowledge. This way, our approach to ontology-enabled information retrieval addresses both the engineering aspect described here and also the usability aspect described elsewhere.
Date: 28.11.2016 12:43:22
Type: a

Atran, S.; Medin, D.L.; Ross, N.: Evolution and devolution of knowledge : a tale of two biologies (2004) 0.02

0.021590449 = product of:
  0.043180898 = sum of:
    0.043180898 = sum of:
      0.005740611 = weight(_text_:a in 479) [ClassicSimilarity], result of:
        0.005740611 = score(doc=479,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.10809815 = fieldWeight in 479, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046875 = fieldNorm(doc=479)
      0.037440285 = weight(_text_:22 in 479) [ClassicSimilarity], result of:
        0.037440285 = score(doc=479,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.23214069 = fieldWeight in 479, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=479)
  0.5 = coord(1/2)

Date: 23. 1.2022 10:22:18
Type: a

Bittner, T.; Donnelly, M.; Winter, S.: Ontology and semantic interoperability (2006) 0.02

0.020749755 = product of:
  0.04149951 = sum of:
    0.04149951 = sum of:
      0.0040592253 = weight(_text_:a in 4820) [ClassicSimilarity], result of:
        0.0040592253 = score(doc=4820,freq=2.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.07643694 = fieldWeight in 4820, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046875 = fieldNorm(doc=4820)
      0.037440285 = weight(_text_:22 in 4820) [ClassicSimilarity], result of:
        0.037440285 = score(doc=4820,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.23214069 = fieldWeight in 4820, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=4820)
  0.5 = coord(1/2)

Date: 3.12.2016 18:39:22
Type: a

Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for linkbased ranking algorithms (2006) 0.02
```
0.02067415 = product of:
  0.0413483 = sum of:
    0.0413483 = sum of:
      0.010148063 = weight(_text_:a in 2565) [ClassicSimilarity], result of:
        0.010148063 = score(doc=2565,freq=18.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.19109234 = fieldWeight in 2565, product of:
            4.2426405 = tf(freq=18.0), with freq of:
              18.0 = termFreq=18.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2565)
      0.03120024 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
        0.03120024 = score(doc=2565,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 2565, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2565)
  0.5 = coord(1/2)
```
Abstract

This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.

Date

16. 1.2016 10:22:28

Type

a

Boldi, P.; Santini, M.; Vigna, S.: PageRank as a function of the damping factor (2005) 0.02
```
0.020383961 = product of:
  0.040767923 = sum of:
    0.040767923 = sum of:
      0.009567685 = weight(_text_:a in 2564) [ClassicSimilarity], result of:
        0.009567685 = score(doc=2564,freq=16.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.18016359 = fieldWeight in 2564, product of:
            4.0 = tf(freq=16.0), with freq of:
              16.0 = termFreq=16.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2564)
      0.03120024 = weight(_text_:22 in 2564) [ClassicSimilarity], result of:
        0.03120024 = score(doc=2564,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 2564, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2564)
  0.5 = coord(1/2)
```
Abstract

PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor alpha that spreads uniformly part of the rank. The choice of alpha is eminently empirical, and in most cases the original suggestion alpha=0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in alpha was discovered to be useful in link-spam detection. Moreover, an analytical justification of the value chosen for alpha is still missing. In this paper, we give the first mathematical analysis of PageRank when alpha changes. In particular, we show that, contrary to popular belief, for real-world graphs values of alpha close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O(t**k*alpha**t) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
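
As an illustration of the damping factor the abstract refers to, here is a minimal power-method sketch. It is not the authors' code. It assumes the common formulation in which a fraction 1 - alpha of the rank is spread uniformly over all pages and the rank of pages without out-links is also redistributed uniformly. The function name and the toy graph are hypothetical.

```python
def pagerank(links, alpha=0.85, iterations=50):
    """Power-method PageRank on a dict {node: [successor, ...]}.

    Every successor must itself be a key of `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                # Each page passes a damped, equal share to its successors.
                new_rank[w] += alpha * rank[v] / len(links[v])
        rank = new_rank
    return rank


# Hypothetical three-page graph, for illustration only.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)  # alpha=0.85 is the value the abstract attributes to Brin and Page
print(ranks)
```

Calling `pagerank(toy_graph, alpha=...)` with different values of alpha gives a quick way to see how the ranking responds to the damping factor on a small graph.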

Date

16. 1.2016 10:22:28

Type

a

Baker, T.: ¬A grammar of Dublin Core (2000) 0.02
```
0.018531231 = product of:
  0.037062462 = sum of:
    0.037062462 = sum of:
      0.012102271 = weight(_text_:a in 1236) [ClassicSimilarity], result of:
        0.012102271 = score(doc=1236,freq=40.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.22789092 = fieldWeight in 1236, product of:
            6.3245554 = tf(freq=40.0), with freq of:
              40.0 = termFreq=40.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.03125 = fieldNorm(doc=1236)
      0.02496019 = weight(_text_:22 in 1236) [ClassicSimilarity], result of:
        0.02496019 = score(doc=1236,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.15476047 = fieldWeight in 1236, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1236)
  0.5 = coord(1/2)
```
Abstract

Dublin Core is often presented as a modern form of catalog card -- a set of elements (and now qualifiers) that describe resources in a complete package. Sometimes it is proposed as an exchange format for sharing records among multiple collections. The founding principle that "every element is optional and repeatable" reinforces the notion that a Dublin Core description is to be taken as a whole. This paper, in contrast, is based on a much different premise: Dublin Core is a language. More precisely, it is a small language for making a particular class of statements about resources. Like natural languages, it has a vocabulary of word-like terms, the two classes of which -- elements and qualifiers -- function within statements like nouns and adjectives; and it has a syntax for arranging elements and qualifiers into statements according to a simple pattern. Whenever tourists order a meal or ask directions in an unfamiliar language, considerate native speakers will spontaneously limit themselves to basic words and simple sentence patterns along the lines of "I am so-and-so" or "This is such-and-such". Linguists call this pidginization. In such situations, a small phrase book or translated menu can be most helpful. By analogy, today's Web has been called an Internet Commons where users and information providers from a wide range of scientific, commercial, and social domains present their information in a variety of incompatible data models and description languages. In this context, Dublin Core presents itself as a metadata pidgin for digital tourists who must find their way in this linguistically diverse landscape. Its vocabulary is small enough to learn quickly, and its basic pattern is easily grasped. It is well-suited to serve as an auxiliary language for digital libraries. This grammar starts by defining terms. It then follows a 200-year-old tradition of English grammar teaching by focusing on the structure of single statements. 
It concludes by looking at the growing dictionary of Dublin Core vocabulary terms -- its registry, and at how statements can be used to build the metadata equivalent of paragraphs and compositions -- the application profile.

Date

26.12.2011 14:01:22

Type

a

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01
```
0.01482369 = product of:
  0.02964738 = sum of:
    0.02964738 = sum of:
      0.0046871896 = weight(_text_:a in 3284) [ClassicSimilarity], result of:
        0.0046871896 = score(doc=3284,freq=6.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.088261776 = fieldWeight in 3284, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.03125 = fieldNorm(doc=3284)
      0.02496019 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
        0.02496019 = score(doc=3284,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.15476047 = fieldWeight in 3284, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=3284)
  0.5 = coord(1/2)
```
Abstract

Classifying objects (e.g. fauna, flora, texts) is a process that relies on human intelligence. Computer science, in particular the field of artificial intelligence (AI), investigates among other things to what extent processes that require human intelligence can be automated. It has turned out that solving everyday problems is a greater challenge than solving specialized problems such as building a chess computer. For example, "Rybka" has been the reigning computer chess world champion since June 2007. To what extent everyday problems can be solved with AI methods is, in the general case, still an open question. Natural language processing, for example language understanding, plays an essential role in solving everyday problems. Realizing "common sense" as a machine (in the Cyc knowledge base, in the form of facts and rules) has been Lenat's goal since 1984. With regard to the flagship AI project "Cyc" there are Cyc optimists and Cyc pessimists. Understanding natural language (e.g. work title, summary, preface, contents) is also necessary in the intellectual classification of bibliographic title records or online publications in order to classify these text objects correctly. Since 2007 the Deutsche Nationalbibliothek has intellectually classified nearly all publications with the Dewey Decimal Classification (DDC).
At the latest since the advent of the World Wide Web, the number of publications to be classified has been growing faster than they can be intellectually subject-indexed. Methods are therefore sought to automate the classification of text objects, or at least to support intellectual classification. Methods for automatic document classification (Information Retrieval, IR for short) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As more and more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has also included work on automatic DDC classification and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous operation. The VZG project Colibri/DDC has also been working on automatic DDC classification, among other things, since 2006. The related investigations and developments serve to answer the research question: "Is it possible to automatically achieve a DDC title classification of all GVK-PLUS title records that is coherent in content?"

Date

22. 1.2010 14:41:24

Type

a

Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.01
```
0.01482369 = product of:
  0.02964738 = sum of:
    0.02964738 = sum of:
      0.0046871896 = weight(_text_:a in 1163) [ClassicSimilarity], result of:
        0.0046871896 = score(doc=1163,freq=6.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.088261776 = fieldWeight in 1163, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.03125 = fieldNorm(doc=1163)
      0.02496019 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
        0.02496019 = score(doc=1163,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.15476047 = fieldWeight in 1163, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1163)
  0.5 = coord(1/2)
```
Abstract

This paper addresses the problem of information discovery in large collections of text. For users, one of the key problems in working with such collections is determining where to focus their attention. In selecting documents for examination, users must be able to formulate reasonably precise queries. Queries that are too broad will greatly reduce the efficiency of information discovery efforts by overwhelming the users with peripheral information. In order to formulate efficient queries, a mechanism is needed to automatically alert users regarding potentially interesting information contained within the collection. This paper presents the results of an experiment designed to test one approach to generation of such alerts. The technique of latent semantic indexing (LSI) is used to identify relationships among entities of interest. Entity extraction software is used to pre-process the text of the collection so that the LSI space contains representation vectors for named entities in addition to those for individual terms. In the LSI space, the cosine of the angle between the representation vectors for two entities captures important information regarding the degree of association of those two entities. For appropriate choices of entities, determining the entity pairs with the highest mutual cosine values yields valuable information regarding the contents of the text collection. The test database used for the experiment consists of 150,000 news articles. The proposed approach for alert generation is tested using a counterterrorism analysis example. The approach is shown to have significant potential for aiding users in rapidly focusing on information of potential importance in large text collections. The approach also has value in identifying possible use of aliases.
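
The pair-ranking step described in the abstract can be sketched as follows. This is an illustration only, not the paper's implementation. It assumes the LSI representation vectors have already been computed, and the three-dimensional vectors, entity names, and function names are made up for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    # Cosine of the angle between two vectors of equal length.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top_pairs(vectors, k=3):
    # Score every unordered entity pair and keep the k highest cosines.
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(scored, reverse=True)[:k]


# Hypothetical entity vectors in a toy 3-dimensional LSI space.
entity_vectors = {
    "entity_a": [0.9, 0.1, 0.0],
    "entity_b": [0.8, 0.2, 0.1],
    "entity_c": [0.0, 0.1, 0.9],
}
best = top_pairs(entity_vectors, k=1)
print(best)
```

In a real collection the vectors would come from the LSI decomposition of the term-document matrix, and comparing all pairs grows quadratically with the number of entities.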

Source

Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]

Type

a

Lavoie, B.; Connaway, L.S.; Dempsey, L.: Anatomy of aggregate collections : the example of Google print for libraries (2005) 0.01
```
0.0132904 = product of:
  0.0265808 = sum of:
    0.0265808 = sum of:
      0.007860656 = weight(_text_:a in 1184) [ClassicSimilarity], result of:
        0.007860656 = score(doc=1184,freq=30.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.1480195 = fieldWeight in 1184, product of:
            5.477226 = tf(freq=30.0), with freq of:
              30.0 = termFreq=30.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1184)
      0.018720143 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
        0.018720143 = score(doc=1184,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.116070345 = fieldWeight in 1184, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1184)
  0.5 = coord(1/2)
```
Abstract

Google's December 2004 announcement of its intention to collaborate with five major research libraries - Harvard University, the University of Michigan, Stanford University, the University of Oxford, and the New York Public Library - to digitize and surface their print book collections in the Google searching universe has, predictably, stirred conflicting opinion, with some viewing the project as a welcome opportunity to enhance the visibility of library collections in new environments, and others wary of Google's prospective role as gateway to these collections. The project has been vigorously debated on discussion lists and blogs, with the participating libraries commonly referred to as "the Google 5". One point most observers seem to concede is that the questions raised by this initiative are both timely and significant. The Google Print Library Project (GPLP) has galvanized a long overdue, multi-faceted discussion about library print book collections. The print book is core to library identity and practice, but in an era of zero-sum budgeting, it is almost inevitable that print book budgets will decline as budgets for serials, digital resources, and other materials expand. As libraries re-allocate resources to accommodate changing patterns of user needs, print book budgets may be adversely impacted. Of course, the degree of impact will depend on a library's perceived mission. A public library may expect books to justify their shelf-space, with de-accession the consequence of minimal use. A national library, on the other hand, has a responsibility to the scholarly and cultural record and may seek to collect comprehensively within particular areas, with the attendant obligation to secure the long-term retention of its print book collections. 
The combination of limited budgets, changing user needs, and differences in library collection strategies underscores the need to think about a collective, or system-wide, print book collection - in particular, how can an inter-institutional system be organized to achieve goals that would be difficult, and/or prohibitively expensive, for any one library to undertake individually [4]? Mass digitization programs like GPLP cast new light on these and other issues surrounding the future of library print book collections, but at this early stage, it is light that illuminates only dimly. It will be some time before GPLP's implications for libraries and library print book collections can be fully appreciated and evaluated. But the strong interest and lively debate generated by this initiative suggest that some preliminary analysis - premature though it may be - would be useful, if only to undertake a rough mapping of the terrain over which GPLP potentially will extend. At the least, some early perspective helps shape interesting questions for the future, when the boundaries of GPLP become settled, workflows for producing and managing the digitized materials become systematized, and usage patterns within the GPLP framework begin to emerge.
This article offers some perspectives on GPLP in light of what is known about library print book collections in general, and those of the Google 5 in particular, from information in OCLC's WorldCat bibliographic database and holdings file. Questions addressed include:
* Coverage: What proportion of the system-wide print book collection will GPLP potentially cover? What is the degree of holdings overlap across the print book collections of the five participating libraries?
* Language: What is the distribution of languages associated with the print books held by the GPLP libraries? Which languages are predominant?
* Copyright: What proportion of the GPLP libraries' print book holdings are out of copyright?
* Works: How many distinct works are represented in the holdings of the GPLP libraries? How does a focus on works impact coverage and holdings overlap?
* Convergence: What are the effects on coverage of using a different set of five libraries? What are the effects of adding the holdings of additional libraries to those of the GPLP libraries, and how do these effects vary by library type?
These questions certainly do not exhaust the analytical possibilities presented by GPLP. More in-depth analysis might look at Google 5 coverage in particular subject areas; it also would be interesting to see how many books covered by the GPLP have already been digitized in other contexts. However, these questions are left to future studies. The purpose here is to explore a few basic questions raised by GPLP, and in doing so, provide an empirical context for the debate that is sure to continue for some time to come. A secondary objective is to lay some groundwork for a general set of questions that could be used to explore the implications of any mass digitization initiative. A suggested list of questions is provided in the conclusion of the article.

Date: 26.12.2011 14:08:22
Type: a

Foerster, H. von; Müller, A.; Müller, K.H.: Rück- und Vorschauen : Heinz von Foerster im Gespräch mit Albert Müller und Karl H. Müller (2001) 0.01

0.010795224 = product of:
  0.021590449 = sum of:
    0.021590449 = sum of:
      0.0028703054 = weight(_text_:a in 5988) [ClassicSimilarity], result of:
        0.0028703054 = score(doc=5988,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.054049075 = fieldWeight in 5988, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0234375 = fieldNorm(doc=5988)
      0.018720143 = weight(_text_:22 in 5988) [ClassicSimilarity], result of:
        0.018720143 = score(doc=5988,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.116070345 = fieldWeight in 5988, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0234375 = fieldNorm(doc=5988)
  0.5 = coord(1/2)

Date: 10. 9.2006 17:22:54
Type: a
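
The score breakdown for doc 5988 above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF relations that the printed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The queryNorm and the outer coord(1/2) factor are copied from the output rather than derived, and the function names are illustrative only.

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the explain output
QUERY_NORM = 0.046056706  # queryNorm copied from the output, not derived here
COORD = 0.5               # outer coord(1/2) factor copied from the output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values for doc 5988, read off the explain tree above.
score_a = term_score(freq=4.0, doc_freq=37942, field_norm=0.0234375)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.0234375)
total = COORD * (score_a + score_22)

print(f"weight(a)  = {score_a:.9f}")
print(f"weight(22) = {score_22:.9f}")
print(f"score      = {total:.9f}")
```

Small differences in the last digits against the listing are expected, since the listing is computed in single precision while this sketch uses double precision.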

Bartol, W.; Pióro, K.; Rosselló, F.: On the coverings by tolerance classes (2003) 0.00
```
0.0037439493 = product of:
  0.0074878987 = sum of:
    0.0074878987 = product of:
      0.014975797 = sum of:
        0.014975797 = weight(_text_:a in 4842) [ClassicSimilarity], result of:
          0.014975797 = score(doc=4842,freq=20.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.28200063 = fieldWeight in 4842, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4842)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: A tolerance is a reflexive and symmetric, but not necessarily transitive, binary relation. Contrary to what happens with equivalence relations, when dealing with tolerances one must distinguish between blocks (maximal subsets where the tolerance is a total relation) and classes (the class of an element is the set of those elements tolerable with it). Both blocks and classes of a tolerance on a set define coverings of this set, but not every covering of a set is defined in this way. The characterization of those coverings that are families of blocks of some tolerance has been known for more than a decade now. In this paper we give a characterization of those coverings of a finite set that are families of classes of some tolerance.
Type: a
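
The distinction between blocks and classes drawn in the abstract above can be illustrated on a small finite set. The following is a brute-force sketch, not the paper's method: the example relation (1 related to 2, 2 related to 3, but 1 not related to 3) and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

ELEMENTS = [1, 2, 3]
PAIRS = {(1, 2), (2, 3)}  # hypothetical example relation; 1 and 3 are unrelated


def related(x, y):
    """Reflexive and symmetric closure of PAIRS, i.e. a tolerance."""
    return x == y or (x, y) in PAIRS or (y, x) in PAIRS


def tolerance_classes(elements):
    """Class of x: the set of all elements related to x."""
    return {x: frozenset(y for y in elements if related(x, y)) for x in elements}


def tolerance_blocks(elements):
    """Blocks: maximal subsets in which every pair of elements is related."""
    total_subsets = [
        frozenset(subset)
        for size in range(1, len(elements) + 1)
        for subset in combinations(elements, size)
        if all(related(a, b) for a, b in combinations(subset, 2))
    ]
    # Keep only the subsets not strictly contained in another such subset.
    return {s for s in total_subsets if not any(s < t for t in total_subsets)}


classes = tolerance_classes(ELEMENTS)
blocks = tolerance_blocks(ELEMENTS)

for element, cls in classes.items():
    print(element, sorted(cls))
print(sorted(sorted(block) for block in blocks))
```

In this example the class of 2 is the whole set, while no block contains both 1 and 3, so the covering by classes and the covering by blocks differ. The subset enumeration is exponential and only meant for very small sets.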

Burnard, L.: Text encoding for interchange : a new consortium (2000) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 406) [ClassicSimilarity], result of:
          0.01339476 = score(doc=406,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 406, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=406)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Miller, P.: Towards a typology for portals (2003) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 4087) [ClassicSimilarity], result of:
          0.01339476 = score(doc=4087,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 4087, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4087)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Pinto, F.; Fraser, M.: Access management, the key to a Portal (2003) 0.00

0.00334869 = product of:
  0.00669738 = sum of:
    0.00669738 = product of:
      0.01339476 = sum of:
        0.01339476 = weight(_text_:a in 4111) [ClassicSimilarity], result of:
          0.01339476 = score(doc=4111,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.25222903 = fieldWeight in 4111, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4111)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.00

0.0033143433 = product of:
  0.0066286866 = sum of:
    0.0066286866 = product of:
      0.013257373 = sum of:
        0.013257373 = weight(_text_:a in 4088) [ClassicSimilarity], result of:
          0.013257373 = score(doc=4088,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.24964198 = fieldWeight in 4088, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=4088)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: The authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a means for the selection of resources in Engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed quality-controlled subject gateway are also discussed.
Type: a

WordHoard: finding multiword units (20??) 0.00
```
0.0031324127 = product of:
  0.0062648254 = sum of:
    0.0062648254 = product of:
      0.012529651 = sum of:
        0.012529651 = weight(_text_:a in 1123) [ClassicSimilarity], result of:
          0.012529651 = score(doc=1123,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.23593865 = fieldWeight in 1123, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1123)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.
Type: a
