Search (117 results, page 6 of 6)

Baker, T.: ¬A grammar of Dublin Core (2000) 0.01

0.0050708186 = product of:
  0.010141637 = sum of:
    0.010141637 = product of:
      0.020283274 = sum of:
        0.020283274 = weight(_text_:22 in 1236) [ClassicSimilarity], result of:
          0.020283274 = score(doc=1236,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.15476047 = fieldWeight in 1236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1236)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
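
The score tree above can be recomputed by hand. The following is a minimal sketch, not the search engine's own code: the function names are hypothetical, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is an assumption chosen because it reproduces the idf value 3.5018296 shown in the listing. The values for queryNorm, fieldNorm and the two coord factors are copied from the explanation.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed formula: idf = 1 + ln(maxDocs / (docFreq + 1)).
    # It matches the idf printed in the explanation (3.5018296).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=(0.5, 0.5)):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm       # queryWeight in the listing
    field_weight = tf * idf * field_norm  # fieldWeight in the listing
    score = query_weight * field_weight
    for coord in coords:                  # the two coord(1/2) factors
        score *= coord
    return score

# Values copied from the explanation for doc 1236, term "22".
score_1236 = classic_score(
    freq=2.0, doc_freq=3622, max_docs=44218,
    query_norm=0.037426826, field_norm=0.03125,
)
# Should agree with the listed 0.0050708186 up to float rounding.
print(f"{score_1236:.10f}")
```

The same function should reproduce the other entries when their docFreq and fieldNorm values are substituted.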

Date: 26.12.2011 14:01:22

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01

0.0050708186 = product of:
  0.010141637 = sum of:
    0.010141637 = product of:
      0.020283274 = sum of:
        0.020283274 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.020283274 = score(doc=3284,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 1.2010 14:41:24

Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.01

0.0050708186 = product of:
  0.010141637 = sum of:
    0.010141637 = product of:
      0.020283274 = sum of:
        0.020283274 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
          0.020283274 = score(doc=1163,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.15476047 = fieldWeight in 1163, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]

Cazan, C.: Medizinische Ontologien : das Ende des MeSH (2006) 0.00

0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 132) [ClassicSimilarity], result of:
          0.019680513 = score(doc=132,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=132)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Heckner, M.; Mühlbacher, S.; Wolff, C.: Tagging tagging : a classification model for user keywords in scientific bibliography management systems (2007) 0.00

0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 533) [ClassicSimilarity], result of:
          0.019680513 = score(doc=533,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=533)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

DeVorsey, K.L.; Elson, C.; Gregorev, N.P.; Hansen, J.: ¬The development of a local thesaurus to improve access to the anthropological collections of the American Museum of Natural History (2006) 0.00

0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 1174) [ClassicSimilarity], result of:
          0.019680513 = score(doc=1174,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 1174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=1174)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Crane, G.: What do you do with a million books? (2006) 0.00
```
0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 1180) [ClassicSimilarity], result of:
          0.019680513 = score(doc=1180,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 1180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=1180)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The Greek historian Herodotus has the Athenian sage Solon estimate the lifetime of a human being at c. 26,250 days (Herodotus, The Histories, 1.32). If we could read a book on each of those days, it would take almost forty lifetimes to work through every volume in a single million book library. The continuous tradition of written European literature that began with the Iliad and Odyssey in the eighth century BCE is itself little more than a million days old. While libraries that contain more than one million items are not unusual, print libraries never possessed a million books of use to any one reader. The great libraries that took shape in the nineteenth and twentieth centuries were meta-structures, whose catalogues and finding aids allowed readers to create their own customized collections, building on the fixed classification schemes and disciplinary structures that took shape in the nineteenth century. The digital libraries of the early twenty-first century can be searched and their contents transmitted around the world. They can contain time-based media, images, quantitative data, and a far richer array of content than print, with visualization technologies blurring the boundaries between library and museum. But our digital libraries remain filled with digital incunabula - digital objects whose form remains firmly rooted in traditions of print, with HTML and PDF largely mimicking the limitations of their print predecessors. Vast collections based on image books - raw digital pictures of books with searchable but uncorrected text from OCR - could arguably retard our long-term progress, reinforcing the hegemony of structures that evolved to minimize the challenges of a world where paper was the only medium of distribution and where humans alone could read. Already the books in a digital library are beginning to read one another and to confer among themselves before creating a new synthetic document for review by their human readers.

Birmingham, W.; Pardo, B.; Meek, C.; Shifrin, J.: ¬The MusArt music-retrieval system (2002) 0.00

0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 1205) [ClassicSimilarity], result of:
          0.019680513 = score(doc=1205,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 1205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=1205)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Waard, A. de; Fluit, C.; Harmelen, F. van: Drug Ontology Project for Elsevier (DOPE) (2007) 0.00

0.0049201283 = product of:
  0.009840257 = sum of:
    0.009840257 = product of:
      0.019680513 = sum of:
        0.019680513 = weight(_text_:c in 758) [ClassicSimilarity], result of:
          0.019680513 = score(doc=758,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1524436 = fieldWeight in 758, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.03125 = fieldNorm(doc=758)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

cis: Nationalbibliothek will das deutsche Internet kopieren (2008) 0.00

0.004436966 = product of:
  0.008873932 = sum of:
    0.008873932 = product of:
      0.017747864 = sum of:
        0.017747864 = weight(_text_:22 in 4609) [ClassicSimilarity], result of:
          0.017747864 = score(doc=4609,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.1354154 = fieldWeight in 4609, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4609)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
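
Among the results for the term "22", the only factor that changes between entries is fieldNorm, so the final score scales linearly with it. A small sketch under that assumption (the constant and function names are illustrative, not part of the listing):

```python
# Score and fieldNorm reported earlier in this listing for the term "22".
BASE_SCORE = 0.0050708186
BASE_NORM = 0.03125

def rescale(field_norm):
    # All other factors (tf, idf, queryNorm, coord) are identical across
    # these entries, so the score is proportional to fieldNorm.
    return BASE_SCORE * field_norm / BASE_NORM

for norm in (0.03125, 0.02734375, 0.0234375):
    print(norm, f"{rescale(norm):.10f}")
```

The rescaled values can be compared against the listed scores 0.004436966 (fieldNorm 0.02734375) and 0.0038031137 (fieldNorm 0.0234375).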

Date: 24.10.2008 14:19:22

Oberhauser, O.: Card-Image Public Access Catalogues (CIPACs) : a critical consideration of a cost-effective alternative to full retrospective catalogue conversion (2002) 0.00
```
0.004305112 = product of:
  0.008610224 = sum of:
    0.008610224 = product of:
      0.017220449 = sum of:
        0.017220449 = weight(_text_:c in 1703) [ClassicSimilarity], result of:
          0.017220449 = score(doc=1703,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.13338815 = fieldWeight in 1703, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1703)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Footnote

Review in: ABI-Technik 21(2002) H.3, S.292 (E. Pietzsch): "With his diploma thesis, Otto C. Oberhauser has presented an impressive analysis of digitized card catalogues (CIPACs). The work offers a wealth of data and statistics of a kind not previously available. Librarians who are considering digitizing catalogues will find in it a unique basis for decision-making. After an introductory chapter, Oberhauser first gives an overview of a selection of CIPACs available worldwide and their indexing methods (binary search, partial indexing, search in OCR data), and makes comparative observations on geographic distribution, size, software, navigation, and other characteristics. He then describes and analyzes implementation problems, starting with the reasons that can lead to digitization: costs, implementation time, improved access, savings in floor space. He continues with technical aspects such as scanning and quality control, image standards, OCR, manual post-processing, and server technology. In doing so he also addresses the rather obstructive characteristics of older catalogues, as well as presentation on the Web and the connection to existing OPACs. To one important aspect, namely the assessment by the most important target group, the library users, Oberhauser has devoted a field study of his own, whose results he analyzes in detail in the final chapter. Appendices on the method of data collection and individual descriptions of many catalogues round off the work. Overall, I can only describe the work as the most impressive collection of data, statistics, and analyses on the subject of CIPACs that I have encountered so far.
I would like to go into one nicely elaborated individual aspect in particular, namely the extensive fragmentation among the software systems in use: at present we can roughly distinguish between complete solutions (a contracted company acts as general contractor and carries out all tasks from digitization to delivery of the finished application) and split solutions (digitization is contracted separately from indexing and software development, or is carried out in-house). The latter require in-house project management. Yet it is precisely in-house software development that can lead to solutions in no way inferior to commercial offerings. It is only a pity that the many in-house developments have not yet led to initiatives that, as with public domain software, aim at an "optimal", cost-effective, and widely accepted software solution. A few critical remarks should nevertheless not go unmentioned. For example, there is no differentiation between "guide card" systems, i.e. those that index every 20th or 50th card, and systems with complete indexing of all card headings, even though this far-reaching design decision leads to considerable shifts in cost between catalogue creation and later use. In the statistical evaluation of the field study, too, I would have wished for a finer differentiation by type of CIPAC or by library. For example, more than half of the users surveyed stated that operating the CIPAC was initially hard to understand or that using it was time-consuming. It remains open, however, whether there are differences between the various types of implementation.

Thomas, C.; McDonald, R.H.; McDowell, C.S.: Overview - Repositories by the numbers (2007) 0.00

0.004305112 = product of:
  0.008610224 = sum of:
    0.008610224 = product of:
      0.017220449 = sum of:
        0.017220449 = weight(_text_:c in 1169) [ClassicSimilarity], result of:
          0.017220449 = score(doc=1169,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.13338815 = fieldWeight in 1169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1169)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Foerster, H. von; Müller, A.; Müller, K.H.: Rück- und Vorschauen : Heinz von Foerster im Gespräch mit Albert Müller und Karl H. Müller (2001) 0.00

0.0038031137 = product of:
  0.0076062274 = sum of:
    0.0076062274 = product of:
      0.015212455 = sum of:
        0.015212455 = weight(_text_:22 in 5988) [ClassicSimilarity], result of:
          0.015212455 = score(doc=5988,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.116070345 = fieldWeight in 5988, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=5988)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 10. 9.2006 17:22:54

Lavoie, B.; Connaway, L.S.; Dempsey, L.: Anatomy of aggregate collections : the example of Google print for libraries (2005) 0.00

0.0038031137 = product of:
  0.0076062274 = sum of:
    0.0076062274 = product of:
      0.015212455 = sum of:
        0.015212455 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
          0.015212455 = score(doc=1184,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.116070345 = fieldWeight in 1184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1184)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 26.12.2011 14:08:22

Rötzer, F.: Computerspiele verbessern die Aufmerksamkeit : Nach einer Untersuchung von Kognitionswissenschaftlern schulen Shooter-Spiele manche Leistungen der visuellen Aufmerksamkeit (2003) 0.00
```
0.0036900965 = product of:
  0.007380193 = sum of:
    0.007380193 = product of:
      0.014760386 = sum of:
        0.014760386 = weight(_text_:c in 1643) [ClassicSimilarity], result of:
          0.014760386 = score(doc=1643,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.114332706 = fieldWeight in 1643, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1643)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

Anyone who sits at the computer playing for hours every day trains certain abilities (and neglects others, which wither, though that would be much harder to demonstrate). Computer games require their users, for example, to orient themselves actively in visual terms, and to do so quickly and with sustained concentration. In addition, what is seen (or heard) must be quickly translated into reactions, which promotes sensorimotor skills such as hand-eye coordination. That, however, was not the subject of the study. According to the experiments by cognitive scientists at the Center for Visual Sciences at the University of Rochester, New York, computer gamers not only learn to master specific tasks but can also transfer what they have learned to other tasks, thereby strengthening visual attention in general. As C. Shawn Green and Daphne Bavelier write in Nature, the study examined people between 18 and 23 years of age who had played action games such as Grand Theft Auto3, Half-Life, Counter-Strike, 007 or Spider-Man on at least four days a week and for at least one hour a day during the preceding six months. There were, however, no women among them! The scientists had been unable to find any female students with the necessary shooter-game experience. Their performance in the tests was compared with that of non-players. As a control, non-players (this time including women) had to train on shooter games for at least one hour on each of 10 consecutive days, which did in fact improve their visual attention performance. That may indeed be helpful for some tasks, but it improves neither attention in general nor other cognitive abilities that are unrelated to visual orientation and reaction.
Computer gamers with action-game experience have, for example, a higher attentional capacity that is exhausted far less quickly than in non-players. Thus, even after strenuous work on tasks, they are still able to process distractions alongside the task. They can, for example, also remember longer series of digits shown briefly to the test subjects on the screen. In addition, the players were far better than non-players at extending their attention across a spatial field, even in unfamiliar situations. For this, objects first had to be identified in a dense field, and then a wider surrounding area had to be explored quickly by switching focus. The pressure of having to react quickly to several visual stimuli promotes, according to the scientists, the ability to process stimuli over time and to avoid "bottleneck" situations of attention. They are also better able to jump from one task to the next. As the scientists themselves note, one could of course object, given these results, that the abilities do not arise from playing computer games, but rather that people whose visual attention and sensorimotor coordination are better prefer this kind of game because they are also better rewarded there than the clumsy. For this reason, a group of non-players was asked to play "Medal of Honor" for at least one hour a day on ten consecutive days, while a control group was given "Tetris". Tetris demands quite different skills from a shooter game. The user must direct attention to one object at a time at every moment, whereas the shooter player's attention must be distributed across the whole space and must constantly reckon with something unforeseen appearing from some corner.
Tetris players should therefore, if attention is trained specifically by the demands of a game, perform differently in the tests of visual attention.

Crane, G.; Jones, A.: Text, information, knowledge and the evolving record of humanity (2006) 0.00
```
0.0030750802 = product of:
  0.0061501605 = sum of:
    0.0061501605 = product of:
      0.012300321 = sum of:
        0.012300321 = weight(_text_:c in 1182) [ClassicSimilarity], result of:
          0.012300321 = score(doc=1182,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.09527725 = fieldWeight in 1182, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1182)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Consider a sentence such as "the current price of tea in China is 35 cents per pound." In a library with millions of books we might find many statements of the above form that we could capture today with relatively simple rules: rather than pursuing every variation of a statement, programs can wait, like predators at a water hole, for their informational prey to reappear in a standard linguistic pattern. We can make inferences from sentences such as "NAME1 born at NAME2 in DATE" that NAME1 more likely than not represents a person and NAME2 a place and then convert the statement into a proposition about a person born at a given place and time. The changing price of tea in China, pedestrian birth and death dates, or other basic statements may not be truth and beauty in the Phaedrus, but a digital library that could plot the prices of various commodities in different markets over time, plot the various lifetimes of individuals, or extract and classify many events would be very useful. Services such as the Syllabus Finder and H-Bot (which Dan Cohen describes elsewhere in this issue of D-Lib) represent examples of information extraction already in use. H-Bot, in particular, builds on our evolving ability to extract information from very large corpora such as the billions of web pages available through the Google API. Aside from identifying higher order statements, however, users also want to search and browse named entities: they want to read about "C. P. E. Bach" rather than his father "Johann Sebastian" or about "Cambridge, Maryland", without hearing about "Cambridge, Massachusetts", Cambridge in the UK or any of the other Cambridges scattered around the world. Named entity identification is a well-established area with an ongoing literature. 
The Natural Language Processing Research Group at the University of Sheffield has developed its open source Generalized Architecture for Text Engineering (GATE) for years, while IBM's Unstructured Information Analysis and Search (UIMA) is "available as open source software to provide a common foundation for industry and academia." Powerful tools are thus freely available and more demanding users can draw upon published literature to develop their own systems. Major search engines such as Google and Yahoo also integrate increasingly sophisticated tools to categorize and identify places. The software resources are rich and expanding. The reference works on which these systems depend, however, are ill-suited for historical analysis. First, simple gazetteers and similar authority lists quickly grow too big for useful information extraction. They provide us with potential entities against which to match textual references, but existing electronic reference works assume that human readers can use their knowledge of geography and of the immediate context to pick the right Boston from the Bostons in the Getty Thesaurus of Geographic Names (TGN), but, with the crucial exception of geographic location, the TGN records do not provide any machine readable clues: we cannot tell which Bostons are large or small. If we are analyzing a document published in 1818, we cannot filter out those places that did not yet exist or that had different names: "Jefferson Davis" is not the name of a parish in Louisiana (tgn,2000880) or a county in Mississippi (tgn,2001118) until after the Civil War.
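
The pattern-based extraction described in this abstract ("NAME1 born at NAME2 in DATE") can be illustrated with a regular expression. This is only a sketch under simple assumptions, not the method of the authors or of the tools named above: the pattern, the function `extract_births`, and the sample sentence are all hypothetical, and names are crudely approximated as runs of capitalized words.

```python
import re

# Hypothetical pattern for sentences shaped like "NAME1 born at NAME2 in DATE".
# A "name" is approximated as one or more capitalized tokens.
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"
BORN_PATTERN = re.compile(
    rf"(?P<person>{NAME}) (?:was )?born at (?P<place>{NAME}) in (?P<year>\d{{4}})"
)

def extract_births(text):
    # Turn each match into a (person, place, year) proposition.
    return [
        (m["person"], m["place"], int(m["year"]))
        for m in BORN_PATTERN.finditer(text)
    ]

# Illustrative sample sentence, not taken from the abstract.
sample = "Carl Philipp Emanuel Bach was born at Weimar in 1714."
births = extract_births(sample)
print(births)
```

A rule like this waits for one standard linguistic pattern; it does not resolve which entity a name refers to, which is the disambiguation problem the abstract goes on to describe.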

Lagoze, C.: Keeping Dublin Core simple : Cross-domain discovery or resource description? (2001) 0.00

0.0030750802 = product of:
  0.0061501605 = sum of:
    0.0061501605 = product of:
      0.012300321 = sum of:
        0.012300321 = weight(_text_:c in 1216) [ClassicSimilarity], result of:
          0.012300321 = score(doc=1216,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.09527725 = fieldWeight in 1216, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1216)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
