Search (7 results, page 1 of 1)

Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.03
```
0.026695073 = product of:
  0.04004261 = sum of:
    0.031532075 = weight(_text_:im in 1054) [ClassicSimilarity], result of:
      0.031532075 = score(doc=1054,freq=8.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.2186231 = fieldWeight in 1054, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1054)
    0.008510532 = product of:
      0.025531596 = sum of:
        0.025531596 = weight(_text_:retrieval in 1054) [ClassicSimilarity], result of:
          0.025531596 = score(doc=1054,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16542503 = fieldWeight in 1054, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1054)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```
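The explain tree above is plain arithmetic, so it can be recomputed by hand. The following is a minimal sketch, not the search engine's own code: it assumes that `tf` is the square root of the term frequency and that `idf` is `1 + ln(maxDocs / (docFreq + 1))`, which reproduces the values shown for this record. The function names are illustrative only.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf formula; it matches the idf values printed in the explain tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                       # tf(freq=8.0) = 2.828427
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # queryWeight = idf * queryNorm
    field_weight = tf * term_idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the explain output for doc 1054.
QUERY_NORM = 0.051022716
MAX_DOCS = 44218
FIELD_NORM = 0.02734375

im = term_score(8.0, 7115, MAX_DOCS, QUERY_NORM, FIELD_NORM)
retrieval = term_score(4.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The inner clause matched 1 of 3 sub-clauses, the outer query 2 of 3.
total = (im + retrieval * (1 / 3)) * (2 / 3)
print(f"{total:.9f}")  # should agree with 0.026695073 up to float rounding
```

The same function applies to the other records on this page; only `freq`, `docFreq`, `fieldNorm`, and the `coord` factors change.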
Abstract

An algorithmic procedure can identify and extract multi-word groups from electronically available texts. The data basis for this work consists of art-historical encyclopedia articles from the Reallexikon zur Deutschen Kunstgeschichte. The study used Lingo, a linguistic, dictionary-based open-source software package. Lingo makes it possible to algorithmically identify and extract particular word sequences from electronic data on the basis of predefined word patterns. These word patterns are based on word classes with which the lexicalized dictionary entries are tagged and thereby specified more precisely. Individual word classes were assigned, for example, for specialist terminology, proper names, and adjectives. In the present work, function words are additionally included in pattern formation, and new word classes were defined for this purpose. Function words comprise articles, conjunctions, and prepositions. The aim was to extract terminological multi-word groups with art-historical content while deliberately including function words. The extracted multi-word groups were analyzed qualitatively against self-defined criteria. The study found that using function words yields terminological multi-word groups that can serve as potential index terms in information retrieval.
Multi-word groups are to be regarded as lexical units and consist of at least two interrelated terms. By combining several technical terms, they convey meaningful information in specialist texts. The information they convey is unambiguous, because the relationships between the connected technical terms reveal the meaning of the text's content. It is therefore useful to extract multi-word groups from specialist texts, since they represent the content unambiguously. Multi-word groups can thus be used for subject indexing and, for example, provided as index terms in information retrieval. Multi-word groups contain information from a text that is expressed in natural language. Automated methods are used to extract information from an electronically available text, since language exhibits structures that can be processed by machine. One possible method for identifying and extracting multi-word groups within electronically available specialist texts is an algorithmic procedure. This method recognizes word sequences by forming word patterns that describe how a multi-word group in a text is composed. The word patterns thus represent the individual components of a multi-word group. The procedure has already been investigated and analyzed on mathematical texts. Relevant multi-word groups that represented a mathematical concept or mathematical content were extracted successfully. The indexing system Lingo was used, whose program module sequencer enables the algorithmic identification and extraction of multi-word groups. In the present work, this algorithmic procedure is applied with the Lingo software to extract multi-word groups from art-historical specialist texts.
The data source consists of art-historical encyclopedia articles from the Reallexikon zur Deutschen Kunstgeschichte, which is written in German. The study examines whether positive results can be produced, in the sense of terminological multi-word groups with art-historical content. In addition, function words are to be included within a multi-word group. Function words denote articles, conjunctions, and prepositions, which carry no content meaning on their own but fulfil syntactic functions within a multi-word group. The resulting output is used to analyze whether adding function words within a multi-word group leads to positive results. The aim is accordingly to produce terminological multi-word groups with art-historical content that include function words. In describing the extraction of terminological multi-word groups, the following focuses in particular on the creation of word patterns, since these provide the basis on which the program module sequencer identifies word sequences within the art-historical encyclopedia articles. The indexing results are classified against self-defined criteria that specify what is to be understood as a terminological multi-word group.
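
The pattern-based extraction described in this abstract can be illustrated with a small sketch. It is not Lingo's implementation or configuration format: the lexicon, the word-class tags (`T` term, `J` adjective, `P` preposition, `A` article), and the function name are all hypothetical, chosen only to show how word patterns that include function words select word sequences from tagged text.

```python
# Hypothetical toy lexicon: each known word is tagged with a word class.
LEXICON = {
    "vault": "T", "nave": "T",   # specialist terms
    "gothic": "J",               # adjective
    "of": "P",                   # preposition (function word)
    "the": "A",                  # article (function word)
}

# Hypothetical word patterns: sequences of word classes forming a multi-word group.
PATTERNS = [
    ("J", "T"),            # adjective + term
    ("T", "P", "A", "T"),  # term + preposition + article + term
]

def extract_multiword_groups(text, lexicon, patterns):
    """Return every contiguous token sequence whose tags match a pattern."""
    tokens = text.lower().split()
    tags = [lexicon.get(token, "?") for token in tokens]  # "?" = unknown word
    found = []
    for start in range(len(tokens)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                found.append(" ".join(tokens[start:end]))
    return found

groups = extract_multiword_groups(
    "the gothic vault of the nave collapsed", LEXICON, PATTERNS
)
print(groups)
```

Without the pattern containing `P` and `A`, the group "vault of the nave" would not be found, which is the effect of including function words that the abstract examines.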

Weckend, E.: Anwenders Ideal : Forderungen der entstehenden Information Community (1995) 0.01

```
0.008158875 = product of:
  0.024476623 = sum of:
    0.024476623 = product of:
      0.07342987 = sum of:
        0.07342987 = weight(_text_:online in 2326) [ClassicSimilarity], result of:
          0.07342987 = score(doc=2326,freq=4.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.47420335 = fieldWeight in 2326, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=2326)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Abstract

Until recently, the use of online databases was left to a relatively small circle of specialists. Today, however, "online" has already become, for many, the watchword of a new competence that recognizes and uses electronic information gathering as a self-evident basis for up-to-date decision-making.

Graphic details : a scientific study of the importance of diagrams to science (2016) 0.01
```
0.0080701 = product of:
  0.0242103 = sum of:
    0.0242103 = product of:
      0.03631545 = sum of:
        0.015576826 = weight(_text_:online in 3035) [ClassicSimilarity], result of:
          0.015576826 = score(doc=3035,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.100593716 = fieldWeight in 3035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3035)
        0.02073862 = weight(_text_:22 in 3035) [ClassicSimilarity], result of:
          0.02073862 = score(doc=3035,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.116070345 = fieldWeight in 3035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3035)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Content

Bill Howe and his colleagues at the University of Washington, in Seattle, decided to find out. First, they trained a computer algorithm to distinguish between various sorts of figures, which they defined as diagrams, equations, photographs, plots (such as bar charts and scatter graphs) and tables. They exposed their algorithm to between 400 and 600 images of each of these types of figure until it could distinguish them with an accuracy greater than 90%. Then they set it loose on the more-than-650,000 papers (containing more than 10m figures) stored on PubMed Central, an online archive of biomedical-research articles. To measure each paper's influence, they calculated its article-level Eigenfactor score, a modified version of the PageRank algorithm Google uses to provide the most relevant results for internet searches. Eigenfactor scoring gives a better measure than simply noting the number of times a paper is cited elsewhere, because it weights citations by their influence. A citation in a paper that is itself highly cited is worth more than one in a paper that is not.
As the team describe in a paper posted (http://arxiv.org/abs/1605.04951) on arXiv, they found that figures did indeed matter, but not all in the same way. An average paper in PubMed Central has about one diagram for every three pages and gets 1.67 citations. Papers with more diagrams per page and, to a lesser extent, plots per page tended to be more influential (on average, a paper accrued two more citations for every extra diagram per page, and one more for every extra plot per page). By contrast, including photographs and equations seemed to decrease the chances of a paper being cited by others. That agrees with a study from 2012, whose authors counted (by hand) the number of mathematical expressions in over 600 biology papers and found that each additional equation per page reduced the number of citations a paper received by 22%. This does not mean that researchers should rush to include more diagrams in their next paper. Dr Howe has not shown what is behind the effect, which may merely be one of correlation, rather than causation. It could, for example, be that papers with lots of diagrams tend to be those that illustrate new concepts, and thus start a whole new field of inquiry. Such papers will certainly be cited a lot. On the other hand, the presence of equations really might reduce citations. Biologists (as are most of those who write and read the papers in PubMed Central) are notoriously maths-averse. If that is the case, looking in a physics archive would probably produce a different result.
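
The idea that a citation from a highly cited paper counts for more than one from an uncited paper can be sketched with a basic PageRank iteration. This is an illustration only, not the article-level Eigenfactor computation the team used: the damping factor of 0.85, the iteration count, and the toy citation graph are assumptions made for the example.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over a dict: paper -> papers it cites."""
    papers = sorted(set(cites) | {p for targets in cites.values() for p in targets})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            targets = cites.get(p, [])
            if targets:
                # A paper passes its influence on to the papers it cites.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A paper citing nothing spreads its influence evenly.
                share = damping * rank[p] / n
                for t in papers:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Toy graph: D and F each receive exactly one citation,
# but D is cited by C (itself cited twice) and F by E (never cited).
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "E": ["F"], "D": [], "F": []}
scores = pagerank(citations)
print(scores["D"] > scores["F"])
```

A simple citation count would rank D and F equally; the influence-weighted score separates them.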

Leuenberger, M.; Stettler, N.; Grossmann, S.; Herget, J.: Combining different access options for image databases (2006) 0.01

```
0.005673688 = product of:
  0.017021064 = sum of:
    0.017021064 = product of:
      0.05106319 = sum of:
        0.05106319 = weight(_text_:retrieval in 6106) [ClassicSimilarity], result of:
          0.05106319 = score(doc=6106,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33085006 = fieldWeight in 6106, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6106)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Abstract

Living Memory is an interdisciplinary two-year project carried out in cooperation between several institutions. It aims to develop an information system for a digital collection of different types of visual resources and will combine classical methods of image indexing and retrieval with innovative approaches such as content-based image retrieval and the use of topic maps for semantic searching and browsing. This work-in-progress report outlines the aims of the project and presents first results after a period of fifteen months.

Kellsey, C.: Cataloging with Bibliofile : alternative to the bibliographic utilities for small college libraries (1998) 0.00
```
0.004038437 = product of:
  0.01211531 = sum of:
    0.01211531 = product of:
      0.03634593 = sum of:
        0.03634593 = weight(_text_:online in 5177) [ClassicSimilarity], result of:
          0.03634593 = score(doc=5177,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.23471867 = fieldWeight in 5177, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5177)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Bibliofile is a CD-ROM cataloguing product that provides LC MARC records. Available databases include English only, foreign language materials, audio-visual materials, as well as several that are more specialized. Bibliofile runs on a PC that may be connected to a network. Advantages over an online utility include lower cost, no telecommunication problems, no slow response times, fixed subscription rates with no hourly use charges, easy installation, searching and editing, and good phone support. Disadvantages include no member-contributed records and no member holdings to use for interlibrary loan. A library should consider the type and level of materials catalogued, the existence of an interface with a local OPAC, total cataloguing time used, and other sources for ILL searching when considering Bibliofile as a cataloguing alternative.

McCallum, S.H.: ¬A look at new information retrieval protocols : SRU, OpenSearch/A9, CQL, and XQuery (2006) 0.00

```
0.0034387745 = product of:
  0.0103163235 = sum of:
    0.0103163235 = product of:
      0.03094897 = sum of:
        0.03094897 = weight(_text_:retrieval in 6108) [ClassicSimilarity], result of:
          0.03094897 = score(doc=6108,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.20052543 = fieldWeight in 6108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=6108)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Patton, G.; Hengel-Dittrich, C.; O'Neill, E.T.; Tillett, B.B.: VIAF (Virtual International Authority File) : Linking Die Deutsche Bibliothek and Library of Congress Name Authority Files (2006) 0.00
```
0.0028845975 = product of:
  0.008653793 = sum of:
    0.008653793 = product of:
      0.025961377 = sum of:
        0.025961377 = weight(_text_:online in 6105) [ClassicSimilarity], result of:
          0.025961377 = score(doc=6105,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.16765618 = fieldWeight in 6105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6105)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Die Deutsche Bibliothek, the Library of Congress, and OCLC Online Computer Library Center are jointly developing a virtual international authority file (VIAF) for personal names which links authority records from the world's national bibliographic agencies and will be made freely available on the Web. The goals of the project are to prove the viability of automatically linking authority records from different national authority files and to demonstrate its benefits. The authority and bibliographic files from the Library of Congress and Die Deutsche Bibliothek were used to create the initial VIAF which contains over six million names with over a half million links. A key aspect of the project was the development of automated name matching algorithms which use information from both authority records and the corresponding bibliographic records. The practicality of algorithmically linking the personal names between national authority files was demonstrated; seventy percent of the authority records for personal names common to both files were automatically linked with an error rate of less than one percent. The long-term goal of the VIAF project is to combine the authoritative names from many national libraries and other significant sources into a shared global authority service.
