Search (17 results, page 1 of 1)

  • × year_i:[2000 TO 2010}
  • × theme_ss:"Automatisches Indexieren"
  1. Souza, R.R.; Raghavan, K.S.: A methodology for noun phrase-based automatic indexing (2006) 0.02
    0.018278074 = product of:
      0.073112294 = sum of:
        0.073112294 = weight(_text_:digital in 173) [ClassicSimilarity], result of:
          0.073112294 = score(doc=173,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.36980176 = fieldWeight in 173, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.046875 = fieldNorm(doc=173)
      0.25 = coord(1/4)
    
    Abstract
    The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. An approach being discussed considers the intrinsic semantics of texts of documents. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.
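
The score breakdown for result 1 can be re-derived from the numbers it lists. The sketch below is not taken from the search system itself. It assumes the classic TF-IDF form tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), chosen because it reproduces the idf value shown. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float) -> float:
    """Recompute a single-term score from the values in the explanation."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=4.0) -> 2.0
    query_weight = idf * query_norm         # "queryWeight, product of:"
    field_weight = tf * idf * field_norm    # "fieldWeight in 173, product of:"
    return query_weight * field_weight * coord


# Inputs copied from result 1 (doc=173, term "digital").
score = classic_term_score(
    freq=4.0, doc_freq=2326, max_docs=44218,
    query_norm=0.050121464, field_norm=0.046875, coord=0.25,
)
print(f"{score:.9f}")
```

The printed value should agree with the listed 0.018278074 up to rounding, since the listing shows single-precision values and the sketch uses double precision.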
  2. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.01
    0.012924549 = product of:
      0.051698197 = sum of:
        0.051698197 = weight(_text_:digital in 1871) [ClassicSimilarity], result of:
          0.051698197 = score(doc=1871,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.26148933 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
      0.25 = coord(1/4)
    
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
  3. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.01
    0.011883841 = product of:
      0.047535364 = sum of:
        0.047535364 = product of:
          0.09507073 = sum of:
            0.09507073 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.09507073 = score(doc=6265,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
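
Result 3 nests two coord factors (1/2 inside 1/4), so its tree is a useful consistency check. The sketch below is a hypothetical helper, not part of the search system. It parses the indented explanation text into a tree and verifies that every "product of:" and "sum of:" node matches its children within a small tolerance. It assumes nesting is expressed only through indentation and that each line has the form "value = label".

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    value: float
    label: str
    children: list = field(default_factory=list)


def parse_explain(text: str) -> Node:
    """Build a tree from indented 'value = label' lines (single root assumed)."""
    stack = []  # (indent, node) pairs along the current path
    root = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        value_str, label = line.strip().split(" = ", 1)
        node = Node(float(value_str), label)
        # Drop siblings and deeper nodes so the top of the stack is the parent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            root = node
        stack.append((indent, node))
    return root


def check(node: Node, rel_tol: float = 1e-5) -> bool:
    """True if every product/sum node agrees with its children."""
    ok = True
    if node.children:
        values = [child.value for child in node.children]
        if node.label.endswith("product of:"):
            ok = math.isclose(math.prod(values), node.value, rel_tol=rel_tol)
        elif node.label.endswith("sum of:"):
            ok = math.isclose(sum(values), node.value, rel_tol=rel_tol)
    return ok and all(check(child, rel_tol) for child in node.children)


# Explanation text copied from result 3 (doc=6265, term "22").
EXPLAIN = """
0.011883841 = product of:
  0.047535364 = sum of:
    0.047535364 = product of:
      0.09507073 = sum of:
        0.09507073 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.09507073 = score(doc=6265,freq=2.0), product of:
            0.17551683 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050121464 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
"""

tree = parse_explain(EXPLAIN)
consistent = check(tree)
```

Nodes labelled "result of:" or "with freq of:" are left unchecked here because they do not state a product or sum.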
  4. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.01
    0.010770457 = product of:
      0.043081827 = sum of:
        0.043081827 = weight(_text_:digital in 601) [ClassicSimilarity], result of:
          0.043081827 = score(doc=601,freq=2.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.21790776 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=601)
      0.25 = coord(1/4)
    
  5. Hauer, M.: Automatische Indexierung (2000) 0.01
    0.01018615 = product of:
      0.0407446 = sum of:
        0.0407446 = product of:
          0.0814892 = sum of:
            0.0814892 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
              0.0814892 = score(doc=5887,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.46428138 = fieldWeight in 5887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5887)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt
  6. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.01
    0.008121594 = product of:
      0.032486375 = sum of:
        0.032486375 = weight(_text_:library in 1167) [ClassicSimilarity], result of:
          0.032486375 = score(doc=1167,freq=4.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.24650425 = fieldWeight in 1167, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
      0.25 = coord(1/4)
    
    Abstract
    The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing and a standard Java-based software platform to support plug-and-play research datasets, a selection of standard IR modules, and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
  7. Salton, G.: SMART System: 1961-1976 (2009) 0.01
    0.0076571116 = product of:
      0.030628446 = sum of:
        0.030628446 = weight(_text_:library in 3879) [ClassicSimilarity], result of:
          0.030628446 = score(doc=3879,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.23240642 = fieldWeight in 3879, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0625 = fieldNorm(doc=3879)
      0.25 = coord(1/4)
    
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  8. Roberts, D.; Souter, C.: The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.01
    0.0073997467 = product of:
      0.029598987 = sum of:
        0.029598987 = product of:
          0.059197973 = sum of:
            0.059197973 = weight(_text_:project in 711) [ClassicSimilarity], result of:
              0.059197973 = score(doc=711,freq=2.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.27981415 = fieldWeight in 711, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.046875 = fieldNorm(doc=711)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass three components: extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature; use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors; and coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
  9. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01
    0.0067907665 = product of:
      0.027163066 = sum of:
        0.027163066 = product of:
          0.054326132 = sum of:
            0.054326132 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
              0.054326132 = score(doc=3581,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.30952093 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    24. 3.2006 12:22:02
  10. Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.01
    0.0067907665 = product of:
      0.027163066 = sum of:
        0.027163066 = product of:
          0.054326132 = sum of:
            0.054326132 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
              0.054326132 = score(doc=1755,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.30952093 = fieldWeight in 1755, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1755)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2008 12:35:19
  11. Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.01
    0.0059419204 = product of:
      0.023767682 = sum of:
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
              0.047535364 = score(doc=5671,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 5671, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5671)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2001 13:14:48
  12. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01
    0.0059419204 = product of:
      0.023767682 = sum of:
        0.023767682 = product of:
          0.047535364 = sum of:
            0.047535364 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
              0.047535364 = score(doc=5291,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.2708308 = fieldWeight in 5291, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5291)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 17:32:00
  13. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.005093075 = product of:
      0.0203723 = sum of:
        0.0203723 = product of:
          0.0407446 = sum of:
            0.0407446 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.0407446 = score(doc=1746,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2015 9:17:30
  14. Mittelbach, J.; Probst, M.: Möglichkeiten und Grenzen maschineller Indexierung in der Sacherschließung : Strategien für das Bibliothekssystem der Freien Universität Berlin (2006) 0.00
    0.004785695 = product of:
      0.01914278 = sum of:
        0.01914278 = weight(_text_:library in 1411) [ClassicSimilarity], result of:
          0.01914278 = score(doc=1411,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.14525402 = fieldWeight in 1411, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1411)
      0.25 = coord(1/4)
    
    Abstract
    Automatic indexing is increasingly recognized as a sensible way to generate data for information retrieval systems and thereby improve the findability of documents. The methods suited to this have been known for some time and comprise statistical and computational-linguistic language analysis techniques which, in contrast to the customary free-text inversion, offer decisive advantages for retrieval. Word form reduction, semantic decomposition, and the weighting of the resulting index terms are what first lay the foundations for targeted subject searching in the online catalog. Corresponding methods suitable for libraries have been ready for practical use since the mid-1990s and are being developed further all the time, not least because of growing acceptance. The aim is not only to increase the general performance of machine indexing systems but also to improve their ability to make optimal use of the very heterogeneous data available in the library sector. Other important criteria are an acceptable error rate, integrability into workflows, and the ability to represent the resulting data volumes in appropriate data representation models. The study focuses on a general consideration of the advantages and disadvantages of the two common indexing systems MILOS and intelligentCAPTURE, and on the possibilities and limits of their use in the library system of the Freie Universität Berlin. This publication is based on a master's thesis in the postgraduate distance learning program Master of Arts (Library and Information Science) at the Humboldt-Universität zu Berlin. Online version: http://www.ib.hu-berlin.de/~kumlau/handreichungen/h183/
  15. Schneider, A.: Moderne Retrievalverfahren in klassischen bibliotheksbezogenen Anwendungen : Projekte und Perspektiven (2008) 0.00
    0.0038285558 = product of:
      0.015314223 = sum of:
        0.015314223 = weight(_text_:library in 4031) [ClassicSimilarity], result of:
          0.015314223 = score(doc=4031,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.11620321 = fieldWeight in 4031, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.03125 = fieldNorm(doc=4031)
      0.25 = coord(1/4)
    
    Abstract
    This thesis deals with modern retrieval methods in classic library-related applications. As the combination of the two seemingly contradictory word groups in the title shows, the thesis links aspects of computer science and information science with aspects of the library tradition. After a brief account in the first chapter of the starting situation, the so-called information flood, the second chapter gives an introduction to the theory of information retrieval. Specifically, it covers the foundations of information retrieval and information retrieval systems and the various options for indexing information. Descriptive and subject cataloging, indexing, and automatic indexing are treated here. In addition, different information retrieval models and evaluation by retrieval tests are presented as part of the theory. The theory is followed in the third chapter by the practice of information retrieval. A distinction is made between in-house organizational use, use in the information and documentation sector, and use in the library sector. In-house use is illustrated by the example of the KURS database for education and training. Use in the library sector relates primarily to the OPAC as a compromise between librarians' indexing and end-user requirements, and to its enrichment (so-called catalogue enrichment) to improve retrieval. The library sector is treated in more detail, with a review of completed projects on information and indexing systems from the 1990s (OSIRIS, MILOS I and II, KASCADE) and a look at current projects. The two following chapters each present one current project for improving retrieval through catalog enrichment, automatic indexing, and advanced retrieval methods: the search portal dandelon.com and the 180T project of the Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen. For each, the project goal, project partners, project organization, course of the project, and the technology used are presented. The projects differ in that in one case a large union catalog center coordinates the project, whereas in the other each participating library is itself responsible for implementation. The sixth and final chapter presents the conclusions and perspectives. It evaluates the two projects described and gives an outlook on developments concerning the library catalog. This publication is based on a master's thesis in the postgraduate distance learning program Master of Arts (Library and Information Science) at the Humboldt-Universität zu Berlin.
  16. Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.00
    0.0033953832 = product of:
      0.013581533 = sum of:
        0.013581533 = product of:
          0.027163066 = sum of:
            0.027163066 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
              0.027163066 = score(doc=1767,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.15476047 = fieldWeight in 1767, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1767)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 6.2009 12:46:51
  17. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.00
    0.0033499864 = product of:
      0.013399946 = sum of:
        0.013399946 = weight(_text_:library in 4285) [ClassicSimilarity], result of:
          0.013399946 = score(doc=4285,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.10167781 = fieldWeight in 4285, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
      0.25 = coord(1/4)
    
    Abstract
    The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Erbring, 2000, online). In fact, Nie and Erbring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."