Search (6 results, page 1 of 1)

  • × language_ss:"e"
  • × theme_ss:"Automatisches Indexieren"
  • × type_ss:"el"
  1. Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.01
    0.006468881 = product of:
      0.032344405 = sum of:
        0.032344405 = weight(_text_:und in 3081) [ClassicSimilarity], result of:
          0.032344405 = score(doc=3081,freq=20.0), product of:
            0.10442211 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.047114085 = queryNorm
            0.3097467 = fieldWeight in 3081, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
      0.2 = coord(1/5)
    
    Abstract
    Die Arbeit analysiert die dynamische Entwicklung und den Gebrauch von Thesaurusbegriffen. Zusätzlich konzentriert sie sich auf die Faktoren, die die Zahl von Indexbegriffen pro Dokument oder Zeitschrift beeinflussen. Als Untersuchungsobjekt dienten der MeSH und die entsprechende Datenbank "MEDLINE". Die wichtigsten Konsequenzen sind: 1. Der MeSH-Thesaurus hat sich durch drei unterschiedliche Phasen jeweils logarithmisch entwickelt. Solch einen Thesaurus sollte folgenden Gleichung folgen: "T = 3.076,6 Ln (d) - 22.695 + 0,0039d" (T = Begriffe, Ln = natürlicher Logarithmus und d = Dokumente). Um solch einen Thesaurus zu konstruieren, muss man demnach etwa 1.600 Dokumente von unterschiedlichen Themen des Bereiches des Thesaurus haben. Die dynamische Entwicklung von Thesauri wie MeSH erfordert die Einführung eines neuen Begriffs pro Indexierung von 256 neuen Dokumenten. 2. Die Verteilung der Thesaurusbegriffe erbrachte drei Kategorien: starke, normale und selten verwendete Headings. Die letzte Gruppe ist in einer Testphase, während in der ersten und zweiten Kategorie die neu hinzukommenden Deskriptoren zu einem Thesauruswachstum führen. 3. Es gibt ein logarithmisches Verhältnis zwischen der Zahl von Index-Begriffen pro Aufsatz und dessen Seitenzahl für die Artikeln zwischen einer und einundzwanzig Seiten. 4. Zeitschriftenaufsätze, die in MEDLINE mit Abstracts erscheinen erhalten fast zwei Deskriptoren mehr. 5. Die Findablity der nicht-englisch sprachigen Dokumente in MEDLINE ist geringer als die englische Dokumente. 6. Aufsätze der Zeitschriften mit einem Impact Factor 0 bis fünfzehn erhalten nicht mehr Indexbegriffe als die der anderen von MEDINE erfassten Zeitschriften. 7. In einem Indexierungssystem haben unterschiedliche Zeitschriften mehr oder weniger Gewicht in ihrem Findability. Die Verteilung der Indexbegriffe pro Seite hat gezeigt, dass es bei MEDLINE drei Kategorien der Publikationen gibt. Außerdem gibt es wenige stark bevorzugten Zeitschriften."
    Footnote
    Dissertation, Humboldt-Universität zu Berlin - Institut für Bibliotheks- und Informationswissenschaft.
    Imprint
    Berlin : Humboldt-Universität zu Berlin / Institut für Bibliotheks- und Informationswissenschaft
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  2. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.00
    0.0043044016 = product of:
      0.021522008 = sum of:
        0.021522008 = weight(_text_:information in 1167) [ClassicSimilarity], result of:
          0.021522008 = score(doc=1167,freq=10.0), product of:
            0.08270773 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047114085 = queryNorm
            0.2602176 = fieldWeight in 1167, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
      0.2 = coord(1/5)
    
    Abstract
    The Indiana University School of Library and Information Science opened a new research laboratory in January 2003; The Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug and play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
  3. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.00
    0.0030684597 = product of:
      0.015342298 = sum of:
        0.015342298 = weight(_text_:und in 1868) [ClassicSimilarity], result of:
          0.015342298 = score(doc=1868,freq=2.0), product of:
            0.10442211 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.047114085 = queryNorm
            0.14692576 = fieldWeight in 1868, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
      0.2 = coord(1/5)
    
    Content
    Vgl. auch: http://cs.stanford.edu/people/karpathy/cvpr2015.pdf und http://cs.stanford.edu/people/karpathy/deepimagesent/. Vgl. auch: https://news.ycombinator.com/item?id=8621658.
  4. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.00
    0.0025666493 = product of:
      0.012833246 = sum of:
        0.012833246 = weight(_text_:information in 466) [ClassicSimilarity], result of:
          0.012833246 = score(doc=466,freq=2.0), product of:
            0.08270773 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047114085 = queryNorm
            0.1551638 = fieldWeight in 466, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.2 = coord(1/5)
    
    Abstract
    We present an approach for detecting multiword phrases in mathematical text corpora. The method used is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo which allows to identify words by means of previously defined dictionaries for specific word classes as adjectives, personal names or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval and conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures are discussed.
  5. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    0.001814895 = product of:
      0.009074475 = sum of:
        0.009074475 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
          0.009074475 = score(doc=2596,freq=4.0), product of:
            0.08270773 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047114085 = queryNorm
            0.10971737 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.2 = coord(1/5)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access Session One: Updates and a twelve month perspective Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers Session Two: Today's search engines and beyond Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER) Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology Session Three: Panel discussion: Human v automated categorization and editing Ev Brenner (New York, NY)- Chairman James Callan (University of Massachusetts, MA) Marc Krellenstein (Northern Light Technology, Cambridge, MA) Dan Miller (Ask Jeeves, Berkeley, CA) Session Four: Updates and a twelve month perspective Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software Ellen Voorhees (NIST, Gaithersburg, MD) TREC update Session Five: Search engines now and beyond Intelligent Agents John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents Text summarization Therese Firmin, (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems Cross language searching Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval. Video search and retrieval Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio Speech recognition Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents Visualization James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion? Text mining David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support
  6. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.00
    0.0016041556 = product of:
      0.008020778 = sum of:
        0.008020778 = weight(_text_:information in 4309) [ClassicSimilarity], result of:
          0.008020778 = score(doc=4309,freq=2.0), product of:
            0.08270773 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047114085 = queryNorm
            0.09697737 = fieldWeight in 4309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4309)
      0.2 = coord(1/5)
    
    Abstract
    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.