Search (11 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × type_ss:"el"
  1. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.07
    0.07221276 = product of:
      0.10831914 = sum of:
        0.06772823 = weight(_text_:semantic in 1717) [ClassicSimilarity], result of:
          0.06772823 = score(doc=1717,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.32156807 = fieldWeight in 1717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.040590912 = product of:
          0.081181824 = sum of:
            0.081181824 = weight(_text_:indexing in 1717) [ClassicSimilarity], result of:
              0.081181824 = score(doc=1717,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.41867304 = fieldWeight in 1717, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1717)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
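    The explain trees in these results follow Lucene's ClassicSimilarity (TF-IDF) formula: tf(freq) = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and each matching term contributes queryWeight * fieldWeight, with coord() factors scaling for partially matched query clauses. As a minimal sketch in plain Python (no Lucene dependency; queryNorm is copied from the tree above rather than recomputed), the score of result 1 can be reproduced like this:

      import math

      def tf(freq):
          return math.sqrt(freq)

      def idf(doc_freq, max_docs):
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      QUERY_NORM = 0.050655533  # copied from the explain tree
      MAX_DOCS = 44218

      def term_weight(doc_freq, freq, field_norm):
          """queryWeight * fieldWeight for one term, as in the tree."""
          term_idf = idf(doc_freq, MAX_DOCS)
          query_weight = term_idf * QUERY_NORM             # idf * queryNorm
          field_weight = tf(freq) * term_idf * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight

      # Result 1 (doc 1717): 'semantic' (freq=2) and 'indexing' (freq=4)
      semantic = term_weight(doc_freq=1879, freq=2.0, field_norm=0.0546875)
      indexing = term_weight(doc_freq=2614, freq=4.0, field_norm=0.0546875) * 0.5  # coord(1/2)
      score = (semantic + indexing) * (2.0 / 3.0)  # coord(2/3)
      print(round(score, 8))  # ~0.07221276, matching the displayed score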
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei/SWD) provides a broad controlled vocabulary for indexing documents on all subjects. While the SWD has traditionally been used for intellectual subject cataloguing, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. The paper sketches this project, its results, and its problems.
    Content
    Paper for the conference: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. Cf.: http://www.nlib.ee/index.php?id=17763.
  2. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.05
    0.04591921 = product of:
      0.068878815 = sum of:
        0.048377305 = weight(_text_:semantic in 4309) [ClassicSimilarity], result of:
          0.048377305 = score(doc=4309,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.22969149 = fieldWeight in 4309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4309)
        0.020501507 = product of:
          0.041003015 = sum of:
            0.041003015 = weight(_text_:indexing in 4309) [ClassicSimilarity], result of:
              0.041003015 = score(doc=4309,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.21146181 = fieldWeight in 4309, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4309)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document level. We therefore propose a novel approach that detects documents, rather than individual concepts, for which the quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of previously unseen data where considerable gains in document-level recall can be achieved while upholding precision. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.
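    The document-level filtering idea can be sketched as follows (a toy stand-in, not the authors' deep regression architecture; the regressor choice and the content indicators below are illustrative assumptions): predict a per-document quality score from content-based features, and keep a document in the automated pipeline only if the prediction clears a threshold.

      from sklearn.ensemble import GradientBoostingRegressor

      def content_features(text):
          """Hypothetical content-based indicators for one document."""
          tokens = text.split()
          return [
              len(tokens),                             # text length
              len(set(tokens)) / max(len(tokens), 1),  # lexical diversity
          ]

      # Assumed training data: documents with a known document-level
      # quality signal (e.g. F1 of automatic subjects vs. gold indexing).
      train_texts = ["short note", "a longer text about economics and law in short-text form"]
      train_quality = [0.2, 0.8]

      model = GradientBoostingRegressor().fit(
          [content_features(t) for t in train_texts], train_quality)

      def accept_for_automatic_indexing(text, threshold=0.7):
          """Route a document to the automated pipeline only if its predicted
          quality clears the threshold; the rest fall back to manual indexing."""
          return model.predict([content_features(text)])[0] >= threshold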
  3. Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: ¬A typology of semantic relations dedicated to scientific literature analysis (2016) 0.03
    0.031927396 = product of:
      0.09578218 = sum of:
        0.09578218 = weight(_text_:semantic in 2933) [ClassicSimilarity], result of:
          0.09578218 = score(doc=2933,freq=4.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.45476598 = fieldWeight in 2933, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
      0.33333334 = coord(1/3)
    
    Abstract
    We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.
  4. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.02
    0.019350924 = product of:
      0.058052767 = sum of:
        0.058052767 = weight(_text_:semantic in 1868) [ClassicSimilarity], result of:
          0.058052767 = score(doc=1868,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.2756298 = fieldWeight in 1868, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
      0.33333334 = coord(1/3)
    
  5. Kiros, R.; Salakhutdinov, R.; Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models (2014) 0.02
    0.019350924 = product of:
      0.058052767 = sum of:
        0.058052767 = weight(_text_:semantic in 1871) [ClassicSimilarity], result of:
          0.058052767 = score(doc=1871,freq=2.0), product of:
            0.21061863 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.050655533 = queryNorm
            0.2756298 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
      0.33333334 = coord(1/3)
    
  6. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.02
    0.015463205 = product of:
      0.046389613 = sum of:
        0.046389613 = product of:
          0.09277923 = sum of:
            0.09277923 = weight(_text_:indexing in 466) [ClassicSimilarity], result of:
              0.09277923 = score(doc=466,freq=4.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.47848347 = fieldWeight in 466, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=466)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval, and conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures, are discussed.
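    A minimal sketch of the dictionary-based detection (plain Python; the word lists and the run-collection rule are illustrative assumptions, not Lingo's actual dictionaries or configuration):

      # Hypothetical word-class dictionaries; Lingo reads such word
      # classes from previously defined dictionary files instead.
      ADJECTIVES = {"linear", "partial", "differential"}
      NOUNS = {"algebra", "equation", "operator"}

      def word_class(token):
          t = token.lower()
          if t in ADJECTIVES:
              return "ADJ"
          if t in NOUNS:
              return "NOUN"
          return None

      def multiword_phrases(tokens):
          """Collect runs of two or more dictionary-known words as
          candidate multiword terms."""
          phrases, current = [], []
          for token in tokens:
              if word_class(token):  # known word class: extend the run
                  current.append(token)
              else:                  # run ends: keep it if it is multiword
                  if len(current) > 1:
                      phrases.append(" ".join(current))
                  current = []
          if len(current) > 1:
              phrases.append(" ".join(current))
          return phrases

      print(multiword_phrases("the linear partial differential equation holds".split()))
      # ['linear partial differential equation']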
  7. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.01
    0.011836551 = product of:
      0.035509653 = sum of:
        0.035509653 = product of:
          0.07101931 = sum of:
            0.07101931 = weight(_text_:indexing in 658) [ClassicSimilarity], result of:
              0.07101931 = score(doc=658,freq=6.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.3662626 = fieldWeight in 658, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=658)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments were performed using the open source Annif toolkit for automated subject indexing and classification, but should also generalize to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform the baseline methods in text classification, particularly for Finnish and Swedish text, but not for English, where the baseline methods are most effective. The differences between lemmatization methods are quite small. This systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
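    The normalization step being compared can be illustrated with NLTK as a stand-in (a minimal sketch; Annif's own analyzer interface and the seven algorithms compared in the paper are not shown here):

      import nltk
      from nltk.stem import SnowballStemmer, WordNetLemmatizer

      nltk.download("wordnet", quiet=True)   # WordNet data for the lemmatizer
      nltk.download("omw-1.4", quiet=True)

      stemmer = SnowballStemmer("english")
      lemmatizer = WordNetLemmatizer()

      tokens = ["indexing", "libraries", "classified"]

      # Stemming truncates by rule; lemmatization maps to dictionary forms.
      print([stemmer.stem(t) for t in tokens])          # ['index', 'librari', 'classifi']
      print([lemmatizer.lemmatize(t) for t in tokens])  # ['indexing', 'library', 'classified']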
  8. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.01
    0.008200604 = product of:
      0.02460181 = sum of:
        0.02460181 = product of:
          0.04920362 = sum of:
            0.04920362 = weight(_text_:indexing in 1167) [ClassicSimilarity], result of:
              0.04920362 = score(doc=1167,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.2537542 = fieldWeight in 1167, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1167)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  9. Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.01
    0.006833836 = product of:
      0.020501507 = sum of:
        0.020501507 = product of:
          0.041003015 = sum of:
            0.041003015 = weight(_text_:indexing in 3105) [ClassicSimilarity], result of:
              0.041003015 = score(doc=3105,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.21146181 = fieldWeight in 3105, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3105)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  10. Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.01
    0.0057192715 = product of:
      0.017157814 = sum of:
        0.017157814 = product of:
          0.034315627 = sum of:
            0.034315627 = weight(_text_:22 in 3780) [ClassicSimilarity], result of:
              0.034315627 = score(doc=3780,freq=2.0), product of:
                0.17738704 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050655533 = queryNorm
                0.19345059 = fieldWeight in 3780, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3780)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    19. 8.2017 9:24:22
  11. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20, 1999, Boston, Mass. (1999) 0.01
    0.0054670684 = product of:
      0.016401205 = sum of:
        0.016401205 = product of:
          0.03280241 = sum of:
            0.03280241 = weight(_text_:indexing in 2596) [ClassicSimilarity], result of:
              0.03280241 = score(doc=2596,freq=2.0), product of:
                0.19390269 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.050655533 = queryNorm
                0.16916946 = fieldWeight in 2596, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2596)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      • Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      • Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      • Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      • Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: the knowledge impact statement
      • Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      • Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      • Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      • Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      • Ev Brenner (New York, NY), Chairman; James Callan (University of Massachusetts, MA); Marc Krellenstein (Northern Light Technology, Cambridge, MA); Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      • Steve Arnold (AIT, Harrods Creek, KY): Review: the leading edge in search and retrieval software
      • Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      • Intelligent agents: John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      • Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      • Cross-language searching: Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      • Video search and retrieval: Arnon Amir (IBM, Almaden, CA): CueVideo: modular system for automatic indexing and browsing of video/audio
      • Speech recognition: Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      • Visualization: James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: emerging science or passing fashion?
      • Text mining: David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support