Search (8 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Indexieren"
  • type_ss:"el"
  1. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.02
    0.018830853 = product of:
      0.11298511 = sum of:
        0.11298511 = weight(_text_:networks in 1557) [ClassicSimilarity], result of:
          0.11298511 = score(doc=1557,freq=6.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.5430886 = fieldWeight in 1557, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.046875 = fieldNorm(doc=1557)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
  2. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.02
    0.015375326 = product of:
      0.09225196 = sum of:
        0.09225196 = weight(_text_:networks in 1868) [ClassicSimilarity], result of:
          0.09225196 = score(doc=1868,freq=4.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.44343 = fieldWeight in 1868, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
      0.16666667 = coord(1/6)
    
    Abstract
    We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.
  3. Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description (2014) 0.01
    0.012812771 = product of:
      0.076876625 = sum of:
        0.076876625 = weight(_text_:networks in 1873) [ClassicSimilarity], result of:
          0.076876625 = score(doc=1873,freq=4.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.369525 = fieldWeight in 1873, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1873)
      0.16666667 = coord(1/6)
    
    Abstract
    Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
  4. Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.01
    0.009059997 = product of:
      0.054359984 = sum of:
        0.054359984 = weight(_text_:networks in 1875) [ClassicSimilarity], result of:
          0.054359984 = score(doc=1875,freq=8.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 1875, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.16666667 = coord(1/6)
    
    Content
    In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
    In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and description, the researchers turned them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
  5. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.00
    0.00334871 = product of:
      0.02009226 = sum of:
        0.02009226 = weight(_text_:information in 1167) [ClassicSimilarity], result of:
          0.02009226 = score(doc=1167,freq=10.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.2602176 = fieldWeight in 1167, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
      0.16666667 = coord(1/6)
    
    Abstract
    The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug and play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
  6. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.00
    0.001996785 = product of:
      0.011980709 = sum of:
        0.011980709 = weight(_text_:information in 466) [ClassicSimilarity], result of:
          0.011980709 = score(doc=466,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.1551638 = fieldWeight in 466, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.16666667 = coord(1/6)
    
    Abstract
    We present an approach for detecting multiword phrases in mathematical text corpora. The method used is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, as are conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
  7. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    0.0014119402 = product of:
      0.008471641 = sum of:
        0.008471641 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
          0.008471641 = score(doc=2596,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.10971737 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.16666667 = coord(1/6)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
      Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      Ev Brenner (New York, NY), Chairman
      James Callan (University of Massachusetts, MA)
      Marc Krellenstein (Northern Light Technology, Cambridge, MA)
      Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
      Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      Intelligent Agents
        John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      Text summarization
        Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      Cross language searching
        Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      Video search and retrieval
        Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
      Speech recognition
        Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      Visualization
        James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
      Text mining
        David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  8. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.00
    0.0012479905 = product of:
      0.007487943 = sum of:
        0.007487943 = weight(_text_:information in 4309) [ClassicSimilarity], result of:
          0.007487943 = score(doc=4309,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.09697737 = fieldWeight in 4309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4309)
      0.16666667 = coord(1/6)
    
    Abstract
    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.
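
The score explanations attached to each result above follow the classic Lucene TF-IDF pattern: the final score is coord × queryWeight × fieldWeight, with queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The sketch below recomputes the score of result 1 from the numbers shown in its explanation. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values in the listing; the function names are illustrative and not part of any Lucene or Solr API.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float, coord: float) -> float:
    """Recompute a single-term score the way the explanation trees lay it out."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return coord * query_weight * field_weight


# Inputs copied from the explanation of result 1 (doc=1557, term "networks").
score = classic_score(
    freq=6.0,
    doc_freq=1060,
    max_docs=44218,
    query_norm=0.043984205,
    field_norm=0.046875,
    coord=1 / 6,
)
print(f"{score:.6f}")  # close to the 0.018830853 reported in the listing
```

Because fieldNorm shrinks as the indexed field gets longer, a document with a higher term frequency can still rank lower: result 4 has freq=8.0 but a fieldNorm of 0.01953125, so it scores below results 1 to 3. Small differences in the last digits are expected, since the listing was presumably computed in single precision.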