Search (3 results, page 1 of 1)

  • × author_ss:"Srihari, R.K."
  1. Srihari, R.K.: Using speech input for image interpretation, annotation, and retrieval (1997) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 764) [ClassicSimilarity], result of:
            0.007030784 = score(doc=764,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 764, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=764)
          0.037440285 = weight(_text_:22 in 764) [ClassicSimilarity], result of:
            0.037440285 = score(doc=764,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 764, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=764)
      0.5 = coord(1/2)
    
    Abstract
    Explores the interaction of textual and photographic information in an integrated text and image database environment and describes 3 different applications involving the exploitation of linguistic context in vision. Describes the practical application of these ideas in working systems. PICTION uses captions to identify human faces in a photograph, wile Show&Tell is a multimedia system for semi automatic image annotation. The system combines advances in speech recognition, natural language processing and image understanding to assist in image annotation and enhance image retrieval capabilities. Presents an extension of this work to video annotation and retrieval
    Date
    22. 9.1997 19:16:05
    Type
    a
  2. Srihari, R.K.: Computational models for integrating linguistic and visual information : a survey (1994/95) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 2244) [ClassicSimilarity], result of:
              0.008202582 = score(doc=2244,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 2244, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2244)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Surveys research in developing computational models for integrating linguistic and visual information. It begins with a discussion of systems which have been actually implemented and continues with computationally motivated theories of human cognition. Since existing research spans several disciplines (eg natural language understanding, computer vision, knowledge representation), as well as several application areas, categorizes existing research based on inputs and objectives. Outlines some key issues related to integrating information from 2 such diverse sources and relates it to existing research. Throughout, the key issue addressed is the correspondence problem, ie how to associate visual events with words and vice verse
    Type
    a
  3. Srihari, R.K.; Zhang, Z.: Exploiting multimodal context in image retrieval (1999) 0.00
    0.001757696 = product of:
      0.003515392 = sum of:
        0.003515392 = product of:
          0.007030784 = sum of:
            0.007030784 = weight(_text_:a in 848) [ClassicSimilarity], result of:
              0.007030784 = score(doc=848,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.13239266 = fieldWeight in 848, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=848)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research explores the interaction of textual and photographic information in multimodal documents. The World Wide Web (WWW) may be viewed as the ultimate, large-scale, dynamically changing, multimedia database. Finding useful information from the WWW without encountering numerous false positives (the current case) poses a challenge to multimedia information retrieval systems (MMIR). The fact that images do not appear in isolation, but rather with accompanying collateral text, is exploited. Taken independently, existing techniques for picture retrieval using collateral text-based methods and image-based methods have several limitations. Text-based methods, while very powerful in matching context, do not have access to image content. Image-based methods compute general similarity between images and provide limited semantics. This research focuses on improving precision and recall in an MMIR system by interactively combining text processing with image processing (IP) in both the indexing and retrieval phases. A picture search engine is demonstrated as an application.
    Type
    a