Search (181 results, page 1 of 10)

  • × theme_ss:"Automatisches Indexieren"
  • × language_ss:"e"
  1. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.08
    0.084560595 = product of:
      0.16912119 = sum of:
        0.010483121 = weight(_text_:information in 5481) [ClassicSimilarity], result of:
          0.010483121 = score(doc=5481,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13576832 = fieldWeight in 5481, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.10706388 = weight(_text_:united in 5481) [ClassicSimilarity], result of:
          0.10706388 = score(doc=5481,freq=2.0), product of:
            0.24675635 = queryWeight, product of:
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.043984205 = queryNorm
            0.433885 = fieldWeight in 5481, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.05157419 = product of:
          0.10314838 = sum of:
            0.10314838 = weight(_text_:states in 5481) [ClassicSimilarity], result of:
              0.10314838 = score(doc=5481,freq=2.0), product of:
                0.24220218 = queryWeight, product of:
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.043984205 = queryNorm
                0.42587718 = fieldWeight in 5481, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5481)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
    
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
  2. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.05
    0.053722244 = product of:
      0.10744449 = sum of:
        0.010483121 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
          0.010483121 = score(doc=2673,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13576832 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.07610398 = weight(_text_:networks in 2673) [ClassicSimilarity], result of:
          0.07610398 = score(doc=2673,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.36581108 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.020857384 = product of:
          0.04171477 = sum of:
            0.04171477 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
              0.04171477 = score(doc=2673,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.2708308 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classifiy training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW
    Date
    1. 8.1996 22:08:06
    Source
    Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156
  3. Krutulis, J.D.; Jacob, E.K.: ¬A theoretical model for the study of emergent structure in adaptive information networks (1995) 0.04
    0.043689415 = product of:
      0.13106824 = sum of:
        0.02344097 = weight(_text_:information in 3353) [ClassicSimilarity], result of:
          0.02344097 = score(doc=3353,freq=10.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.3035872 = fieldWeight in 3353, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3353)
        0.10762728 = weight(_text_:networks in 3353) [ClassicSimilarity], result of:
          0.10762728 = score(doc=3353,freq=4.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.517335 = fieldWeight in 3353, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3353)
      0.33333334 = coord(2/6)
    
    Abstract
    Attempts to automate classification have focused on mimicking the intellectual processes whereby human classifiers assign entities to mutually exclusive groups that exhibit or more shared characteristics. A more viable approach might be to construct an adaptive retrieval system that produces groupings of related entities by generating dynamic categories based on document content and on the system's emergent structure as it adapts to modifications in the database and to observed patterns of access. Presents a theoretical model for adaptive information networks using relevance feedback and genetic algorithms to generate emergent structure
    Imprint
    Alberta : Alberta University, School of Library and Information Studies
    Source
    Connectedness: information, systems, people, organizations. Proceedings of CAIS/ACSI 95, the proceedings of the 23rd Annual Conference of the Canadian Association for Information Science. Ed. by Hope A. Olson and Denis B. Ward
  4. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.023878481 = product of:
      0.07163544 = sum of:
        0.023961417 = weight(_text_:information in 402) [ClassicSimilarity], result of:
          0.023961417 = score(doc=402,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.3103276 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.047674023 = product of:
          0.095348045 = sum of:
            0.095348045 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.095348045 = score(doc=402,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  5. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    0.02089367 = product of:
      0.06268101 = sum of:
        0.020966241 = weight(_text_:information in 6265) [ClassicSimilarity], result of:
          0.020966241 = score(doc=6265,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27153665 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
        0.04171477 = product of:
          0.08342954 = sum of:
            0.08342954 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.08342954 = score(doc=6265,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  6. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.02
    0.019450821 = product of:
      0.058352463 = sum of:
        0.020300474 = weight(_text_:information in 4285) [ClassicSimilarity], result of:
          0.020300474 = score(doc=4285,freq=30.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.2629142 = fieldWeight in 4285, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
        0.03805199 = weight(_text_:networks in 4285) [ClassicSimilarity], result of:
          0.03805199 = score(doc=4285,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.18290554 = fieldWeight in 4285, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
      0.33333334 = coord(2/6)
    
    Abstract
    The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily an automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphie, Visualization & Usability Center, 1998). Although there are other ways of locating information an the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphie, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
    Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; BaezaYates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some Gases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval an the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not an the frequent basis required to ensure currency. Query length is usually much shorter than in other environments-only a few words-and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).
    Source
    Annual review of information science and technology. 37(2003), S.91-126
  7. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.02
    0.018830853 = product of:
      0.11298511 = sum of:
        0.11298511 = weight(_text_:networks in 1557) [ClassicSimilarity], result of:
          0.11298511 = score(doc=1557,freq=6.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.5430886 = fieldWeight in 1557, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.046875 = fieldNorm(doc=1557)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
  8. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.02
    0.01773066 = product of:
      0.05319198 = sum of:
        0.0089855315 = weight(_text_:information in 3422) [ClassicSimilarity], result of:
          0.0089855315 = score(doc=3422,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.116372846 = fieldWeight in 3422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
        0.044206448 = product of:
          0.088412896 = sum of:
            0.088412896 = weight(_text_:states in 3422) [ClassicSimilarity], result of:
              0.088412896 = score(doc=3422,freq=2.0), product of:
                0.24220218 = queryWeight, product of:
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.043984205 = queryNorm
                0.3650376 = fieldWeight in 3422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3422)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) reduces/eliminates manual labor required to compile dictionaries and prepare source documents; (b) improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) insensitive to collection size; and (e) runs in linear time with respect to the number of clauses to be annotated.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.522-542
  9. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    0.01699179 = product of:
      0.050975367 = sum of:
        0.0211791 = weight(_text_:information in 1952) [ClassicSimilarity], result of:
          0.0211791 = score(doc=1952,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27429342 = fieldWeight in 1952, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.029796265 = product of:
          0.05959253 = sum of:
            0.05959253 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.05959253 = score(doc=1952,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Date
    16. 8.1998 12:51:22
    Footnote
    Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  10. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.02
    0.015375326 = product of:
      0.09225196 = sum of:
        0.09225196 = weight(_text_:networks in 1868) [ClassicSimilarity], result of:
          0.09225196 = score(doc=1868,freq=4.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.44343 = fieldWeight in 1868, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
      0.16666667 = coord(1/6)
    
    Abstract
    We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.
  11. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.01
    0.014924051 = product of:
      0.04477215 = sum of:
        0.014975886 = weight(_text_:information in 4157) [ClassicSimilarity], result of:
          0.014975886 = score(doc=4157,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.19395474 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.029796265 = product of:
          0.05959253 = sum of:
            0.05959253 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.05959253 = score(doc=4157,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
  12. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.014862737 = product of:
      0.04458821 = sum of:
        0.020751199 = weight(_text_:information in 6752) [ClassicSimilarity], result of:
          0.020751199 = score(doc=6752,freq=6.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.2687516 = fieldWeight in 6752, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
        0.023837011 = product of:
          0.047674023 = sum of:
            0.047674023 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.047674023 = score(doc=6752,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
    Date
    6. 3.1997 16:22:15
  13. Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description (2014) 0.01
    0.012812771 = product of:
      0.076876625 = sum of:
        0.076876625 = weight(_text_:networks in 1873) [ClassicSimilarity], result of:
          0.076876625 = score(doc=1873,freq=4.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.369525 = fieldWeight in 1873, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1873)
      0.16666667 = coord(1/6)
    
    Abstract
    Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
  14. Pfeifer, U.; Fuhr, N.; Huynh, T.: Searching structured documents with the enhanced retrieval functionality of freeWAIS-sf and SFgate (1995) 0.01
    0.012683997 = product of:
      0.07610398 = sum of:
        0.07610398 = weight(_text_:networks in 2214) [ClassicSimilarity], result of:
          0.07610398 = score(doc=2214,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.36581108 = fieldWeight in 2214, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2214)
      0.16666667 = coord(1/6)
    
    Source
    Computer networks and ISDN systems. 27(1995) no.6, S.1027-36
  15. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01
    0.011894252 = product of:
      0.035682756 = sum of:
        0.014825371 = weight(_text_:information in 5001) [ClassicSimilarity], result of:
          0.014825371 = score(doc=5001,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.1920054 = fieldWeight in 5001, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.020857384 = product of:
          0.04171477 = sum of:
            0.04171477 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.04171477 = score(doc=5001,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order ro replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributes to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and to general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  16. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01
    0.011894252 = product of:
      0.035682756 = sum of:
        0.014825371 = weight(_text_:information in 530) [ClassicSimilarity], result of:
          0.014825371 = score(doc=530,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.1920054 = fieldWeight in 530, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.020857384 = product of:
          0.04171477 = sum of:
            0.04171477 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.04171477 = score(doc=530,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  17. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01
    0.011894252 = product of:
      0.035682756 = sum of:
        0.014825371 = weight(_text_:information in 5291) [ClassicSimilarity], result of:
          0.014825371 = score(doc=5291,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.1920054 = fieldWeight in 5291, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
        0.020857384 = product of:
          0.04171477 = sum of:
            0.04171477 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
              0.04171477 = score(doc=5291,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.2708308 = fieldWeight in 5291, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5291)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
    Date
    22. 7.2006 17:32:00
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
  18. Milstead, J.L.: Thesauri in a full-text world (1998) 0.01
    0.009958006 = product of:
      0.02987402 = sum of:
        0.014975886 = weight(_text_:information in 2337) [ClassicSimilarity], result of:
          0.014975886 = score(doc=2337,freq=8.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.19395474 = fieldWeight in 2337, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.0148981325 = product of:
          0.029796265 = sum of:
            0.029796265 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.029796265 = score(doc=2337,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Despite early claims to the contemporary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contrdiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing contunues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  19. Wellisch, H.H.: ¬The art of indexing and some fallacies of its automation (1992) 0.01
    0.009823656 = product of:
      0.05894193 = sum of:
        0.05894193 = product of:
          0.11788386 = sum of:
            0.11788386 = weight(_text_:states in 3958) [ClassicSimilarity], result of:
              0.11788386 = score(doc=3958,freq=2.0), product of:
                0.24220218 = queryWeight, product of:
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.043984205 = queryNorm
                0.48671678 = fieldWeight in 3958, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3958)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Reviews the history of indexing, which began with the rise of the universities in the 13th century, before the invention of printing. Describes the different skills needed for indexing books, periodicals and databases. States the belief that the quest for fully automatic indexing is a futile endeavour; machine-generated indexes need the services of human post-editors if they are to be useful and acceptable
  20. Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.01
    0.009059997 = product of:
      0.054359984 = sum of:
        0.054359984 = weight(_text_:networks in 1875) [ClassicSimilarity], result of:
          0.054359984 = score(doc=1875,freq=8.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 1875, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.16666667 = coord(1/6)
    
    Content
    In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
    In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and description, the researchers turned them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."

Types

  • a 171
  • el 8
  • s 5
  • m 4
  • More… Less…