Search (270 results, page 1 of 14)

Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.08

0.084560595 = product of:
  0.16912119 = sum of:
    0.010483121 = weight(_text_:information in 5481) [ClassicSimilarity], result of:
      0.010483121 = score(doc=5481,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.13576832 = fieldWeight in 5481, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5481)
    0.10706388 = weight(_text_:united in 5481) [ClassicSimilarity], result of:
      0.10706388 = score(doc=5481,freq=2.0), product of:
        0.24675635 = queryWeight, product of:
          5.6101127 = idf(docFreq=439, maxDocs=44218)
          0.043984205 = queryNorm
        0.433885 = fieldWeight in 5481, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.6101127 = idf(docFreq=439, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5481)
    0.05157419 = product of:
      0.10314838 = sum of:
        0.10314838 = weight(_text_:states in 5481) [ClassicSimilarity], result of:
          0.10314838 = score(doc=5481,freq=2.0), product of:
            0.24220218 = queryWeight, product of:
              5.506572 = idf(docFreq=487, maxDocs=44218)
              0.043984205 = queryNorm
            0.42587718 = fieldWeight in 5481, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.506572 = idf(docFreq=487, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
      0.5 = coord(1/2)
  0.5 = coord(3/6)
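
The score breakdowns in this listing follow Lucene's ClassicSimilarity (TF-IDF) arithmetic, so they can be checked by hand. The block below is a minimal sketch in Python, not the search engine's own code. The helper names `idf` and `term_score` are illustrative assumptions, and the constants are copied from the explain output for doc 5481 above. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values shown.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the explain output.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight
    tf = math.sqrt(freq)                          # tf(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm          # queryWeight
    field_weight = tf * term_idf * field_norm     # fieldWeight
    return query_weight * field_weight

# Constants copied from the explain output for doc 5481.
MAX_DOCS = 44218
QUERY_NORM = 0.043984205
FIELD_NORM = 0.0546875

information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
united = term_score(2.0, 439, MAX_DOCS, QUERY_NORM, FIELD_NORM)
states = term_score(2.0, 487, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# "states" sits in a nested clause scaled by coord(1/2);
# the outer sum is scaled by coord(3/6) because 3 of 6 query clauses matched.
total = (information + united + 0.5 * states) * 0.5

print(f"information={information:.9f} united={united:.8f} states={states:.8f}")
print(f"total={total:.9f}")
```

The printed total should agree with the listed 0.084560595 up to floating-point rounding, since Lucene computes these values in single precision while Python uses double precision.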

Abstract: This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.05

0.053722244 = product of:
  0.10744449 = sum of:
    0.010483121 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
      0.010483121 = score(doc=2673,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.13576832 = fieldWeight in 2673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.07610398 = weight(_text_:networks in 2673) [ClassicSimilarity], result of:
      0.07610398 = score(doc=2673,freq=2.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.36581108 = fieldWeight in 2673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.020857384 = product of:
      0.04171477 = sum of:
        0.04171477 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.04171477 = score(doc=2673,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
  0.5 = coord(3/6)

Abstract: Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
Date: 1. 8.1996 22:08:06
Source: Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156

Krutulis, J.D.; Jacob, E.K.: ¬A theoretical model for the study of emergent structure in adaptive information networks (1995) 0.04

0.043689415 = product of:
  0.13106824 = sum of:
    0.02344097 = weight(_text_:information in 3353) [ClassicSimilarity], result of:
      0.02344097 = score(doc=3353,freq=10.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.3035872 = fieldWeight in 3353, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3353)
    0.10762728 = weight(_text_:networks in 3353) [ClassicSimilarity], result of:
      0.10762728 = score(doc=3353,freq=4.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.517335 = fieldWeight in 3353, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3353)
  0.33333334 = coord(2/6)

Abstract: Attempts to automate classification have focused on mimicking the intellectual processes whereby human classifiers assign entities to mutually exclusive groups that exhibit one or more shared characteristics. A more viable approach might be to construct an adaptive retrieval system that produces groupings of related entities by generating dynamic categories based on document content and on the system's emergent structure as it adapts to modifications in the database and to observed patterns of access. Presents a theoretical model for adaptive information networks using relevance feedback and genetic algorithms to generate emergent structure.
Imprint: Alberta : Alberta University, School of Library and Information Studies
Source: Connectedness: information, systems, people, organizations. Proceedings of CAIS/ACSI 95, the proceedings of the 23rd Annual Conference of the Canadian Association for Information Science. Ed. by Hope A. Olson and Denis B. Ward

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02

0.023878481 = product of:
  0.07163544 = sum of:
    0.023961417 = weight(_text_:information in 402) [ClassicSimilarity], result of:
      0.023961417 = score(doc=402,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.3103276 = fieldWeight in 402, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.125 = fieldNorm(doc=402)
    0.047674023 = product of:
      0.095348045 = sum of:
        0.095348045 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.095348045 = score(doc=402,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Source: Information processing and management. 22(1986) no.6, S.465-476

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

0.02089367 = product of:
  0.06268101 = sum of:
    0.020966241 = weight(_text_:information in 6265) [ClassicSimilarity], result of:
      0.020966241 = score(doc=6265,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.27153665 = fieldWeight in 6265, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=6265)
    0.04171477 = product of:
      0.08342954 = sum of:
        0.08342954 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.08342954 = score(doc=6265,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Source: Information outlook. 9(2005) no.8, S.22-23

Moreno, J.M.T.: Automatic text summarization (2014) 0.02
```
0.020615976 = product of:
  0.061847925 = sum of:
    0.007487943 = weight(_text_:information in 1518) [ClassicSimilarity], result of:
      0.007487943 = score(doc=1518,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.09697737 = fieldWeight in 1518, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1518)
    0.054359984 = weight(_text_:networks in 1518) [ClassicSimilarity], result of:
      0.054359984 = score(doc=1518,freq=2.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.26129362 = fieldWeight in 1518, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1518)
  0.33333334 = coord(2/6)
```
Abstract

This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS). It surveys the recent state of the art. The book shows the main problems of ADS, the difficulties, and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches are statistical, linguistic and symbolic. Several examples are included in order to clarify the theoretical concepts. The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been developed in recent years that include several applications of ADS. The development of recent technology has affected the development of algorithms and their applications. The massive use of social networks and new forms of technology require the adaptation of the classical methods of text summarization. This is a new textbook on Automatic Text Summarization, based on teaching materials used in one- or two-semester courses. It presents an extensive state of the art and describes the new systems on the subject. Previous automatic summarization books have been either collections of specialized papers, or else authored books with only a chapter or two devoted to the field as a whole. On the other hand, the classic books on the subject are not recent.

Content

Automatic Text Summarization: Some Important Concepts; Single-document Summarization; Guided Multi-Document Summarization; Emerging systems; Source and Domain-Specific Summarization; Text Abstracting; Evaluating Document Summaries; Conclusion; Information Retrieval, NLP and Automatic Text Summarization; Automatic Text Summarization Resources

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.02
```
0.019450821 = product of:
  0.058352463 = sum of:
    0.020300474 = weight(_text_:information in 4285) [ClassicSimilarity], result of:
      0.020300474 = score(doc=4285,freq=30.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.2629142 = fieldWeight in 4285, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
    0.03805199 = weight(_text_:networks in 4285) [ClassicSimilarity], result of:
      0.03805199 = score(doc=4285,freq=2.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.18290554 = fieldWeight in 4285, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
  0.33333334 = coord(2/6)
```
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments.
These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Source

Annual review of information science and technology. 37(2003), S.91-126

Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.02
```
0.018830853 = product of:
  0.11298511 = sum of:
    0.11298511 = weight(_text_:networks in 1557) [ClassicSimilarity], result of:
      0.11298511 = score(doc=1557,freq=6.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.5430886 = fieldWeight in 1557, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.046875 = fieldNorm(doc=1557)
  0.16666667 = coord(1/6)
```
Abstract

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.

Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.02
```
0.01773066 = product of:
  0.05319198 = sum of:
    0.0089855315 = weight(_text_:information in 3422) [ClassicSimilarity], result of:
      0.0089855315 = score(doc=3422,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.116372846 = fieldWeight in 3422, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3422)
    0.044206448 = product of:
      0.088412896 = sum of:
        0.088412896 = weight(_text_:states in 3422) [ClassicSimilarity], result of:
          0.088412896 = score(doc=3422,freq=2.0), product of:
            0.24220218 = queryWeight, product of:
              5.506572 = idf(docFreq=487, maxDocs=44218)
              0.043984205 = queryNorm
            0.3650376 = fieldWeight in 3422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.506572 = idf(docFreq=487, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) reduces/eliminates manual labor required to compile dictionaries and prepare source documents; (b) improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) is insensitive to collection size; and (e) runs in linear time with respect to the number of clauses to be annotated.

Source

Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.522-542

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.02

0.01699179 = product of:
  0.050975367 = sum of:
    0.0211791 = weight(_text_:information in 1952) [ClassicSimilarity], result of:
      0.0211791 = score(doc=1952,freq=4.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.27429342 = fieldWeight in 1952, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=1952)
    0.029796265 = product of:
      0.05959253 = sum of:
        0.05959253 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.05959253 = score(doc=1952,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Date: 16. 8.1998 12:51:22
Footnote: Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
Source: Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella

Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.02
```
0.015692377 = product of:
  0.09415426 = sum of:
    0.09415426 = weight(_text_:networks in 3810) [ClassicSimilarity], result of:
      0.09415426 = score(doc=3810,freq=6.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.45257387 = fieldWeight in 3810, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3810)
  0.16666667 = coord(1/6)
```
Abstract

Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.

Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.02
```
0.015375326 = product of:
  0.09225196 = sum of:
    0.09225196 = weight(_text_:networks in 1868) [ClassicSimilarity], result of:
      0.09225196 = score(doc=1868,freq=4.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.44343 = fieldWeight in 1868, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.046875 = fieldNorm(doc=1868)
  0.16666667 = coord(1/6)
```
Abstract

We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.01

0.014924051 = product of:
  0.04477215 = sum of:
    0.014975886 = weight(_text_:information in 4157) [ClassicSimilarity], result of:
      0.014975886 = score(doc=4157,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.19395474 = fieldWeight in 4157, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=4157)
    0.029796265 = product of:
      0.05959253 = sum of:
        0.05959253 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.05959253 = score(doc=4157,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill

Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01

0.014862737 = product of:
  0.04458821 = sum of:
    0.020751199 = weight(_text_:information in 6752) [ClassicSimilarity], result of:
      0.020751199 = score(doc=6752,freq=6.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.2687516 = fieldWeight in 6752, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=6752)
    0.023837011 = product of:
      0.047674023 = sum of:
        0.047674023 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.047674023 = score(doc=6752,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
Date: 6. 3.1997 16:22:15

Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description (2014) 0.01
```
0.012812771 = product of:
  0.076876625 = sum of:
    0.076876625 = weight(_text_:networks in 1873) [ClassicSimilarity], result of:
      0.076876625 = score(doc=1873,freq=4.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.369525 = fieldWeight in 1873, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1873)
  0.16666667 = coord(1/6)
```
Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

Pfeifer, U.; Fuhr, N.; Huynh, T.: Searching structured documents with the enhanced retrieval functionality of freeWAIS-sf and SFgate (1995) 0.01

0.012683997 = product of:
  0.07610398 = sum of:
    0.07610398 = weight(_text_:networks in 2214) [ClassicSimilarity], result of:
      0.07610398 = score(doc=2214,freq=2.0), product of:
        0.20804176 = queryWeight, product of:
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.043984205 = queryNorm
        0.36581108 = fieldWeight in 2214, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.72992 = idf(docFreq=1060, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2214)
  0.16666667 = coord(1/6)

Source: Computer networks and ISDN systems. 27(1995) no.6, S.1027-36
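
The score breakdown for this record can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library, and the numeric inputs are copied from the explanation for doc 2214 above.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, mirroring the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq=2.0) is about 1.4142135
    query_weight = idf * query_norm       # idf * queryNorm
    field_weight = tf * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the "networks" clause of doc 2214.
term = classic_term_score(freq=2.0, doc_freq=1060, max_docs=44218,
                          query_norm=0.043984205, field_norm=0.0546875)

# Only one of the six query clauses matched, hence coord(1/6).
total = term * (1 / 6)
print(term, total)
```

Lucene computes these values in 32-bit floats, so a double-precision recomputation should agree with the listed 0.07610398 and 0.012683997 only to about six or seven significant digits.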

Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01

0.011939241 = product of:
  0.03581772 = sum of:
    0.011980709 = weight(_text_:information in 3581) [ClassicSimilarity], result of:
      0.011980709 = score(doc=3581,freq=2.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.1551638 = fieldWeight in 3581, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3581)
    0.023837011 = product of:
      0.047674023 = sum of:
        0.047674023 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
          0.047674023 = score(doc=3581,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.30952093 = fieldWeight in 3581, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3581)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Lingo is a freely available (open source) system for the automatic indexing of German-language texts. Its development prioritized high configurability and flexibility so that the system can serve a range of applications. The paper demonstrates the benefits of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo offers for improving retrieval is presented and illustrated with examples: recognition of base forms, compound recognition and compound decomposition, word relationing, lexical and algorithmic recognition of multi-word groups, and OCR error correction. The open system architecture of lingo is described, and possible application scenarios and limits of use are identified.
Date: 24. 3.2006 12:22:02
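
When more than one clause matches, the explanation nests: each matching clause contributes a term score, a nested clause carries its own coord factor, and the sum is scaled by the outer coord. The sketch below works through the Lepsky and Vorhauer record (doc 3581) under the same assumed ClassicSimilarity formulas; `term_score` is a hypothetical helper, and all numbers are copied from the explanation above.

```python
import math

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight * fieldWeight for a single term (assumed ClassicSimilarity).
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

MAX_DOCS = 44218
QUERY_NORM = 0.043984205
FIELD_NORM = 0.0625  # fieldNorm(doc=3581)

# Top-level clause: the term "information".
information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Nested clause: the term "22" matched one of two sub-clauses, so coord(1/2).
nested_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Two of six top-level clauses matched, so the sum is scaled by coord(2/6).
doc_score = (information + nested_22) * (2 / 6)
print(information, nested_22, doc_score)
```

The rare term "22" (docFreq=3622) outweighs the common term "information" (docFreq=20772) even after its coord(1/2) penalty, which shows how strongly idf drives the ranking here.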

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

0.011894252 = product of:
  0.035682756 = sum of:
    0.014825371 = weight(_text_:information in 5001) [ClassicSimilarity], result of:
      0.014825371 = score(doc=5001,freq=4.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.1920054 = fieldWeight in 5001, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.020857384 = product of:
      0.04171477 = sum of:
        0.04171477 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.04171477 = score(doc=5001,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields, the social sciences had the best retrieval rate, followed by science, with arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01

0.011894252 = product of:
  0.035682756 = sum of:
    0.014825371 = weight(_text_:information in 530) [ClassicSimilarity], result of:
      0.014825371 = score(doc=530,freq=4.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.1920054 = fieldWeight in 530, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=530)
    0.020857384 = product of:
      0.04171477 = sum of:
        0.04171477 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.04171477 = score(doc=530,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Describes the application of natural language processing (NLP) techniques in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO) to the problem of document indexing. The system uses NLP to determine the subject of a document's text and to associate the document with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
Source: International forum on information and documentation. 22(1997) no.1, S.17-28

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

0.011894252 = product of:
  0.035682756 = sum of:
    0.014825371 = weight(_text_:information in 5291) [ClassicSimilarity], result of:
      0.014825371 = score(doc=5291,freq=4.0), product of:
        0.0772133 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.043984205 = queryNorm
        0.1920054 = fieldWeight in 5291, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.020857384 = product of:
      0.04171477 = sum of:
        0.04171477 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.04171477 = score(doc=5291,freq=2.0), product of:
            0.1540252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.043984205 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper published from 1728 to 1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th-century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
Date: 22. 7.2006 17:32:00
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
