Search (137 results, page 1 of 7)

Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.02
```
0.021074913 = product of:
  0.07376219 = sum of:
    0.014989593 = weight(_text_:with in 4311) [ClassicSimilarity], result of:
      0.014989593 = score(doc=4311,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 4311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=4311)
    0.058772597 = product of:
      0.117545195 = sum of:
        0.117545195 = weight(_text_:humans in 4311) [ClassicSimilarity], result of:
          0.117545195 = score(doc=4311,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.44734186 = fieldWeight in 4311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
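The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the default ClassicSimilarity formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`; the `queryNorm`, `fieldNorm`, and `coord` values are copied from the listing rather than derived, and the function names are illustrative only.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """queryWeight * fieldWeight for one term in one document."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explain tree for doc 4311.
MAX_DOCS = 44218
QUERY_NORM = 0.038938753
FIELD_NORM = 0.046875

with_score = term_score(2.0, 10797, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "humans" clause sits in a nested query with coord(1/2).
humans_score = term_score(2.0, 140, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# Two of the seven top-level clauses matched: coord(2/7).
total = (with_score + humans_score) * (2 / 7)

print(f"with={with_score:.9f} humans={humans_score:.9f} total={total:.9f}")
```

Small differences in the last digits against the listing are to be expected, since the listing appears to use single-precision floats.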
Abstract

Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or, in other words, to what degree can the automatic indexing replace people without a significant drop in quality?

Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.02

```
0.021074913 = product of:
  0.07376219 = sum of:
    0.014989593 = weight(_text_:with in 723) [ClassicSimilarity], result of:
      0.014989593 = score(doc=723,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 723, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=723)
    0.058772597 = product of:
      0.117545195 = sum of:
        0.117545195 = weight(_text_:humans in 723) [ClassicSimilarity], result of:
          0.117545195 = score(doc=723,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.44734186 = fieldWeight in 723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.046875 = fieldNorm(doc=723)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: Manual subject indexing in libraries is a time-consuming and costly process and the quality of the assigned subjects is affected by the cataloger's knowledge on the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book independent of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, outperforming humans 10-15 times. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.

Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
```
0.019040734 = product of:
  0.06664257 = sum of:
    0.017665405 = weight(_text_:with in 1842) [ClassicSimilarity], result of:
      0.017665405 = score(doc=1842,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.18826336 = fieldWeight in 1842, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1842)
    0.048977163 = product of:
      0.097954325 = sum of:
        0.097954325 = weight(_text_:humans in 1842) [ClassicSimilarity], result of:
          0.097954325 = score(doc=1842,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.37278488 = fieldWeight in 1842, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.

Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.02
```
0.018552726 = product of:
  0.06493454 = sum of:
    0.022519061 = weight(_text_:with in 1875) [ClassicSimilarity], result of:
      0.022519061 = score(doc=1875,freq=26.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.23998962 = fieldWeight in 1875, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1875)
    0.042415474 = product of:
      0.08483095 = sum of:
        0.08483095 = weight(_text_:humans in 1875) [ClassicSimilarity], result of:
          0.08483095 = score(doc=1875,freq=6.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.32284123 = fieldWeight in 1875, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
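The tree above also shows why a high term frequency does not dominate the ranking. The following is a rough sketch, assuming `fieldWeight = sqrt(freq) * idf * fieldNorm` as in the listing; all numbers are copied from the explain output for "with" in docs 4311 and 1875, and `field_weight` is an illustrative helper, not a library call.

```python
import math


def field_weight(freq: float, idf: float, field_norm: float) -> float:
    """tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm


IDF_WITH = 2.409771  # idf(docFreq=10797, maxDocs=44218), copied from the listing

fw_short = field_weight(2.0, IDF_WITH, 0.046875)     # doc 4311: 2 occurrences
fw_long = field_weight(26.0, IDF_WITH, 0.01953125)   # doc 1875: 26 occurrences

# 13 times as many occurrences, but the square root and the smaller
# fieldNorm of the longer record leave only a modest gain.
ratio = fw_long / fw_short
print(f"short={fw_short:.8f} long={fw_long:.8f} ratio={ratio:.2f}")
```

With these inputs the longer record's field weight comes out at roughly one and a half times that of the shorter one, which matches the two `fieldWeight` lines in the listing.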
Abstract

Two groups of scientists, working independently, have created artificial intelligence software capable of recognizing and describing the content of photographs and videos with far greater accuracy than ever before, sometimes even mimicking human levels of understanding.

Content

"Until now, so-called computer vision has largely been limited to recognizing individual objects. The new software, described on Monday by researchers at Google and at Stanford University, teaches itself to identify entire scenes: a group of young men playing Frisbee, for example, or a herd of elephants marching on a grassy plain. The software then writes a caption in English describing the picture. Compared with human observations, the researchers found, the computer-written descriptions are surprisingly accurate. The advances may make it possible to better catalog and search for the billions of images and hours of video available online, which are often poorly described and archived. At the moment, search engines like Google rely largely on written language accompanying an image or video to ascertain what it contains. "I consider the pixel data in images and video to be the dark matter of the Internet," said Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, who led the research with Andrej Karpathy, a graduate student. "We are now starting to illuminate it." Dr. Li and Mr. Karpathy published their research as a Stanford University technical report. The Google team published their paper on arXiv.org, an open source site hosted by Cornell University.
In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and description, the researchers turned them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
Computer vision specialists said that despite the improvements, these software systems had made only limited progress toward the goal of digitally duplicating human vision and, even more elusive, understanding. "I don't know that I would say this is 'understanding' in the sense we want," said John R. Smith, a senior manager at I.B.M.'s T.J. Watson Research Center in Yorktown Heights, N.Y. "I think even the ability to generate language here is very limited." But the Google and Stanford teams said that they expect to see significant increases in accuracy as they improve their software and train these programs with larger sets of annotated images. A research group led by Tamara L. Berg, a computer scientist at the University of North Carolina at Chapel Hill, is training a neural network with one million images annotated by humans. "You're trying to tell the story behind the image," she said. "A natural scene will be very complex, and you want to pick out the most important objects in the image.""

Footnote

A version of this article appears in print on November 18, 2014, on page A13 of the New York edition with the headline: Advance Reported in Content-Recognition Software. Cf.: http://cs.stanford.edu/people/karpathy/cvpr2015.pdf. Cf. also: http://googleresearch.blogspot.de/2014/11/a-picture-is-worth-thousand-coherent.html. Cf. also: https://news.ycombinator.com/item?id=8621658.

Anderson, J.D.; Pérez-Carballo, J.: ¬The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part I: Research and the nature of human indexing (2001) 0.02

```
0.01679217 = product of:
  0.117545195 = sum of:
    0.117545195 = product of:
      0.23509039 = sum of:
        0.23509039 = weight(_text_:humans in 3136) [ClassicSimilarity], result of:
          0.23509039 = score(doc=3136,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.8946837 = fieldWeight in 3136, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.09375 = fieldNorm(doc=3136)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
```
0.014049942 = product of:
  0.049174793 = sum of:
    0.009993061 = weight(_text_:with in 1264) [ClassicSimilarity], result of:
      0.009993061 = score(doc=1264,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.10649783 = fieldWeight in 1264, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.03125 = fieldNorm(doc=1264)
    0.03918173 = product of:
      0.07836346 = sum of:
        0.07836346 = weight(_text_:humans in 1264) [ClassicSimilarity], result of:
          0.07836346 = score(doc=1264,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.2982279 = fieldWeight in 1264, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies.
Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).

Anderson, J.D.; Pérez-Carballo, J.: ¬The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part II: Machine indexing, and the allocation of human versus machine effort (2001) 0.01

```
0.013993476 = product of:
  0.097954325 = sum of:
    0.097954325 = product of:
      0.19590865 = sum of:
        0.19590865 = weight(_text_:humans in 368) [ClassicSimilarity], result of:
          0.19590865 = score(doc=368,freq=2.0), product of:
            0.26276368 = queryWeight, product of:
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.038938753 = queryNorm
            0.74556977 = fieldWeight in 368, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7481275 = idf(docFreq=140, maxDocs=44218)
              0.078125 = fieldNorm(doc=368)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Ward, M.L.: ¬The future of the human indexer (1996) 0.01

```
0.013087479 = product of:
  0.045806177 = sum of:
    0.029979186 = weight(_text_:with in 7244) [ClassicSimilarity], result of:
      0.029979186 = score(doc=7244,freq=8.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.3194935 = fieldWeight in 7244, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=7244)
    0.015826989 = product of:
      0.031653978 = sum of:
        0.031653978 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
          0.031653978 = score(doc=7244,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.23214069 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and what depth to index; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database)
Date: 9. 2.1997 18:44:22

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

```
0.011739651 = product of:
  0.041088775 = sum of:
    0.019986123 = weight(_text_:with in 4709) [ClassicSimilarity], result of:
      0.019986123 = score(doc=4709,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.21299566 = fieldWeight in 4709, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0625 = fieldNorm(doc=4709)
    0.021102654 = product of:
      0.042205308 = sum of:
        0.042205308 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.042205308 = score(doc=4709,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus based techniques. The algorithm involves a gradual approximation which works to converge linguistic knowledge gradually towards desirable results. The 1st experiment revealed the characteristic of this algorithm and the others proved the effectiveness of this algorithm for a real corpus
Date: 31. 7.1996 9:22:19

Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01

```
0.011739651 = product of:
  0.041088775 = sum of:
    0.019986123 = weight(_text_:with in 6752) [ClassicSimilarity], result of:
      0.019986123 = score(doc=6752,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.21299566 = fieldWeight in 6752, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0625 = fieldNorm(doc=6752)
    0.021102654 = product of:
      0.042205308 = sum of:
        0.042205308 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.042205308 = score(doc=6752,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
Date: 6. 3.1997 16:22:15

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

```
0.010272195 = product of:
  0.03595268 = sum of:
    0.017487857 = weight(_text_:with in 5001) [ClassicSimilarity], result of:
      0.017487857 = score(doc=5001,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1863712 = fieldWeight in 5001, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.018464822 = product of:
      0.036929645 = sum of:
        0.036929645 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.036929645 = score(doc=5001,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01

```
0.010272195 = product of:
  0.03595268 = sum of:
    0.017487857 = weight(_text_:with in 530) [ClassicSimilarity], result of:
      0.017487857 = score(doc=530,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1863712 = fieldWeight in 530, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=530)
    0.018464822 = product of:
      0.036929645 = sum of:
        0.036929645 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.036929645 = score(doc=530,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
Source: International forum on information and documentation. 22(1997) no.1, S.17-28

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.01

```
0.010272195 = product of:
  0.03595268 = sum of:
    0.017487857 = weight(_text_:with in 2673) [ClassicSimilarity], result of:
      0.017487857 = score(doc=2673,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1863712 = fieldWeight in 2673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.018464822 = product of:
      0.036929645 = sum of:
        0.036929645 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.036929645 = score(doc=2673,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```

Abstract: Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
Date: 1. 8.1996 22:08:06
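
The score breakdown for doc 2673 above can be re-derived from its leaf values. The sketch below assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers printed in the tree. The helper names are illustrative, and `queryNorm`, `fieldNorm` and the coord factors are copied from the tree rather than computed.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight for a single term."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.038938753  # copied from the explain output
FIELD_NORM = 0.0546875    # fieldNorm(doc=2673), copied from the explain output

score_with = term_score(2.0, 10797, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "22" sits in a nested clause where one of two sub-clauses matched: coord(1/2).
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# Two of the seven top-level query clauses matched: coord(2/7).
total = (score_with + score_22) * (2 / 7)
print(total)
```

Python computes in double precision while the explain output shows single-precision floats, so the result agrees with the reported 0.010272195 only up to rounding.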

Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.01
```
0.00881559 = product of:
  0.030854564 = sum of:
    0.017665405 = weight(_text_:with in 1794) [ClassicSimilarity], result of:
      0.017665405 = score(doc=1794,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.18826336 = fieldWeight in 1794, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1794)
    0.013189158 = product of:
      0.026378317 = sum of:
        0.026378317 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
          0.026378317 = score(doc=1794,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.19345059 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.

Date

11. 9.2000 19:53:22
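
The abstract above names a likelihood ratio statistic as the measure of association between lexical items and subject headings but gives no formula. As a rough illustration only, the sketch below uses the common log-likelihood ratio (G²) for a 2x2 contingency table; whether the authors used exactly this form is an assumption, and all counts, headings and function names are hypothetical.

```python
import math


def _xlogx(x: float) -> float:
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table.

    k11: records with term and heading   k12: term without heading
    k21: heading without term            k22: neither
    """
    total = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(total) - rows - cols)


def association(k_both, k_heading, k_term, total):
    """Build the 2x2 table from marginal counts and score it."""
    return log_likelihood_ratio(
        k_both,
        k_term - k_both,
        k_heading - k_both,
        total - k_term - k_heading + k_both,
    )


# Hypothetical training counts for the term "retrieval" (100 of 1000 records):
# heading -> (records with term and heading, records with heading)
heading_counts = {
    "Information retrieval": (60, 80),
    "Indexing": (10, 100),  # co-occurs exactly as often as chance predicts
}
scores = {
    heading: association(both, with_heading, 100, 1000)
    for heading, (both, with_heading) in heading_counts.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

G² is large for any departure from independence, including a term that co-occurs with a heading less often than chance, so a real dictionary would also need to check the direction of the association before using the score.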

Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
```
0.007959949 = product of:
  0.027859818 = sum of:
    0.017308492 = weight(_text_:with in 1441) [ClassicSimilarity], result of:
      0.017308492 = score(doc=1441,freq=6.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.18445967 = fieldWeight in 1441, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.03125 = fieldNorm(doc=1441)
    0.010551327 = product of:
      0.021102654 = sum of:
        0.021102654 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
          0.021102654 = score(doc=1441,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.15476047 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

This paper presents research on applying the syntactic structures known as noun phrases (NP) to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words the classification system uses without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first phase, a corpus of digitized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then Perl scripts pertaining to the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hyperonyms for them; and c) stemming the relevant words contained in each NP, for similarity checking against other NPs. The first tests with the resulting documents have demonstrated the effectiveness of this preprocessing. We compared the structural similarity of the documents before and after all the preprocessing steps of phase one. The texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents. The classification rules are to be established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
```
0.007052472 = product of:
  0.02468365 = sum of:
    0.014132325 = weight(_text_:with in 1442) [ClassicSimilarity], result of:
      0.014132325 = score(doc=1442,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15061069 = fieldWeight in 1442, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.03125 = fieldNorm(doc=1442)
    0.010551327 = product of:
      0.021102654 = sum of:
        0.021102654 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
          0.021102654 = score(doc=1442,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.15476047 = fieldWeight in 1442, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts comprised 98 doctoral theses from eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the terms' positions in the text. Each full noun phrase found in the text was counted as one valid linear position. The results show the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). A result of considerable importance is that all areas of knowledge related to the Natural Sciences showed one characteristic distribution of relevant terms, while all areas related to the Social Sciences showed another characteristic distribution, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in future as criteria for automatic indexing.
To date, this work is unprecedented for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
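
The "linear" position analysis in the abstract above consolidates relevance values into ten equal consecutive parts of the text. A minimal sketch of that consolidation step follows, assuming 0-based noun phrase positions and a simple mean per part; the function name, the averaging choice and the sample data are illustrative and are not taken from the paper.

```python
def relevance_by_tenth(observations, n_positions, parts=10):
    """Mean relevance per consecutive part of a text.

    observations: iterable of (position, relevance), where position is the
    0-based index of a noun phrase occurrence among n_positions occurrences.
    Returns a list of length `parts`; parts without observations are None.
    """
    sums = [0.0] * parts
    counts = [0] * parts
    for position, relevance in observations:
        # Map the position onto one of the equal-width parts.
        part = min(position * parts // n_positions, parts - 1)
        sums[part] += relevance
        counts[part] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical thesis with 200 noun phrase positions and four rated occurrences.
profile = relevance_by_tenth([(0, 6), (10, 4), (105, 2), (199, 5)], n_positions=200)
print(profile)
```

A profile of this kind, averaged over many texts, is the sort of curve that the abstract says was then characterized by polynomial equations.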

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01

```
0.0060293297 = product of:
  0.042205308 = sum of:
    0.042205308 = product of:
      0.084410615 = sum of:
        0.084410615 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.084410615 = score(doc=402,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Source: Information processing and management. 22(1986) no.6, S.465-476

Kantor, P.B.; Voorhees, E.: Information retrieval with scanned texts (2000) 0.01

```
0.005710321 = product of:
  0.039972246 = sum of:
    0.039972246 = weight(_text_:with in 3901) [ClassicSimilarity], result of:
      0.039972246 = score(doc=3901,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.42599133 = fieldWeight in 3901, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.125 = fieldNorm(doc=3901)
  0.14285715 = coord(1/7)
```

Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01

```
0.005275664 = product of:
  0.036929645 = sum of:
    0.036929645 = product of:
      0.07385929 = sum of:
        0.07385929 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.07385929 = score(doc=262,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Date: 20.10.2000 12:22:23

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.01

```
0.005275664 = product of:
  0.036929645 = sum of:
    0.036929645 = product of:
      0.07385929 = sum of:
        0.07385929 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.07385929 = score(doc=6265,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Source: Information outlook. 9(2005) no.8, S.22-23
