Search (168 results, page 1 of 9)

Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.09

0.090009674 = product of:
  0.120012894 = sum of:
    0.019343007 = weight(_text_:science in 3232) [ClassicSimilarity], result of:
      0.019343007 = score(doc=3232,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.1455159 = fieldWeight in 3232, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3232)
    0.0453818 = weight(_text_:research in 3232) [ClassicSimilarity], result of:
      0.0453818 = score(doc=3232,freq=8.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.31521314 = fieldWeight in 3232, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3232)
    0.055288088 = product of:
      0.110576175 = sum of:
        0.110576175 = weight(_text_:network in 3232) [ClassicSimilarity], result of:
          0.110576175 = score(doc=3232,freq=8.0), product of:
            0.22473325 = queryWeight, product of:
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.050463587 = queryNorm
            0.492033 = fieldWeight in 3232, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3232)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available, this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords to the authors of new papers. We create a data set of research papers and their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.
Source: Journal of the Association for Information Science and Technology. 67(2016) no.12, S.3073-3091
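
The score breakdown for the first result above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and the numeric inputs are copied from the explain tree for doc 3232. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single query term."""
    term_idf = idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                       # tf(freq) in the explain tree
    query_weight = term_idf * query_norm       # queryWeight
    field_weight = tf * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Values copied from the explain output of doc 3232.
MAX_DOCS = 44218
QUERY_NORM = 0.050463587
FIELD_NORM = 0.0390625

science = term_score(2.0, 8627, MAX_DOCS, QUERY_NORM, FIELD_NORM)
research = term_score(8.0, 6931, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "network" sits in a nested clause where 1 of 2 sub-clauses matched: coord(1/2).
network = term_score(8.0, 1398, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

# 3 of the 4 top-level clauses matched: coord(3/4).
total = (science + research + network) * 0.75
print(f"{total:.6f}")
```

The printed total should agree with the 0.090009674 shown for this record up to floating-point rounding. The same function applies to the other records once their freq, docFreq, fieldNorm, and coord values are substituted.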

Research and development in information retrieval : Proc., Berlin, 18.-20.5.1982 (1983) 0.07

0.06725425 = product of:
  0.1345085 = sum of:
    0.061897624 = weight(_text_:science in 2332) [ClassicSimilarity], result of:
      0.061897624 = score(doc=2332,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.4656509 = fieldWeight in 2332, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.125 = fieldNorm(doc=2332)
    0.07261088 = weight(_text_:research in 2332) [ClassicSimilarity], result of:
      0.07261088 = score(doc=2332,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.504341 = fieldWeight in 2332, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.125 = fieldNorm(doc=2332)
  0.5 = coord(2/4)

Series: Lecture notes in computer science; vol.146

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.06

0.062083043 = product of:
  0.08277739 = sum of:
    0.027080212 = weight(_text_:science in 5291) [ClassicSimilarity], result of:
      0.027080212 = score(doc=5291,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 5291, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.031767257 = weight(_text_:research in 5291) [ClassicSimilarity], result of:
      0.031767257 = score(doc=5291,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22064918 = fieldWeight in 5291, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.023929918 = product of:
      0.047859836 = sum of:
        0.047859836 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.047859836 = score(doc=5291,freq=2.0), product of:
            0.17671488 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050463587 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
Date: 22. 7.2006 17:32:00
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.05

0.049182575 = product of:
  0.09836515 = sum of:
    0.064179555 = weight(_text_:research in 1952) [ClassicSimilarity], result of:
      0.064179555 = score(doc=1952,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.44577867 = fieldWeight in 1952, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.078125 = fieldNorm(doc=1952)
    0.034185596 = product of:
      0.06837119 = sum of:
        0.06837119 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.06837119 = score(doc=1952,freq=2.0), product of:
            0.17671488 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050463587 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 16. 8.1998 12:51:22
Source: Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella

Haas, S.; He, S.: Toward the automatic identification of sublanguage vocabulary (1993) 0.04

0.040036835 = product of:
  0.08007367 = sum of:
    0.04376823 = weight(_text_:science in 4891) [ClassicSimilarity], result of:
      0.04376823 = score(doc=4891,freq=4.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.3292649 = fieldWeight in 4891, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0625 = fieldNorm(doc=4891)
    0.03630544 = weight(_text_:research in 4891) [ClassicSimilarity], result of:
      0.03630544 = score(doc=4891,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.2521705 = fieldWeight in 4891, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0625 = fieldNorm(doc=4891)
  0.5 = coord(2/4)

Abstract: Describes a method developed for automatic identification of sublanguage vocabulary words as they occur in abstracts. Describes the sublanguage vocabulary identification procedures using abstracts from computer science and library and information science as sublanguage sources. Evaluates the results using three criteria. Discusses the practical and theoretical significance of this research and plans for further experiments.

Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.04

0.03999416 = product of:
  0.07998832 = sum of:
    0.032826174 = weight(_text_:science in 1167) [ClassicSimilarity], result of:
      0.032826174 = score(doc=1167,freq=4.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.24694869 = fieldWeight in 1167, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=1167)
    0.04716215 = weight(_text_:research in 1167) [ClassicSimilarity], result of:
      0.04716215 = score(doc=1167,freq=6.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.3275791 = fieldWeight in 1167, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.046875 = fieldNorm(doc=1167)
  0.5 = coord(2/4)

Abstract: The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug and play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.

Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.04

0.03883488 = product of:
  0.07766976 = sum of:
    0.023211608 = weight(_text_:science in 720) [ClassicSimilarity], result of:
      0.023211608 = score(doc=720,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.17461908 = fieldWeight in 720, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=720)
    0.054458156 = weight(_text_:research in 720) [ClassicSimilarity], result of:
      0.054458156 = score(doc=720,freq=8.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.37825575 = fieldWeight in 720, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.046875 = fieldNorm(doc=720)
  0.5 = coord(2/4)

Abstract: This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries' institutional repository, OAKTrust, in order to probe the creation of new metadata to improve discovery and use. The mining operation task consisted simply of classifying the articles into two categories of research type: basic research ("for understanding," "curiosity-based," or "knowledge-based") and applied research ("use-based"). These categories are fundamental especially for funders but are also important to researchers. The mining-to-classification steps took several iterations, but ultimately, we achieved good results with the toolkit BERT (Bidirectional Encoder Representations from Transformers). The project and its workflows represent a preview of what may lie ahead in the future of crafting metadata using text mining techniques to enhance discoverability.

Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.04

0.038463064 = product of:
  0.051284086 = sum of:
    0.009671504 = weight(_text_:science in 1875) [ClassicSimilarity], result of:
      0.009671504 = score(doc=1875,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.07275795 = fieldWeight in 1875, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1875)
    0.027790561 = weight(_text_:research in 1875) [ClassicSimilarity], result of:
      0.027790561 = score(doc=1875,freq=12.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.19302782 = fieldWeight in 1875, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1875)
    0.013822022 = product of:
      0.027644044 = sum of:
        0.027644044 = weight(_text_:network in 1875) [ClassicSimilarity], result of:
          0.027644044 = score(doc=1875,freq=2.0), product of:
            0.22473325 = queryWeight, product of:
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.050463587 = queryNorm
            0.12300825 = fieldWeight in 1875, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Content: "Until now, so-called computer vision has largely been limited to recognizing individual objects. The new software, described on Monday by researchers at Google and at Stanford University, teaches itself to identify entire scenes: a group of young men playing Frisbee, for example, or a herd of elephants marching on a grassy plain. The software then writes a caption in English describing the picture. Compared with human observations, the researchers found, the computer-written descriptions are surprisingly accurate. The advances may make it possible to better catalog and search for the billions of images and hours of video available online, which are often poorly described and archived. At the moment, search engines like Google rely largely on written language accompanying an image or video to ascertain what it contains. "I consider the pixel data in images and video to be the dark matter of the Internet," said Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, who led the research with Andrej Karpathy, a graduate student. "We are now starting to illuminate it." Dr. Li and Mr. Karpathy published their research as a Stanford University technical report. The Google team published their paper on arXiv.org, an open source site hosted by Cornell University.
In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
Computer vision specialists said that despite the improvements, these software systems had made only limited progress toward the goal of digitally duplicating human vision and, even more elusive, understanding. "I don't know that I would say this is 'understanding' in the sense we want," said John R. Smith, a senior manager at I.B.M.'s T.J. Watson Research Center in Yorktown Heights, N.Y. "I think even the ability to generate language here is very limited." But the Google and Stanford teams said that they expect to see significant increases in accuracy as they improve their software and train these programs with larger sets of annotated images. A research group led by Tamara L. Berg, a computer scientist at the University of North Carolina at Chapel Hill, is training a neural network with one million images annotated by humans. "You're trying to tell the story behind the image," she said. "A natural scene will be very complex, and you want to pick out the most important objects in the image.""

Source: http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.04

0.036435805 = product of:
  0.07287161 = sum of:
    0.038686015 = weight(_text_:science in 2759) [ClassicSimilarity], result of:
      0.038686015 = score(doc=2759,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.2910318 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.034185596 = product of:
      0.06837119 = sum of:
        0.06837119 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.06837119 = score(doc=2759,freq=2.0), product of:
            0.17671488 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050463587 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 1. 2.2016 18:25:22
Series: Lecture notes in computer science ; 9398

Daudaravicius, V.: ¬A framework for keyphrase extraction from scientific journals (2016) 0.04

0.03600295 = product of:
  0.0720059 = sum of:
    0.027080212 = weight(_text_:science in 2930) [ClassicSimilarity], result of:
      0.027080212 = score(doc=2930,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 2930, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
    0.044925686 = weight(_text_:research in 2930) [ClassicSimilarity], result of:
      0.044925686 = score(doc=2930,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.31204507 = fieldWeight in 2930, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2930)
  0.5 = coord(2/4)

Abstract: We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.

Simões, M. da Graça; Machado, L.M.; Souza, R.R.; Almeida, M.B.; Tavares Lopes, A.: Automatic indexing and ontologies : the consistency of research chronology and authoring in the context of Information Science (2018) 0.04

0.03600295 = product of:
  0.0720059 = sum of:
    0.027080212 = weight(_text_:science in 5909) [ClassicSimilarity], result of:
      0.027080212 = score(doc=5909,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 5909, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5909)
    0.044925686 = weight(_text_:research in 5909) [ClassicSimilarity], result of:
      0.044925686 = score(doc=5909,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.31204507 = fieldWeight in 5909, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5909)
  0.5 = coord(2/4)

Source: Challenges and opportunities for knowledge organization in the digital age: proceedings of the Fifteenth International ISKO Conference, 9-11 July 2018, Porto, Portugal / organized by: International Society for Knowledge Organization (ISKO), ISKO Spain and Portugal Chapter, University of Porto - Faculty of Arts and Humanities, Research Centre in Communication, Information and Digital Culture (CIC.digital) - Porto. Eds.: F. Ribeiro u. M.E. Cerveira

Samstag-Schnock, U.; Meadow, C.T.: PBS: an economical natural language query interpreter (1993) 0.03

0.033627126 = product of:
  0.06725425 = sum of:
    0.030948812 = weight(_text_:science in 5091) [ClassicSimilarity], result of:
      0.030948812 = score(doc=5091,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.23282544 = fieldWeight in 5091, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0625 = fieldNorm(doc=5091)
    0.03630544 = weight(_text_:research in 5091) [ClassicSimilarity], result of:
      0.03630544 = score(doc=5091,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.2521705 = fieldWeight in 5091, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0625 = fieldNorm(doc=5091)
  0.5 = coord(2/4)

Abstract: Reports on the design and implementation of the information searching and retrieval software, PBS (Parsing, Boolean recognition, Stemming) for the front end OAK 2, a new version of OAK developed at Toronto Univ. OAK 2 is a research tool for user behaviour studies. PBS receives natural language search statements from an end user and identifies search facets and implied Boolean logic operators.
Source: Journal of the American Society for Information Science. 44(1993) no.5, S.265-272

Salton, G.; Buckley, C.: Approaches to global text analysis (1990) 0.03

0.029423734 = product of:
  0.05884747 = sum of:
    0.027080212 = weight(_text_:science in 4901) [ClassicSimilarity], result of:
      0.027080212 = score(doc=4901,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 4901, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4901)
    0.031767257 = weight(_text_:research in 4901) [ClassicSimilarity], result of:
      0.031767257 = score(doc=4901,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22064918 = fieldWeight in 4901, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4901)
  0.5 = coord(2/4)

Source: ASIS'90: Information in the year 2000, from research to applications. Proc. of the 53rd Annual Meeting of the American Society for Information Science, Toronto, Canada, 4.-8.11.1990. Ed. by Diana Henderson

Warner, A.J.: ¬A linguistic approach to the automated hierarchical organization of phrases (1990) 0.03

0.029423734 = product of:
  0.05884747 = sum of:
    0.027080212 = weight(_text_:science in 4902) [ClassicSimilarity], result of:
      0.027080212 = score(doc=4902,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 4902, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4902)
    0.031767257 = weight(_text_:research in 4902) [ClassicSimilarity], result of:
      0.031767257 = score(doc=4902,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22064918 = fieldWeight in 4902, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4902)
  0.5 = coord(2/4)

Source: ASIS'90: Information in the year 2000, from research to applications. Proc. of the 53rd Annual Meeting of the American Society for Information Science, Toronto, Canada, 4.-8.11.1990. Ed. by Diana Henderson

Chung, Y.M.; Lee, J.Y.: ¬A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.03

0.025716392 = product of:
  0.051432785 = sum of:
    0.019343007 = weight(_text_:science in 5769) [ClassicSimilarity], result of:
      0.019343007 = score(doc=5769,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.1455159 = fieldWeight in 5769, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5769)
    0.032089777 = weight(_text_:research in 5769) [ClassicSimilarity], result of:
      0.032089777 = score(doc=5769,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22288933 = fieldWeight in 5769, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5769)
  0.5 = coord(2/4)

Abstract

Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as X**2 statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X**2 statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms

Source: Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.283-296

Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.03
```
0.025716392 = product of:
  0.051432785 = sum of:
    0.019343007 = weight(_text_:science in 3151) [ClassicSimilarity], result of:
      0.019343007 = score(doc=3151,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.1455159 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
    0.032089777 = weight(_text_:research in 3151) [ClassicSimilarity], result of:
      0.032089777 = score(doc=3151,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22288933 = fieldWeight in 3151, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
  0.5 = coord(2/4)
```
Abstract: Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Source: Journal of the Association for Information Science and Technology. 67(2016) no.11, S.2667-2683

Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.03

0.025716392 = product of:
  0.051432785 = sum of:
    0.019343007 = weight(_text_:science in 3667) [ClassicSimilarity], result of:
      0.019343007 = score(doc=3667,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.1455159 = fieldWeight in 3667, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3667)
    0.032089777 = weight(_text_:research in 3667) [ClassicSimilarity], result of:
      0.032089777 = score(doc=3667,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22288933 = fieldWeight in 3667, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3667)
  0.5 = coord(2/4)

Series: Communications in computer and information science; 544
Source: Metadata and semantics research: 9th Research Conference, MTSR 2015, Manchester, UK, September 9-11, 2015, Proceedings. Eds.: E. Garoufallou et al
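
The score breakdown for this record can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm and fieldNorm directly from the listing because they cannot be derived from the values shown. The function and variable names are illustrative only and are not part of the search system.

```python
import math

# Values copied from the explanation for doc 3667 above.
QUERY_NORM = 0.050463587
MAX_DOCS = 44218
FIELD_NORM = 0.0390625


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM                   # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


science = term_score(freq=2.0, doc_freq=8627, field_norm=FIELD_NORM)
research = term_score(freq=4.0, doc_freq=6931, field_norm=FIELD_NORM)

# coord(2/4): two of the four query clauses matched this document.
total = (science + research) * (2 / 4)

print(f"science={science:.9f} research={research:.9f} total={total:.9f}")
```

The result agrees with the listed 0.025716392 only up to rounding, since Lucene computes in single precision while this sketch uses Python floats.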

Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.03

0.025716392 = product of:
  0.051432785 = sum of:
    0.019343007 = weight(_text_:science in 5400) [ClassicSimilarity], result of:
      0.019343007 = score(doc=5400,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.1455159 = fieldWeight in 5400, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
    0.032089777 = weight(_text_:research in 5400) [ClassicSimilarity], result of:
      0.032089777 = score(doc=5400,freq=4.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.22288933 = fieldWeight in 5400, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
  0.5 = coord(2/4)

Footnote: Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.03

0.025505066 = product of:
  0.05101013 = sum of:
    0.027080212 = weight(_text_:science in 5001) [ClassicSimilarity], result of:
      0.027080212 = score(doc=5001,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.20372227 = fieldWeight in 5001, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.023929918 = product of:
      0.047859836 = sum of:
        0.047859836 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.047859836 = score(doc=5001,freq=2.0), product of:
            0.17671488 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050463587 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21
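
The explanation for this record is nested: the match on "22" sits inside an inner clause with its own coord(1/2) before the outer coord(2/4) is applied. A minimal sketch of that recursion follows, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), coord = matched clauses / total clauses). The `Term` and `BoolQuery` classes are hypothetical stand-ins for the explanation tree, not Lucene classes, and the clause counts are read off the coord lines rather than from the actual query.

```python
import math
from dataclasses import dataclass

QUERY_NORM = 0.050463587  # copied from the listing
MAX_DOCS = 44218          # copied from the listing


@dataclass
class Term:
    freq: float        # termFreq in the matched field
    doc_freq: int      # docFreq of the term
    field_norm: float  # fieldNorm of the document


@dataclass
class BoolQuery:
    clauses: list  # the clauses that matched (Term or BoolQuery)
    total: int     # clauses in the query, matched or not


def score(node) -> float:
    """Evaluate a (hypothetical) explanation tree bottom-up."""
    if isinstance(node, Term):
        idf = 1.0 + math.log(MAX_DOCS / (node.doc_freq + 1))
        query_weight = idf * QUERY_NORM
        field_weight = math.sqrt(node.freq) * idf * node.field_norm
        return query_weight * field_weight
    matched_sum = sum(score(clause) for clause in node.clauses)
    coord = len(node.clauses) / node.total
    return matched_sum * coord


# Doc 5001: "science" matches directly; "22" matches inside a 2-clause group.
hodges = BoolQuery(
    clauses=[
        Term(freq=2.0, doc_freq=8627, field_norm=0.0546875),
        BoolQuery(
            clauses=[Term(freq=2.0, doc_freq=3622, field_norm=0.0546875)],
            total=2,
        ),
    ],
    total=4,
)

result = score(hodges)
print(f"{result:.9f}")
```

Because the inner coord halves the "22" contribution before the outer coord halves the sum again, that term counts for only a quarter of its raw weight in the final score.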

Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.03
```
0.025220342 = product of:
  0.050440684 = sum of:
    0.023211608 = weight(_text_:science in 3320) [ClassicSimilarity], result of:
      0.023211608 = score(doc=3320,freq=2.0), product of:
        0.1329271 = queryWeight, product of:
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.050463587 = queryNorm
        0.17461908 = fieldWeight in 3320, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6341193 = idf(docFreq=8627, maxDocs=44218)
          0.046875 = fieldNorm(doc=3320)
    0.027229078 = weight(_text_:research in 3320) [ClassicSimilarity], result of:
      0.027229078 = score(doc=3320,freq=2.0), product of:
        0.14397179 = queryWeight, product of:
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.050463587 = queryNorm
        0.18912788 = fieldWeight in 3320, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8529835 = idf(docFreq=6931, maxDocs=44218)
          0.046875 = fieldNorm(doc=3320)
  0.5 = coord(2/4)
```
Abstract: The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs.

Source: Journal of the Association for Information Science and Technology. 68(2017) no.1, S.141-153
