Search (143 results, page 1 of 8)

Hlava, M.M.K.; Hainebach, R.: Machine aided indexing : European Parliament study and results (1996) 0.02

0.021220777 = product of:
  0.10610388 = sum of:
    0.012879624 = product of:
      0.025759248 = sum of:
        0.025759248 = weight(_text_:online in 5563) [ClassicSimilarity], result of:
          0.025759248 = score(doc=5563,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.2682499 = fieldWeight in 5563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0625 = fieldNorm(doc=5563)
      0.5 = coord(1/2)
    0.0440151 = weight(_text_:software in 5563) [ClassicSimilarity], result of:
      0.0440151 = score(doc=5563,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 5563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=5563)
    0.049209163 = weight(_text_:evaluation in 5563) [ClassicSimilarity], result of:
      0.049209163 = score(doc=5563,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.37076265 = fieldWeight in 5563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0625 = fieldNorm(doc=5563)
  0.2 = coord(3/15)

Abstract: Reports on a pilot study applying Access Innovations' machine aided indexing (MAI) system to the European Parliament's full-text materials. Describes how the knowledge base used by the MAI software is created, and gives an evaluation of the system.
Source: Proceedings of the 17th National Online Meeting 1996, New York, 14-16 May 1996. Ed.: M.E. Williams
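
The score breakdown for doc 5563 above follows the TF-IDF pattern of Lucene's ClassicSimilarity. As a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the per-term score for `online` can be recomputed in Python. The function names `idf` and `term_score` are illustrative and not part of any Lucene API; the queryNorm and fieldNorm values are copied from the listing rather than derived.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming ClassicSimilarity's formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Single-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values for the term "online" in doc 5563, copied from the listing.
online_5563 = term_score(freq=2.0, doc_freq=5778, max_docs=44218,
                         query_norm=0.031640913, field_norm=0.0625)
print(online_5563)  # close to the 0.025759248 in the listing (Lucene uses 32-bit floats)
```

Because idf enters both queryWeight and fieldWeight, rarer terms such as `analyse` (docFreq=618) contribute disproportionately more than common ones such as `online` (docFreq=5778) at the same frequency and field length.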

Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.02

0.017144857 = product of:
  0.06429321 = sum of:
    0.00910727 = product of:
      0.01821454 = sum of:
        0.01821454 = weight(_text_:online in 5499) [ClassicSimilarity], result of:
          0.01821454 = score(doc=5499,freq=4.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.18968134 = fieldWeight in 5499, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
      0.5 = coord(1/2)
    0.02200755 = weight(_text_:software in 5499) [ClassicSimilarity], result of:
      0.02200755 = score(doc=5499,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.17532499 = fieldWeight in 5499, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=5499)
    0.024604581 = weight(_text_:evaluation in 5499) [ClassicSimilarity], result of:
      0.024604581 = score(doc=5499,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.18538132 = fieldWeight in 5499, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.03125 = fieldNorm(doc=5499)
    0.008573813 = product of:
      0.017147627 = sum of:
        0.017147627 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
          0.017147627 = score(doc=5499,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.15476047 = fieldWeight in 5499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
      0.5 = coord(1/2)
  0.26666668 = coord(4/15)

Abstract: Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
Date: 20. 1.2015 18:30:22
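
The way the per-term scores for doc 5499 above roll up into the record's final score can be sketched in the same spirit. This assumes the usual Boolean-query behaviour under ClassicSimilarity: the scores of the matching clauses are summed and then multiplied by coord(matching clauses / total clauses). The helper `combine` is hypothetical, and all numeric inputs are copied from the tree.

```python
def combine(clause_scores, total_clauses):
    """Sum the matching clause scores and scale by coord(matched/total)."""
    coord = len(clause_scores) / total_clauses
    return sum(clause_scores) * coord

# Nested groups in the tree: one of two sub-clauses matched, hence coord(1/2).
online = combine([0.01821454], total_clauses=2)     # listed as 0.00910727
term_22 = combine([0.017147627], total_clauses=2)   # listed as 0.008573813

# Top level: 4 of the 15 query clauses matched in doc 5499, hence coord(4/15).
doc_5499 = combine([online, 0.02200755, 0.024604581, term_22], total_clauses=15)
print(doc_5499)
```

The result agrees with the 0.017144857 reported for this record to within float rounding. The coord factor is why a record matching four query terms weakly can outrank one matching two terms strongly.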

Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.02
```
0.016000278 = product of:
  0.08000139 = sum of:
    0.0040248823 = product of:
      0.008049765 = sum of:
        0.008049765 = weight(_text_:online in 1875) [ClassicSimilarity], result of:
          0.008049765 = score(doc=1875,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.08382809 = fieldWeight in 1875, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.5 = coord(1/2)
    0.04959334 = weight(_text_:software in 1875) [ClassicSimilarity], result of:
      0.04959334 = score(doc=1875,freq=26.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.39508954 = fieldWeight in 1875, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1875)
    0.026383169 = weight(_text_:site in 1875) [ClassicSimilarity], result of:
      0.026383169 = score(doc=1875,freq=2.0), product of:
        0.1738463 = queryWeight, product of:
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.031640913 = queryNorm
        0.15176146 = fieldWeight in 1875, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1875)
  0.2 = coord(3/15)
```
Abstract

Two groups of scientists, working independently, have created artificial intelligence software capable of recognizing and describing the content of photographs and videos with far greater accuracy than ever before, sometimes even mimicking human levels of understanding.

Content

"Until now, so-called computer vision has largely been limited to recognizing individual objects. The new software, described on Monday by researchers at Google and at Stanford University, teaches itself to identify entire scenes: a group of young men playing Frisbee, for example, or a herd of elephants marching on a grassy plain. The software then writes a caption in English describing the picture. Compared with human observations, the researchers found, the computer-written descriptions are surprisingly accurate. The advances may make it possible to better catalog and search for the billions of images and hours of video available online, which are often poorly described and archived. At the moment, search engines like Google rely largely on written language accompanying an image or video to ascertain what it contains. "I consider the pixel data in images and video to be the dark matter of the Internet," said Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, who led the research with Andrej Karpathy, a graduate student. "We are now starting to illuminate it." Dr. Li and Mr. Karpathy published their research as a Stanford University technical report. The Google team published their paper on arXiv.org, an open source site hosted by Cornell University.
In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and description, the researchers turned them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
Computer vision specialists said that despite the improvements, these software systems had made only limited progress toward the goal of digitally duplicating human vision and, even more elusive, understanding. "I don't know that I would say this is 'understanding' in the sense we want," said John R. Smith, a senior manager at I.B.M.'s T.J. Watson Research Center in Yorktown Heights, N.Y. "I think even the ability to generate language here is very limited." But the Google and Stanford teams said that they expect to see significant increases in accuracy as they improve their software and train these programs with larger sets of annotated images. A research group led by Tamara L. Berg, a computer scientist at the University of North Carolina at Chapel Hill, is training a neural network with one million images annotated by humans. "You're trying to tell the story behind the image," she said. "A natural scene will be very complex, and you want to pick out the most important objects in the image.""

Footnote

A version of this article appears in print on November 18, 2014, on page A13 of the New York edition with the headline: Advance Reported in Content-Recognition Software. See: http://cs.stanford.edu/people/karpathy/cvpr2015.pdf. See also: http://googleresearch.blogspot.de/2014/11/a-picture-is-worth-thousand-coherent.html. See also: https://news.ycombinator.com/item?id=8621658.

Faraj, N.: Analyse d'une methode d'indexation automatique basée sur une analyse syntaxique de texte (1996) 0.01

0.013188295 = product of:
  0.09891221 = sum of:
    0.0440151 = weight(_text_:software in 685) [ClassicSimilarity], result of:
      0.0440151 = score(doc=685,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 685, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=685)
    0.054897115 = product of:
      0.10979423 = sum of:
        0.10979423 = weight(_text_:analyse in 685) [ClassicSimilarity], result of:
          0.10979423 = score(doc=685,freq=4.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.65859777 = fieldWeight in 685, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.0625 = fieldNorm(doc=685)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Abstract: Evaluates an automatic indexing method based on syntactical text analysis combined with statistical analysis. Tests many combinations for the choice of term categories and weighting methods. The experiment, conducted on a software engineering corpus, shows a systematic improvement when syntactic term phrases are used as index terms instead of individual words alone.

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.01
```
0.012301038 = product of:
  0.06150519 = sum of:
    0.02200755 = weight(_text_:software in 2596) [ClassicSimilarity], result of:
      0.02200755 = score(doc=2596,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.17532499 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.024604581 = weight(_text_:evaluation in 2596) [ClassicSimilarity], result of:
      0.024604581 = score(doc=2596,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.18538132 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.014893063 = weight(_text_:web in 2596) [ClassicSimilarity], result of:
      0.014893063 = score(doc=2596,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.14422815 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.2 = coord(3/15)
```
Content

Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access

Session One: Updates and a twelve month perspective
Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers

Session Two: Today's search engines and beyond
Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology

Session Three: Panel discussion: Human v automated categorization and editing
Ev Brenner (New York, NY), Chairman
James Callan (University of Massachusetts, MA)
Marc Krellenstein (Northern Light Technology, Cambridge, MA)
Dan Miller (Ask Jeeves, Berkeley, CA)

Session Four: Updates and a twelve month perspective
Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
Ellen Voorhees (NIST, Gaithersburg, MD): TREC update

Session Five: Search engines now and beyond
Intelligent agents - John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
Text summarization - Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
Cross language searching - Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
Video search and retrieval - Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
Speech recognition - Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
Visualization - James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
Text mining - David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support

Smart, G.: Using language analysis to manage information (1993) 0.01

0.01104443 = product of:
  0.082833216 = sum of:
    0.0440151 = weight(_text_:software in 4423) [ClassicSimilarity], result of:
      0.0440151 = score(doc=4423,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 4423, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=4423)
    0.03881812 = product of:
      0.07763624 = sum of:
        0.07763624 = weight(_text_:analyse in 4423) [ClassicSimilarity], result of:
          0.07763624 = score(doc=4423,freq=2.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.46569893 = fieldWeight in 4423, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.0625 = fieldNorm(doc=4423)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Abstract: The ESPRIT project SIMPR developed software to analyse documents and generate indexes for them. Of immediate application as a document indexing and classification system, this also offers a technology for information modelling that has broader implications, supporting many new uses for information management software. The project was based on the assumption that information can only be managed successfully by computer systems that can view the information contained in a document through the language in which the document is written, and that systems need to be sufficiently flexible to respond to the changing requirements of document use.

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
```
0.010770657 = product of:
  0.080779925 = sum of:
    0.027509436 = weight(_text_:software in 3311) [ClassicSimilarity], result of:
      0.027509436 = score(doc=3311,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.21915624 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.053270485 = weight(_text_:evaluation in 3311) [ClassicSimilarity], result of:
      0.053270485 = score(doc=3311,freq=6.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.40136236 = fieldWeight in 3311, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
  0.13333334 = coord(2/15)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.01
```
0.010743881 = product of:
  0.0537194 = sum of:
    0.020541003 = product of:
      0.041082006 = sum of:
        0.041082006 = weight(_text_:recherche in 1767) [ClassicSimilarity], result of:
          0.041082006 = score(doc=1767,freq=2.0), product of:
            0.17150146 = queryWeight, product of:
              5.4202437 = idf(docFreq=531, maxDocs=44218)
              0.031640913 = queryNorm
            0.23954318 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4202437 = idf(docFreq=531, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
    0.024604581 = weight(_text_:evaluation in 1767) [ClassicSimilarity], result of:
      0.024604581 = score(doc=1767,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.18538132 = fieldWeight in 1767, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.03125 = fieldNorm(doc=1767)
    0.008573813 = product of:
      0.017147627 = sum of:
        0.017147627 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
          0.017147627 = score(doc=1767,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.15476047 = fieldWeight in 1767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1767)
      0.5 = coord(1/2)
  0.2 = coord(3/15)
```
Date

22. 6.2009 12:46:51

Footnote

In the fifth chapter, "Information Extraction", Nohr addresses a problem that deserves even greater emphasis among specialists: "The steadily growing number of electronic documents makes it desirable not only to index these documents automatically but also to extract the relevant information from them automatically, for example so that it can be transferred into corporate information systems for further processing or analysis." (p. 103) "Indexing and retrieval methods", as mutually dependent procedures, are treated in the sixth chapter. The focus here is on relevance ranking and relevance feedback, and on the application of computational-linguistic methods in searching. The "Evaluation of automatic indexing" forms the thematic conclusion. It deals above all with the quality of indexing, with common retrieval measures in retrieval tests, and with their use. It should also be emphasized that each chapter opens with a statement of learning objectives and that review questions for each chapter are provided at the back of the book. The very numerous practical examples, a list of abbreviations, and a subject index increase the book's usefulness. For the reviewer, reading it deepened an understanding of the connections between the tools of the library, information and documentation (BID) trade, business informatics (especially data warehousing), and artificial intelligence. "Grundlagen der automatischen Indexierung" should also be required reading in library science degree programmes. Holger Nohr's textbook is also suitable for BID professionals who want to refresh their more or less well-founded knowledge of automatic indexing quickly, comprehensibly, and informatively.

Hlava, M.M.K.: Machine aided indexing (MAI) in a multilingual environment (1993) 0.01

0.010396869 = product of:
  0.07797651 = sum of:
    0.011269671 = product of:
      0.022539342 = sum of:
        0.022539342 = weight(_text_:online in 7405) [ClassicSimilarity], result of:
          0.022539342 = score(doc=7405,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.23471867 = fieldWeight in 7405, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7405)
      0.5 = coord(1/2)
    0.066706836 = weight(_text_:software in 7405) [ClassicSimilarity], result of:
      0.066706836 = score(doc=7405,freq=6.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.53142565 = fieldWeight in 7405, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7405)
  0.13333334 = coord(2/15)

Abstract: The machine aided indexing (MAI) software developed by Access Innovations, Inc., is a semantic-based, Boolean-statement, rule-interpreting application with three modules: the MAI engine, which accepts input files, matches terms in the knowledge base, interprets rules, and outputs a text file with suggested indexing terms; a rule building application, which allows each Boolean-style rule in the knowledge base to be created or modified; and a statistical computation module, which analyzes the performance of the MAI software against text manually indexed by professional human indexers. The MAI software can be applied across multiple languages and can be used where the text to be searched is in one language and the indexes to be output are in another.
Source: Proceedings of the 14th National Online Meeting 1993, New York, 4-6 May 1993. Ed.: M.E. Williams

Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.01
```
0.010080026 = product of:
  0.07560019 = sum of:
    0.036906876 = weight(_text_:evaluation in 2721) [ClassicSimilarity], result of:
      0.036906876 = score(doc=2721,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.278072 = fieldWeight in 2721, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
    0.038693316 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
      0.038693316 = score(doc=2721,freq=6.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.37471575 = fieldWeight in 2721, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2721)
  0.13333334 = coord(2/15)
```
Abstract

In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.

Kasprzik, A.: Automatisierte und semiautomatisierte Klassifizierung : eine Analyse aktueller Projekte (2014) 0.01
```
0.009891222 = product of:
  0.074184164 = sum of:
    0.033011325 = weight(_text_:software in 2470) [ClassicSimilarity], result of:
      0.033011325 = score(doc=2470,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.2629875 = fieldWeight in 2470, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=2470)
    0.041172836 = product of:
      0.08234567 = sum of:
        0.08234567 = weight(_text_:analyse in 2470) [ClassicSimilarity], result of:
          0.08234567 = score(doc=2470,freq=4.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.49394834 = fieldWeight in 2470, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.046875 = fieldNorm(doc=2470)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)
```
Abstract

The rapid growth in the volume of digitally available documents, coupled with shortages of time and staff at academic libraries, suggests the use of semi-automatic or fully automatic methods for verbal and classificatory subject indexing. After a brief general introduction to the common methodology, this article examines a number of automated classification projects from the period 2007-2012 in the German-speaking world. Most of the projects presented use machine learning methods from artificial intelligence, mostly work with adapted versions of commercial software, and generally refer to the Dewey Decimal Classification (DDC). The underlying data consist of metadata records, abstracts, tables of contents, and full texts in various data formats. The concluding analysis arranges the projects according to a number of different criteria and summarizes the current situation and the greatest challenges for automated classification methods.

Wiesenmüller, H.: Maschinelle Indexierung am Beispiel der DNB : Analyse und Entwicklungsmöglichkeiten (2018) 0.01

0.009663876 = product of:
  0.07247907 = sum of:
    0.03851321 = weight(_text_:software in 5209) [ClassicSimilarity], result of:
      0.03851321 = score(doc=5209,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.30681872 = fieldWeight in 5209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5209)
    0.033965856 = product of:
      0.06793171 = sum of:
        0.06793171 = weight(_text_:analyse in 5209) [ClassicSimilarity], result of:
          0.06793171 = score(doc=5209,freq=2.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.40748656 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5209)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Abstract: The paper examines the results of the procedure used at the German National Library (Deutsche Nationalbibliothek, DNB) for the automatic assignment of subject headings. Since 2017 this procedure has also been applied to print publications in series B and H of the Deutsche Nationalbibliografie. The central problem areas are presented and illustrated with examples, for instance that not all words occurring in a table of contents actually express thematic aspects, and that the software very often fails to recognize corporate bodies and other "named entities". The machine-generated results are currently very unsatisfactory. The paper offers considerations for possible improvements and sensible strategies.

Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.01
```
0.009309685 = product of:
  0.06982263 = sum of:
    0.043495167 = weight(_text_:evaluation in 1777) [ClassicSimilarity], result of:
      0.043495167 = score(doc=1777,freq=4.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.327711 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.026327467 = weight(_text_:web in 1777) [ClassicSimilarity], result of:
      0.026327467 = score(doc=1777,freq=4.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.25496176 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
  0.13333334 = coord(2/15)
```
Abstract

This thesis contains a description and evaluation of the WWW search service GERHARD (German Harvest Automated Retrieval and Directory). GERHARD is a search and navigation system for the German World Wide Web that collects only scientifically relevant documents and classifies them automatically, on the basis of computational-linguistic and statistical methods, using a library classification system. The DFG project GERHARD was an attempt to develop a World Wide Web service based on an automatic classification procedure as an alternative to conventional methods of indexing the Internet. In the German-speaking world, GERHARD is the only directory of Internet resources whose creation and updating are fully automatic (that is, performed by machine). GERHARD restricts itself to listing documents on academic WWW servers. The basic idea was to replace cost-intensive intellectual indexing and classification of Internet pages with computational-linguistic and statistical methods, and in this way to map the listed Internet resources automatically onto the vocabulary of a library classification system. The WWW address (URL) of GERHARD is http://www.gerhard.de. This diploma thesis describes the service, with particular emphasis on the underlying indexing and classification system, and then tests the effectiveness of GERHARD by means of a small retrieval test.
Kempf, A.O.: Neue Verfahrenswege der Wissensorganisation : eine Evaluation automatischer Indexierung in der sozialwissenschaftlichen Fachinformation (2017) 0.01
```
0.009216119 = product of:
  0.069120884 = sum of:
    0.04305802 = weight(_text_:evaluation in 3497) [ClassicSimilarity], result of:
      0.04305802 = score(doc=3497,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.32441732 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3497)
    0.026062861 = weight(_text_:web in 3497) [ClassicSimilarity], result of:
      0.026062861 = score(doc=3497,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.25239927 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3497)
  0.13333334 = coord(2/15)
```
Source

Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Ed. by W. Babik, H.P. Ohly and K. Weber

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.01

0.008019517 = product of:
  0.060146373 = sum of:
    0.0451422 = weight(_text_:web in 2673) [ClassicSimilarity], result of:
      0.0451422 = score(doc=2673,freq=6.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.43716836 = fieldWeight in 2673, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2673)
    0.015004174 = product of:
      0.030008348 = sum of:
        0.030008348 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.030008348 = score(doc=2673,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Abstract: Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California

Kempf, A.O.: Automatische Inhaltserschließung in der Fachinformation (2013) 0.01
```
0.007768689 = product of:
  0.058265164 = sum of:
    0.027509436 = weight(_text_:software in 905) [ClassicSimilarity], result of:
      0.027509436 = score(doc=905,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.21915624 = fieldWeight in 905, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=905)
    0.030755727 = weight(_text_:evaluation in 905) [ClassicSimilarity], result of:
      0.030755727 = score(doc=905,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.23172665 = fieldWeight in 905, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=905)
  0.13333334 = coord(2/15)
```
Abstract

The article is based on a master's thesis entitled "Automatische Indexierung in der sozialwissenschaftlichen Fachinformation. Eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS" (Kempf 2012), written in the postgraduate programme in Library and Information Science at the Humboldt-Universität zu Berlin, at the Chair of Information Retrieval. Based on the shell model (Schalenmodell) of subject indexing in specialized information, the article presents evaluation results for an automatic indexing procedure intended for use in social science information services. Starting from the application scenario described by Krause, according to which SOLIS holdings (Sozialwissenschaftliches Literaturinformationssystem) of lesser relevance should be indexed automatically, two test series were carried out on this document base with the indexing software MindServer from the company Recommind. The first test series examined the effects of general system settings; the second compared the software's indexing performance for the peripheral and core areas of the literature database. For the second series, specific versions of the indexing software were built for both areas of the database and trained on document corpora from the respective areas. The results of the evaluation, which is based on intellectually generated comparison data, point to differences in indexing performance between peripheral and core areas, which on the one hand argue against the use of automatic indexing in the peripheral areas. On the other hand, there are indications that the indexing results can be improved by building training sets specific to subfields of the discipline.
Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.01
```
0.007768689 = product of:
  0.058265164 = sum of:
    0.027509436 = weight(_text_:software in 3151) [ClassicSimilarity], result of:
      0.027509436 = score(doc=3151,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.21915624 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
    0.030755727 = weight(_text_:evaluation in 3151) [ClassicSimilarity], result of:
      0.030755727 = score(doc=3151,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.23172665 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
  0.13333334 = coord(2/15)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Pritchard-Schoch, T.: Natural language comes of age (1993) 0.01

0.0075859637 = product of:
  0.056894723 = sum of:
    0.012879624 = product of:
      0.025759248 = sum of:
        0.025759248 = weight(_text_:online in 2570) [ClassicSimilarity], result of:
          0.025759248 = score(doc=2570,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.2682499 = fieldWeight in 2570, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0625 = fieldNorm(doc=2570)
      0.5 = coord(1/2)
    0.0440151 = weight(_text_:software in 2570) [ClassicSimilarity], result of:
      0.0440151 = score(doc=2570,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 2570, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=2570)
  0.13333334 = coord(2/15)

Abstract: Discusses natural languages and the natural language implementations of Westlaw's full-text legal documents, Westlaw Is Natural (WIN). Natural language is not artificial intelligence but a hybrid of linguistics, mathematics and statistics. Provides 3 classes of retrieval models. Explains how Westlaw processes an English query. Assesses WIN. Covers WIN enhancements; the natural language features of Congressional Quarterly's Washington Alert using a document for a query; the Personal Librarian front end search software and DowQuest from Dow Jones News/Retrieval. Considers whether natural language encourages fuzzy thinking and whether Boolean logic will still be needed.
Source: Online. 17(1993) no.3, pp.33-43

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.01
```
0.007252531 = product of:
  0.05439398 = sum of:
    0.0056348355 = product of:
      0.011269671 = sum of:
        0.011269671 = weight(_text_:online in 4285) [ClassicSimilarity], result of:
          0.011269671 = score(doc=4285,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.11735933 = fieldWeight in 4285, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
      0.5 = coord(1/2)
    0.048759144 = weight(_text_:web in 4285) [ClassicSimilarity], result of:
      0.048759144 = score(doc=4285,freq=28.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.47219574 = fieldWeight in 4285, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
  0.13333334 = coord(2/15)
```
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Erbring, 2000, online). In fact, Nie and Erbring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.01

0.0071356515 = product of:
  0.053517383 = sum of:
    0.03851321 = weight(_text_:software in 5671) [ClassicSimilarity], result of:
      0.03851321 = score(doc=5671,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.30681872 = fieldWeight in 5671, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5671)
    0.015004174 = product of:
      0.030008348 = sum of:
        0.030008348 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
          0.030008348 = score(doc=5671,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.2708308 = fieldWeight in 5671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5671)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Abstract: Methods of automatic subject indexing have been developed for more than 30 years without meeting with any noticeable acceptance in information and documentation (IuD) circles. At present, however, the rising flood of information and the need for efficient access methods in information and knowledge management are leading, among broad groups of users, to growing interest in these methods, to intensified research and development efforts, and to new products. This paper discusses various approaches to intelligent, content-based retrieval and to automatic subject indexing, and presents commercially available software tools and solutions. It concludes that increasing automation of certain components of information and knowledge management can be expected in the near future, as software tools for automatic subject indexing are integrated into the workflow.
Date: 22. 3.2001 13:14:48
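
The score breakdowns in this listing appear to follow the classic Lucene TF-IDF arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and each term contributes queryWeight × fieldWeight before the coord factors are applied. As a minimal sketch, the following Python recomputes the score shown for the Kasprzik record (doc 2470). The function names are hypothetical, and the queryNorm value is simply copied from the listing, because it depends on the full query, which the listing does not show.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in every idf(...) line
QUERY_NORM = 0.031640913  # copied from the listing; not derivable from it


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """queryWeight * fieldWeight for a single term in a single document."""
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM      # idf * queryNorm
    field_weight = tf * term_idf * field_norm # tf * idf * fieldNorm
    return query_weight * field_weight


# Values read from the explain tree for doc 2470.
software = term_score(freq=2.0, doc_freq=2274, field_norm=0.046875)
# "analyse" sits in a nested clause group where 1 of 2 clauses matched.
analyse = term_score(freq=4.0, doc_freq=618, field_norm=0.046875) * (1 / 2)

# Outer coord: 2 of the 15 top-level query clauses matched.
score = (software + analyse) * (2 / 15)
print(f"{score:.9f}")
```

The result should agree with the listed 0.009891222 to within rounding; small differences in the last digits are expected because the listing was computed in single precision while Python uses double precision.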
