Search (31 results, page 1 of 2)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Computerlinguistik"
  1. Stock, M.: Textwortmethode und Übersetzungsrelation : Eine Methode zum Aufbau von kombinierten Literaturnachweis- und Terminologiedatenbanken (1989) 0.02
    0.022424987 = product of:
      0.07848745 = sum of:
        0.007323784 = product of:
          0.03661892 = sum of:
            0.03661892 = weight(_text_:retrieval in 3412) [ClassicSimilarity], result of:
              0.03661892 = score(doc=3412,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 3412, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3412)
          0.2 = coord(1/5)
        0.07116366 = product of:
          0.14232732 = sum of:
            0.14232732 = weight(_text_:zugriff in 3412) [ClassicSimilarity], result of:
              0.14232732 = score(doc=3412,freq=2.0), product of:
                0.2160124 = queryWeight, product of:
                  5.963546 = idf(docFreq=308, maxDocs=44218)
                  0.03622214 = queryNorm
                0.65888494 = fieldWeight in 3412, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.963546 = idf(docFreq=308, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3412)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Specialist information in the humanities requires close cooperation between literature reference systems and terminology information systems. A suitable documentation method for analysing humanities literature is the text-word method (Textwortmethode). The conceptual repertoire, recorded in the original language, must be paired with access (Zugriff) in a single unified language, which on the one hand guarantees complete and precise retrieval and on the other advances the construction of subject-specific dictionaries.
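The score breakdown for result 1 can be recomputed by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that reproduce the listed figures: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical and are not part of any library; the constants are copied from the listing for doc 3412.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the explain output of doc 3412.
MAX_DOCS = 44218
QUERY_NORM = 0.03622214
FIELD_NORM = 0.078125

retrieval = term_score(2.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)
zugriff = term_score(2.0, 308, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Each sub-query is scaled by its own coord factor (1/5 and 1/2),
# and the sum is scaled by the outer coord factor (2/7).
total = (retrieval * (1 / 5) + zugriff * (1 / 2)) * (2 / 7)

print(f"retrieval={retrieval:.8f} zugriff={zugriff:.8f} total={total:.9f}")
```

The results should agree with the listed values (0.03661892, 0.14232732 and 0.022424987) up to small rounding differences, since the listing appears to use single-precision floats while Python uses double precision.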
  2. Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.007423487 = product of:
      0.025982203 = sum of:
        0.006351802 = product of:
          0.03175901 = sum of:
            0.03175901 = weight(_text_:system in 6752) [ClassicSimilarity], result of:
              0.03175901 = score(doc=6752,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.27838376 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.2 = coord(1/5)
        0.0196304 = product of:
          0.0392608 = sum of:
            0.0392608 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.0392608 = score(doc=6752,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
    Date
    6. 3.1997 16:22:15
  3. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.01
    0.005000235 = product of:
      0.035001643 = sum of:
        0.035001643 = product of:
          0.087504104 = sum of:
            0.053818595 = weight(_text_:retrieval in 6386) [ClassicSimilarity], result of:
              0.053818595 = score(doc=6386,freq=12.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.49118498 = fieldWeight in 6386, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6386)
            0.033685513 = weight(_text_:system in 6386) [ClassicSimilarity], result of:
              0.033685513 = score(doc=6386,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 6386, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6386)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Retrieval tests are the most widely accepted method for justifying new content-indexing procedures against traditional ones. As part of a diploma thesis, two fundamentally different systems for automatic content indexing were tested and evaluated using the press database of the publishing house Gruner + Jahr (G+J). The study compared natural-language retrieval with Boolean retrieval. The two systems are Autonomy from Autonomy Inc. and DocCat, which IBM adapted to the database structure of the G+J press database. The former is a probabilistic system based on natural-language retrieval. DocCat, by contrast, is based on Boolean retrieval and is a learning system that indexes on the basis of an intellectually created training set. Methodologically, the evaluation starts from the real application context of text documentation at G+J. The tests are assessed from both statistical and qualitative points of view. One result of the tests is that DocCat shows some shortcomings compared with intellectual content indexing that still need to be remedied, while Autonomy's natural-language retrieval cannot be used as it stands in this setting and for the specific requirements of G+J text documentation.
  4. Experimentelles und praktisches Information Retrieval : Festschrift für Gerhard Lustig (1992) 0.00
    0.004682856 = product of:
      0.03277999 = sum of:
        0.03277999 = product of:
          0.08194998 = sum of:
            0.058130726 = weight(_text_:retrieval in 4) [ClassicSimilarity], result of:
              0.058130726 = score(doc=4,freq=14.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5305404 = fieldWeight in 4, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4)
            0.023819257 = weight(_text_:system in 4) [ClassicSimilarity], result of:
              0.023819257 = score(doc=4,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 4, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Content
    Contains the contributions: SALTON, G.: Effective text understanding in information retrieval; KRAUSE, J.: Intelligentes Information retrieval; FUHR, N.: Konzepte zur Gestaltung zukünftiger Information-Retrieval-Systeme; HÜTHER, H.: Überlegungen zu einem mathematischen Modell für die Type-Token-, die Grundform-Token und die Grundform-Type-Relation; KNORZ, G.: Automatische Generierung inferentieller Links in und zwischen Hyperdokumenten; KONRAD, E.: Zur Effektivitätsbewertung von Information-Retrieval-Systemen; HENRICHS, N.: Retrievalunterstützung durch automatisch generierte Wortfelder; LÜCK, W., W. RITTBERGER and M. SCHWANTNER: Der Einsatz des Automatischen Indexierungs- und Retrieval-System (AIR) im Fachinformationszentrum Karlsruhe; REIMER, U.: Verfahren der Automatischen Indexierung. Benötigtes Vorwissen und Ansätze zu seiner automatischen Akquisition: Ein Überblick; ENDRES-NIGGEMEYER, B.: Dokumentrepräsentation: Ein individuelles prozedurales Modell des Abstracting, des Indexierens und Klassifizierens; SEELBACH, D.: Zur Entwicklung von zwei- und mehrsprachigen lexikalischen Datenbanken und Terminologiedatenbanken; ZIMMERMANN, H.: Der Einfluß der Sprachbarrieren in Europa und Möglichkeiten zu ihrer Minderung; LENDERS, W.: Wörter zwischen Welt und Wissen; PANYR, J.: Frames, Thesauri und automatische Klassifikation (Clusteranalyse); HAHN, U.: Forschungsstrategien und Erkenntnisinteressen in der anwendungsorientierten automatischen Sprachverarbeitung. Überlegungen zu einer ingenieurorientierten Computerlinguistik; KUHLEN, R.: Hypertext und Information Retrieval - mehr als Browsing und Suche.
  5. SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.00
    0.0046747252 = product of:
      0.032723077 = sum of:
        0.032723077 = product of:
          0.08180769 = sum of:
            0.04250792 = weight(_text_:retrieval in 6671) [ClassicSimilarity], result of:
              0.04250792 = score(doc=6671,freq=22.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.3879561 = fieldWeight in 6671, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=6671)
            0.039299767 = weight(_text_:system in 6671) [ClassicSimilarity], result of:
              0.039299767 = score(doc=6671,freq=16.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34448233 = fieldWeight in 6671, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=6671)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Content
    HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL and N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF and D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. and B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. and R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA and M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG and Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL and R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. and P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY and D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH and H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. and J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. and M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. and P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN and L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. and J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO and P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER and J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. and P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. and B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
  6. Volk, M.; Mittermaier, H.; Schurig, A.; Biedassek, T.: Halbautomatische Volltextanalyse, Datenbankaufbau und Document Retrieval (1992) 0.00
    0.0046406575 = product of:
      0.032484602 = sum of:
        0.032484602 = product of:
          0.08121151 = sum of:
            0.025633242 = weight(_text_:retrieval in 2571) [ClassicSimilarity], result of:
              0.025633242 = score(doc=2571,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23394634 = fieldWeight in 2571, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2571)
            0.055578265 = weight(_text_:system in 2571) [ClassicSimilarity], result of:
              0.055578265 = score(doc=2571,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.4871716 = fieldWeight in 2571, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2571)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    In this paper we describe a system for analysing short articles. The system works semi-automatically: the article is first analysed by the system and then presented to the user for post-editing. The information obtained in this way is stored in a database record. Through the database, which is implemented in dBase IV, queries and access to the original texts are then possible efficiently. The core of this paper concerns the semi-automatic analysis. We describe our method for parameterised pattern matching as well as linguistic heuristics for identifying noun phrases and prepositional phrases. The system was developed for practical use in the Bonn office of the 'Forum InformatikerInnen für Frieden und gesellschaftliche Verantwortung e.V. (FIFF)'.
  7. Salton, G.: Automatic processing of foreign language documents (1985) 0.00
    0.0040293047 = product of:
      0.028205132 = sum of:
        0.028205132 = product of:
          0.07051283 = sum of:
            0.03875382 = weight(_text_:retrieval in 3650) [ClassicSimilarity], result of:
              0.03875382 = score(doc=3650,freq=14.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.3536936 = fieldWeight in 3650, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3650)
            0.03175901 = weight(_text_:system in 3650) [ClassicSimilarity], result of:
              0.03175901 = score(doc=3650,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.27838376 = fieldWeight in 3650, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3650)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order.
During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how evaluated?
  8. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.00
    0.003943569 = product of:
      0.027604982 = sum of:
        0.027604982 = product of:
          0.069012456 = sum of:
            0.040941194 = weight(_text_:retrieval in 5863) [ClassicSimilarity], result of:
              0.040941194 = score(doc=5863,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.37365708 = fieldWeight in 5863, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5863)
            0.028071264 = weight(_text_:system in 5863) [ClassicSimilarity], result of:
              0.028071264 = score(doc=5863,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.24605882 = fieldWeight in 5863, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5863)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Retrieval tests are the most widely accepted method for justifying new content-indexing procedures against traditional ones. As part of a diploma thesis, two fundamentally different systems for automatic content indexing were tested and evaluated using the press database of the publishing house Gruner + Jahr (G+J). The study compared natural-language retrieval with Boolean retrieval. The two systems are Autonomy from Autonomy Inc. and DocCat, which IBM adapted to the database structure of the G+J press database. The former is a probabilistic system based on natural-language retrieval. DocCat, by contrast, is based on Boolean retrieval and is a learning system that indexes on the basis of an intellectually created training set. Methodologically, the evaluation starts from the real application context of text documentation at G+J. The tests are assessed from both statistical and qualitative points of view. One result of the tests is that DocCat shows some shortcomings compared with intellectual content indexing that still need to be remedied, while Autonomy's natural-language retrieval cannot be used as it stands in this setting and for the specific requirements of G+J text documentation.
  9. Polity, Y.: Vers une ergonomie linguistique (1994) 0.00
    0.0034888082 = product of:
      0.024421657 = sum of:
        0.024421657 = product of:
          0.06105414 = sum of:
            0.029295133 = weight(_text_:retrieval in 36) [ClassicSimilarity], result of:
              0.029295133 = score(doc=36,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.26736724 = fieldWeight in 36, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=36)
            0.03175901 = weight(_text_:system in 36) [ClassicSimilarity], result of:
              0.03175901 = score(doc=36,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.27838376 = fieldWeight in 36, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0625 = fieldNorm(doc=36)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Analyzes a special type of man-machine interaction, that of searching an information system with natural language. Proposes a model for full text processing for information retrieval that considers the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks
  10. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.00
    0.0021032572 = product of:
      0.0147228 = sum of:
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.0294456 = score(doc=1746,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Date
    22. 3.2015 9:17:30
  11. Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.00
    0.0018148007 = product of:
      0.012703604 = sum of:
        0.012703604 = product of:
          0.06351802 = sum of:
            0.06351802 = weight(_text_:system in 6682) [ClassicSimilarity], result of:
              0.06351802 = score(doc=6682,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5567675 = fieldWeight in 6682, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6682)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Presents a model for predicting the performance of a computerised keyword assigning and indexing system. Statistical procedures were investigated in order to protect against incorrect keywording by the system behaving as an expert system designed to mimic the behaviour of human keyword indexers and representing lessons learned from military exercises and operations
  12. Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.00
    0.0014796278 = product of:
      0.010357394 = sum of:
        0.010357394 = product of:
          0.051786967 = sum of:
            0.051786967 = weight(_text_:retrieval in 3313) [ClassicSimilarity], result of:
              0.051786967 = score(doc=3313,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.47264296 = fieldWeight in 3313, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3313)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Reviews the basic issues and procedures involved in natural language processing of textual material for final use in information retrieval. Covers: natural language processing; natural language understanding; syntactic and semantic analysis; parsing; knowledge bases and knowledge representation
  13. Porter, M.F.: An algorithm for suffix stripping (1980) 0.00
    0.0012555057 = product of:
      0.00878854 = sum of:
        0.00878854 = product of:
          0.0439427 = sum of:
            0.0439427 = weight(_text_:retrieval in 3122) [ClassicSimilarity], result of:
              0.0439427 = score(doc=3122,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 3122, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3122)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Footnote
    Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. pp. 313-316.
  14. Pritchard-Schoch, T.: Natural language comes of age (1993) 0.00
    0.0011837021 = product of:
      0.008285915 = sum of:
        0.008285915 = product of:
          0.04142957 = sum of:
            0.04142957 = weight(_text_:retrieval in 2570) [ClassicSimilarity], result of:
              0.04142957 = score(doc=2570,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.37811437 = fieldWeight in 2570, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2570)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Discusses natural languages and the natural language implementation for Westlaw's full-text legal documents, Westlaw Is Natural (WIN). Natural language is not artificial intelligence but a hybrid of linguistics, mathematics and statistics. Provides 3 classes of retrieval models. Explains how Westlaw processes an English query. Assesses WIN. Covers WIN enhancements; the natural language features of Congressional Quarterly's Washington Alert using a document for a query; the personal librarian front end search software and Dowquest from Dow Jones news/retrieval. Considers whether natural language encourages fuzzy thinking and whether Boolean logic will still be needed
  15. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: The operation and performance of an artificially intelligent keywording system (1991) 0.00
    0.0011228506 = product of:
      0.007859954 = sum of:
        0.007859954 = product of:
          0.039299767 = sum of:
            0.039299767 = weight(_text_:system in 6681) [ClassicSimilarity], result of:
              0.039299767 = score(doc=6681,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34448233 = fieldWeight in 6681, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6681)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
  16. Galvez, C.; Moya-Anegón, F. de: An evaluation of conflation accuracy using finite-state transducers (2006) 0.00
    0.0010873 = product of:
      0.0076110996 = sum of:
        0.0076110996 = product of:
          0.0380555 = sum of:
            0.0380555 = weight(_text_:retrieval in 5599) [ClassicSimilarity], result of:
              0.0380555 = score(doc=5599,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 5599, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5599)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
  17. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.00
    0.0010462549 = product of:
      0.007323784 = sum of:
        0.007323784 = product of:
          0.03661892 = sum of:
            0.03661892 = weight(_text_:retrieval in 896) [ClassicSimilarity], result of:
              0.03661892 = score(doc=896,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 896, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=896)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, where each scenario simulated a certain user type. The four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. The main conclusion of the paper is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations of indexing strategies with query terms based on normalization was not far below the performance of the truncation combination.
  18. Fagan, J.L.: The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval (1989) 0.00
    0.0010462549 = product of:
      0.007323784 = sum of:
        0.007323784 = product of:
          0.03661892 = sum of:
            0.03661892 = weight(_text_:retrieval in 1845) [ClassicSimilarity], result of:
              0.03661892 = score(doc=1845,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 1845, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1845)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    It may be possible to improve the quality of automatic indexing systems by using complex descriptors, for example, phrases, in addition to the simple descriptors (words or word stems) that are normally used in automatically constructed representations of document content. This study is directed toward the goal of developing effective methods of identifying phrases in natural language text from which good quality phrase descriptors can be constructed. The effectiveness of one method, a simple nonsyntactic phrase indexing procedure, has been tested on five experimental document collections. The results have been analyzed in order to identify the inadequacies of the procedure, and to determine what kinds of information about text structure are needed in order to construct phrase descriptors that are good indicators of document content. Two primary conclusions have been reached: (1) In the retrieval experiments, the nonsyntactic phrase construction procedure did not consistently yield substantial improvements in effectiveness. It is therefore not likely that phrase indexing of this kind will prove to be an important method of enhancing the performance of automatic document indexing and retrieval systems in operational environments. (2) Many of the shortcomings of the nonsyntactic approach can be overcome by incorporating syntactic information into the phrase construction process. However, a general syntactic analysis facility may be required, since many useful sources of phrases cannot be exploited if only a limited inventory of syntactic patterns can be recognized. Further research should be conducted into methods of incorporating automatic syntactic analysis into content analysis for document retrieval.
  19. Kuhlen, R.: Experimentelle Morphologie in der Informationswissenschaft (1977) 0.00
    0.0010357393 = product of:
      0.007250175 = sum of:
        0.007250175 = product of:
          0.036250874 = sum of:
            0.036250874 = weight(_text_:retrieval in 4253) [ClassicSimilarity], result of:
              0.036250874 = score(doc=4253,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 4253, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4253)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    LCSH
    Information storage and retrieval systems
    Subject
    Information storage and retrieval systems
  20. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.00
    0.0010145045 = product of:
      0.0071015307 = sum of:
        0.0071015307 = product of:
          0.035507653 = sum of:
            0.035507653 = weight(_text_:system in 1264) [ClassicSimilarity], result of:
              0.035507653 = score(doc=1264,freq=10.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.31124252 = fieldWeight in 1264, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1264)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).