Search (13 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Retrievalstudien"
  • × type_ss:"a"
  1. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01
    0.007837114 = product of:
      0.027429897 = sum of:
        0.010253297 = product of:
          0.051266484 = sum of:
            0.051266484 = weight(_text_:retrieval in 5001) [ClassicSimilarity], result of:
              0.051266484 = score(doc=5001,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.46789268 = fieldWeight in 5001, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.2 = coord(1/5)
        0.0171766 = product of:
          0.0343532 = sum of:
            0.0343532 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.0343532 = score(doc=5001,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order ro replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributes to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and to general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  2. Wan, T.-L.; Evens, M.; Wan, Y.-W.; Pao, Y.-Y.: Experiments with automatic indexing and a relational thesaurus in a Chinese information retrieval system (1997) 0.01
    0.0064511965 = product of:
      0.045158375 = sum of:
        0.045158375 = product of:
          0.112895936 = sum of:
            0.05731767 = weight(_text_:retrieval in 956) [ClassicSimilarity], result of:
              0.05731767 = score(doc=956,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5231199 = fieldWeight in 956, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=956)
            0.055578265 = weight(_text_:system in 956) [ClassicSimilarity], result of:
              0.055578265 = score(doc=956,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.4871716 = fieldWeight in 956, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=956)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    This article describes a series of experiments with an interactive Chinese information retrieval system named CIRS and an interactive relational thesaurus. 2 important issues have been explored: whether thesauri enhance the retrieval effectiveness of Chinese documents, and whether automatic indexing can complete with manual indexing in a Chinese information retrieval system. Recall and precision are used to measure and evaluate the effectiveness of the system. Statistical analysis of the recall and precision measures suggest that the use of the relational thesaurus does improve the retrieval effectiveness both in the automatic indexing environment and in the manual indexing environment and that automatic indexing is at least as good as manual indexing
  3. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.01
    0.005000235 = product of:
      0.035001643 = sum of:
        0.035001643 = product of:
          0.087504104 = sum of:
            0.053818595 = weight(_text_:retrieval in 6386) [ClassicSimilarity], result of:
              0.053818595 = score(doc=6386,freq=12.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.49118498 = fieldWeight in 6386, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6386)
            0.033685513 = weight(_text_:system in 6386) [ClassicSimilarity], result of:
              0.033685513 = score(doc=6386,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 6386, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6386)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Retrieval Tests sind die anerkannteste Methode, um neue Verfahren der Inhaltserschließung gegenüber traditionellen Verfahren zu rechtfertigen. Im Rahmen einer Diplomarbeit wurden zwei grundsätzlich unterschiedliche Systeme der automatischen inhaltlichen Erschließung anhand der Pressedatenbank des Verlagshauses Gruner + Jahr (G+J) getestet und evaluiert. Untersucht wurde dabei natürlichsprachliches Retrieval im Vergleich zu Booleschem Retrieval. Bei den beiden Systemen handelt es sich zum einen um Autonomy von Autonomy Inc. und DocCat, das von IBM an die Datenbankstruktur der G+J Pressedatenbank angepasst wurde. Ersteres ist ein auf natürlichsprachlichem Retrieval basierendes, probabilistisches System. DocCat demgegenüber basiert auf Booleschem Retrieval und ist ein lernendes System, das auf Grund einer intellektuell erstellten Trainingsvorlage indexiert. Methodisch geht die Evaluation vom realen Anwendungskontext der Textdokumentation von G+J aus. Die Tests werden sowohl unter statistischen wie auch qualitativen Gesichtspunkten bewertet. Ein Ergebnis der Tests ist, dass DocCat einige Mängel gegenüber der intellektuellen Inhaltserschließung aufweist, die noch behoben werden müssen, während das natürlichsprachliche Retrieval von Autonomy in diesem Rahmen und für die speziellen Anforderungen der G+J Textdokumentation so nicht einsetzbar ist
  4. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.00
    0.0049076 = product of:
      0.0343532 = sum of:
        0.0343532 = product of:
          0.0687064 = sum of:
            0.0687064 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
              0.0687064 = score(doc=262,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5416616 = fieldWeight in 262, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=262)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Date
    20.10.2000 12:22:23
  5. Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.00
    0.004099487 = product of:
      0.028696407 = sum of:
        0.028696407 = product of:
          0.071741015 = sum of:
            0.0380555 = weight(_text_:retrieval in 3458) [ClassicSimilarity], result of:
              0.0380555 = score(doc=3458,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 3458, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3458)
            0.033685513 = weight(_text_:system in 3458) [ClassicSimilarity], result of:
              0.033685513 = score(doc=3458,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 3458, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3458)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    A retrievalsystem was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to the factor analysis. Organizational groups, represented by the documents they write, and the terms contained in these documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system`s retrieval accuracy. In particular, the effects of: term weighting functions in the semantic space construction and in query construction, suffix stripping, and using lexical units larger than a a single word were studied.
  6. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.00
    0.003943569 = product of:
      0.027604982 = sum of:
        0.027604982 = product of:
          0.069012456 = sum of:
            0.040941194 = weight(_text_:retrieval in 5863) [ClassicSimilarity], result of:
              0.040941194 = score(doc=5863,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.37365708 = fieldWeight in 5863, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5863)
            0.028071264 = weight(_text_:system in 5863) [ClassicSimilarity], result of:
              0.028071264 = score(doc=5863,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.24605882 = fieldWeight in 5863, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5863)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Retrievaltests sind die anerkannteste Methode, um neue Verfahren der Inhaltserschließung gegenüber traditionellen Verfahren zu rechtfertigen. Im Rahmen einer Diplomarbeit wurden zwei grundsätzlich unterschiedliche Systeme der automatischen inhaltlichen Erschließung anhand der Pressedatenbank des Verlagshauses Gruner + Jahr (G+J) getestet und evaluiert. Untersucht wurde dabei natürlichsprachliches Retrieval im Vergleich zu Booleschem Retrieval. Bei den beiden Systemen handelt es sich zum einen um Autonomy von Autonomy Inc. und DocCat, das von IBM an die Datenbankstruktur der G+J Pressedatenbank angepasst wurde. Ersteres ist ein auf natürlichsprachlichem Retrieval basierendes, probabilistisches System. DocCat demgegenüber basiert auf Booleschem Retrieval und ist ein lernendes System, das aufgrund einer intellektuell erstellten Trainingsvorlage indexiert. Methodisch geht die Evaluation vom realen Anwendungskontext der Textdokumentation von G+J aus. Die Tests werden sowohl unter statistischen wie auch qualitativen Gesichtspunkten bewertet. Ein Ergebnis der Tests ist, dass DocCat einige Mängel gegenüber der intellektuellen Inhaltserschließung aufweist, die noch behoben werden müssen, während das natürlichsprachliche Retrieval von Autonomy in diesem Rahmen und für die speziellen Anforderungen der G+J Textdokumentation so nicht einsetzbar ist
  7. Chevallet, J.-P.; Bruandet, M.F.: Impact de l'utilisation de multi terms sur la qualité des résponses dùn système de recherche d'information a indexation automatique (1999) 0.00
    0.0034888082 = product of:
      0.024421657 = sum of:
        0.024421657 = product of:
          0.06105414 = sum of:
            0.029295133 = weight(_text_:retrieval in 6253) [ClassicSimilarity], result of:
              0.029295133 = score(doc=6253,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.26736724 = fieldWeight in 6253, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6253)
            0.03175901 = weight(_text_:system in 6253) [ClassicSimilarity], result of:
              0.03175901 = score(doc=6253,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.27838376 = fieldWeight in 6253, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6253)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Footnote
    Übers. d. Titels: Impact of the use of multi-terms on the quality of the answers of an information retrieval system based on automatic indexing
  8. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.00
    0.0012555057 = product of:
      0.00878854 = sum of:
        0.00878854 = product of:
          0.0439427 = sum of:
            0.0439427 = weight(_text_:retrieval in 5699) [ClassicSimilarity], result of:
              0.0439427 = score(doc=5699,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 5699, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5699)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The Smart information retrieval project emphazises completely automatic approaches to the understanding and retrieval of large quantities of text. The work in the TREC-2 environment continues, performing both routing and ad hoc experiments. The ad hoc work extends investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document that matches the query. The performance of ad hoc runs is good, but it is clear that full advantage of the available local information is not been taken advantage of. The routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was previously done. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  9. Gödert, W.; Liebig, M.: Maschinelle Indexierung auf dem Prüfstand : Ergebnisse eines Retrievaltests zum MILOS II Projekt (1997) 0.00
    0.0010357393 = product of:
      0.007250175 = sum of:
        0.007250175 = product of:
          0.036250874 = sum of:
            0.036250874 = weight(_text_:retrieval in 1174) [ClassicSimilarity], result of:
              0.036250874 = score(doc=1174,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 1174, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1174)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The test ran between Nov 95-Aug 96 in Cologne Fachhochschule fur Bibliothekswesen (College of Librarianship).The test basis was a database of 190,000 book titles published between 1990-95. MILOS II mechanized indexing methods proved helpful in avoiding or reducing numbers of unsatisfied/no result retrieval searches. Retrieval from mechanised indexing is 3 times more successful than from title keyword data. MILOS II also used a standardized semantic vocabulary. Mechanised indexing demands high quality software and output data
  10. Lepsky, K.; Siepmann, J.; Zimmermann, A.: Automatische Indexierung für Online-Kataloge : Ergebnisse eines Retrievaltests (1996) 0.00
    7.323784E-4 = product of:
      0.0051266486 = sum of:
        0.0051266486 = product of:
          0.025633242 = sum of:
            0.025633242 = weight(_text_:retrieval in 3251) [ClassicSimilarity], result of:
              0.025633242 = score(doc=3251,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23394634 = fieldWeight in 3251, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3251)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Examines the effectiveness of automated indexing and presents the results of a study of information retrieval from a segment (40.000 items) of the ULB Düsseldorf database. The segment was selected randomly and all the documents included were indexed automatically. The search topics included 50 subject areas ranging from economic growth to alternative energy sources. While there were 876 relevant documents in the database segment for each of the 50 search topics, the recall ranged from 1 to 244 references, with the average being 17.52 documents per topic. Therefore it seems that, in the immediate future, automatic indexing should be used in combination with intellectual indexing
  11. Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion (2018) 0.00
    7.323784E-4 = product of:
      0.0051266486 = sum of:
        0.0051266486 = product of:
          0.025633242 = sum of:
            0.025633242 = weight(_text_:retrieval in 4310) [ClassicSimilarity], result of:
              0.025633242 = score(doc=4310,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23394634 = fieldWeight in 4310, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4310)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
  12. Mielke, B.: Wider einige gängige Ansichten zur juristischen Informationserschließung (2002) 0.00
    6.8055023E-4 = product of:
      0.0047638514 = sum of:
        0.0047638514 = product of:
          0.023819257 = sum of:
            0.023819257 = weight(_text_:system in 2145) [ClassicSimilarity], result of:
              0.023819257 = score(doc=2145,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 2145, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2145)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Ausgehend von einer Betrachtung in der Rechtsinformatik geläufiger Annahmen zur juristischen Informationserschließung beschreibt der Beitrag wesentliche Ergebnisse einer empirischen Studie der Retrievaleffektivität von Re-cherchen in juristischen Datenbanken. Dabei steht die Frage nach der Notwendigkeit einer intellektuellen Erschließung einerseits, der Effektivität der sogenannten Stichwortsuche andererseits im Mittelpunkt. Die Ergebnisse der Studie, bei der auch ein Vergleich zwischen einem Informationssystem auf der Basis eines Booleschen Retrievalmodells mit einem System auf der Basis statistischer Verfahren vorgenommen wurde, legen den Schluss nahe, dass in der rechtsinformatischen Fachliteratur analytisch begründete Annahmen wie die Gefahr zu großer Antwortmengen bei der Stichwortsuche empirisch nicht zu belegen sind. Auch zeigt sich keine Überlegenheit intellektueller Erschließungsverfahren (Beschlagwortung) gegenüber der automatischen Indexierung, im Gegenteil führt der Einsatz eines statistischen Verfahrens bei identischer Dokumentkollektion zu einer höheren Wiedergewinnungsrate (recall).
  13. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 4309) [ClassicSimilarity], result of:
              0.01830946 = score(doc=4309,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 4309, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4309)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.