Search (291 results, page 1 of 15)

  • theme_ss:"Automatisches Indexieren"
  1. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.04
    Abstract
    This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time-consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.
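    For context, Kea's published design scores each candidate phrase with two features, TF×IDF and relative first occurrence, which a trained Naive Bayes model then combines into the ranking evaluated above. A minimal sketch of just the feature computation, with the learning step omitted and the smoothing detail an assumption:

```python
import math
import re
from collections import Counter

def kea_features(doc, corpus_df, n_docs, max_len=3):
    """Kea-style features for every word n-gram (n <= max_len) in `doc`:
    (TFxIDF, relative first occurrence). `corpus_df` maps phrases to
    document frequencies; the +1 smoothing is an assumption, and Kea's
    Naive Bayes combination of the two features is omitted."""
    words = re.findall(r"[a-z0-9]+", doc.lower())
    total = len(words)
    counts, first_pos = Counter(), {}
    for n in range(1, max_len + 1):
        for i in range(total - n + 1):
            phrase = " ".join(words[i:i + n])
            counts[phrase] += 1
            first_pos.setdefault(phrase, i)  # position of first occurrence
    return {p: ((tf / total) * math.log2(n_docs / (corpus_df.get(p, 0) + 1)),
                first_pos[p] / total)
            for p, tf in counts.items()}
```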
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.8, S.653-677
  2. Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003) 0.04
    Date
    22. 6.2009 12:46:51
    Footnote
    Review in: nfd 54(2003) H.5, S.314 (W. Ratzek): "To extract decision-relevant data from the ever-growing flood of more or less relevant documents, companies, public administrations, and specialist information institutions must develop, deploy, and maintain effective and efficient filter systems. Holger Nohr's textbook offers the first thorough introduction to the topic of automatic indexing. For, as the book opens: "How you gather, manage, and use information will determine whether you win or lose" (Bill Gates). The first chapter, "Introduction", focuses on fundamentals, describing the connections between document management systems, information retrieval, and indexing for planning, decision-making, and innovation processes in both profit and non-profit organizations. At the end of the introductory chapter Nohr takes up the debate over intellectual versus automatic indexing, leading into the second chapter, "Automatic Indexing". Here the author surveys, among other things, problems of automatic language processing and indexing, and various methods of automatic indexing, e.g. simple keyword extraction / full-text inversion, statistical methods, and pattern-matching methods. Nohr then treats the methods of automatic indexing in depth, with many examples, in the extensive third chapter. The fourth chapter, "Keyphrase Extraction", has the status of a passe-partout: "An intermediate stage on the way from automatic indexing to the automatic generation of textual summaries (automatic text summarization) is represented by approaches that extract key phrases from documents (keyphrase extraction). The boundaries between automatic indexing methods and those of text summarization are fluid." (p. 91). Nohr describes how this works using NCR's Extractor / Copernic Summarizer as an example.
    In the fifth chapter, "Information Extraction", Nohr addresses a problem that deserves stronger emphasis in the field: "The steadily growing number of electronic documents makes it desirable not only to index them automatically but also to extract the relevant information from them automatically, e.g. so that it can be carried over into operational information systems for further processing or analysis." (p. 103). "Indexing and retrieval methods", as mutually dependent procedures, are treated in the sixth chapter, which centres on relevance ranking and relevance feedback and on the use of information-linguistic methods in searching. The "evaluation of automatic indexing" forms the thematic conclusion, dealing above all with the quality of an indexing, with the usual retrieval measures in retrieval tests, and with their use. It is also worth noting that every chapter opens with stated learning objectives and that review questions for each chapter are collected at the back of the book. The numerous practical examples, a list of abbreviations, and a subject index add to the book's usefulness. Reading it deepened this reviewer's understanding of how the library-and-information toolkit, business informatics (especially data warehousing), and artificial intelligence fit together. "Grundlagen der automatischen Indexierung" should be required reading in library science programmes as well. Holger Nohr's textbook is also well suited for the information professional who wants to refresh a more or less well-founded knowledge of automatic indexing quickly, accessibly, and informatively."
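    The "simple keyword extraction / full-text inversion" baseline the review mentions for chapter two reduces to building an inverted index over stopword-filtered tokens. A toy sketch (the stopword list is illustrative, not from the book):

```python
import re
from collections import defaultdict

STOPWORDS = {"der", "die", "das", "und", "the", "a", "of", "and"}  # toy list

def invert(docs):
    """Simple keyword extraction / full-text inversion: lowercase word
    tokens, drop stopwords, and map each remaining term to the set of
    documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                index[token].add(doc_id)
    return index

# invert({1: "Automatische Indexierung und Retrieval", 2: "Information Retrieval"})
# -> {'automatische': {1}, 'indexierung': {1}, 'retrieval': {1, 2}, 'information': {2}}
```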
  3. Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.04
    Content
    Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases
  4. SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.04
    Content
    HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL and N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF and D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. and B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. and R.A. FLYNN: Versioning of full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA and M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG and Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL and R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. and P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY and D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH and H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. and J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. and M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. and P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN and L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. and J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO and P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER and J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. and P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. and B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
  5. Salton, G.: Another look at automatic text-retrieval systems (1986) 0.03
    Footnote
    Refers to: Blair, D.C.: An evaluation of retrieval effectiveness for a full-text document-retrieval system. Comm. ACM 28(1985) S.280-299. - See also: Blair, D.C.: Full text retrieval ... Int. Class. 13(1986) S.18-23; Blair, D.C., M.E. Maron: Full-text information retrieval ... Inf. Proc. Man. 26(1990) S.437-447.
  6. Fuhr, N.; Knorz, G.: Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS) (1984) 0.03
    Source
    Research and development in information retrieval. Proc. of the 3rd joint BCS and ACM symp., Cambridge, 2.-6.7.1984. Ed.: C.J. van Rijsbergen
  7. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  8. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.03
    Date
    14. 6.2015 22:12:44
  9. MacDougall, S.: Rethinking indexing : the impact of the Internet (1996) 0.03
    Abstract
    Considers the challenge posed to professional indexers by the Internet. Indexing and searching on the Internet appear to have taken a retrograde step, as well-developed and efficient information retrieval techniques have been replaced by cruder ones involving automatic keyword indexing and frequency ranking, leading to large retrieval sets and low precision. This is made worse by the apparent acceptance of this poor performance by Internet users and the feeling, on the part of indexers, that they are being bypassed by the producers of these hyperlinked menus and search engines. Key issues are: how far 'human' indexing will still be required in the Internet environment; how indexing techniques will have to change to stay relevant; and the future role of indexers. The challenge facing indexers is to adapt their skills to suit the online environment and to convince publishers of the need for efficient indexes on the Internet
  10. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.03
    Abstract
    This study investigates the potential of multiword terms for information retrieval. The aim is to use latent semantic analysis (LSA) to weight intellectually positively rated candidates higher than negatively rated ones, so that the positive candidates are preferred in a retrieval ranking. A version of the social-science GIRT database (German Indexing and Retrieval Testdatabase) served as the collection. Candidates for multiword terms were identified with the automatic indexing system Lingo; the core functionalities required were lemmatization, compound identification, algorithmic multiword recognition, and weighting of index terms with the LSA model. The multiword candidates recognized by Lingo and weighted by LSA were then evaluated. First, positive and negative multiword candidates were selected intellectually. In the second evaluation step, recall was computed to obtain the share of positive multiword candidates. In the final step, R-precision was used to determine how many positively rated multiword candidates made it to position k of the ranking. The recall of positive multiword candidates averaged about 39%, while R-precision reached an average of 54%. The LSA model thus yields an ambivalent result with a positive tendency.
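    The LSA step described here rests on a truncated SVD of a term-document matrix; the following generic numpy sketch shows only that machinery (toy data, not Lingo's actual implementation or the GIRT collection):

```python
import numpy as np

def lsa_weights(A, k):
    """Rank-k LSA reconstruction of a (terms x documents) frequency
    matrix A. The smoothed entries can replace raw counts as index-term
    weights: terms that share latent dimensions with a document are
    weighted up even when their surface frequency is low."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy usage: 4 index terms x 3 documents, reduced to 2 latent dimensions.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
W = lsa_weights(A, k=2)
```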
    Footnote
    Master's thesis, Informationswissenschaft und Sprachtechnologie programme, Institut für Sprache und Information, Philosophische Fakultät, Heinrich-Heine-Universität Düsseldorf
    Imprint
    Düsseldorf : Heinrich-Heine-Universität / Philosophische Fakultät / Institut für Sprache und Information
  11. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.03
    Abstract
    Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on combining the Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between two semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox part-of-speech tagger as the pre-processor for indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that if documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART
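    The core operation in SVSM is a similarity computed between two semantic matrices. As a minimal stand-in (the paper's own measure is more elaborate than this cosine over flattened matrices):

```python
import numpy as np

def matrix_similarity(doc, query):
    """Stand-in for SVSM relevance prediction: `doc` and `query` are
    semantic matrices (semantic cases x vocabulary); similarity here is
    the cosine of the flattened matrices."""
    d, q = np.ravel(doc), np.ravel(query)
    denom = np.linalg.norm(d) * np.linalg.norm(q)
    return float(d @ q / denom) if denom else 0.0
```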
    Source
    Journal of the American Society for Information Science. 48(1997) no.5, S.395-417
  12. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    Date
    16. 8.1998 12:51:22
    Footnote
    Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  13. Plaunt, C.; Norgard, B.A.: An association-based method for automatic indexing with a controlled vocabulary (1998) 0.02
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document
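    The abstract does not spell out the likelihood ratio statistic; in the standard Dunning formulation over a 2×2 contingency table of lexical item versus subject heading, it can be computed as follows (a sketch under that assumption, not the article's published code):

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table: k11 = records
    with both the lexical item and the subject heading, k12 = item only,
    k21 = heading only, k22 = neither. Large values suggest a strong
    item-heading association for the assignment 'dictionary'."""
    def h(*ks):                     # sum of k*ln(k/N) over nonzero cells
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))
```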
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
  14. Tsai, C.-F.; McGarry, K.; Tait, J.: Qualitative evaluation of automatic assignment of keywords to images (2006) 0.02
    Abstract
    In image retrieval, most systems lack user-centred evaluation since they are assessed against some chosen ground truth dataset. The results reported through precision and recall assessed against the ground truth are thought of as an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images for enhancing retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy based on some chosen ground truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation is composed of two individual methods. First, the automatic indexing annotation results are assessed by human subjects. Second, the subjects are asked to annotate some chosen images as the test set, whose annotations are used as ground truth; the system is then run on the test set and its annotation results are judged against that ground truth. Only one of these methods is reported for most systems on which user-centred evaluation is conducted. We believe that both methods need to be considered for full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, our proposed evaluation methodology is able to provide a deeper understanding of the system's performance.
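    The system-level half of this two-part methodology comes down to scoring the automatically assigned keywords for an image against the subject-provided ground truth; a minimal sketch:

```python
def precision_recall(assigned, ground_truth):
    """Precision and recall of automatically assigned keywords (a set of
    strings) against human ground-truth annotations for one image."""
    hits = len(assigned & ground_truth)
    p = hits / len(assigned) if assigned else 0.0
    r = hits / len(ground_truth) if ground_truth else 0.0
    return p, r

# precision_recall({"beach", "sky", "dog"}, {"beach", "sea", "sky"})
# -> (0.666..., 0.666...)
```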
    Source
    Information processing and management. 42(2006) no.1, S.136-154
  15. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.02
    Abstract
    Retrieval tests are the most widely accepted method for justifying new approaches to subject indexing against traditional ones. In the course of a diploma thesis, two fundamentally different systems for automatic subject indexing were tested and evaluated on the press database of the publishing house Gruner + Jahr (G+J): natural-language retrieval was compared with Boolean retrieval. The two systems were Autonomy, by Autonomy Inc., and DocCat, which IBM adapted to the database structure of the G+J press database. The former is a probabilistic system based on natural-language retrieval; DocCat, by contrast, is based on Boolean retrieval and is a learning system that indexes on the basis of an intellectually created training template. Methodologically, the evaluation proceeds from the real application context of text documentation at G+J, and the tests are assessed from both statistical and qualitative points of view. One result of the tests is that DocCat shows some shortcomings compared with intellectual subject indexing that still need to be remedied, while Autonomy's natural-language retrieval, in this setting and for the specific requirements of G+J text documentation, is not usable as is
    Source
    nfd Information - Wissenschaft und Praxis. 52(2001) H.5, S.251-262
  16. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  17. Jardine, N.; Rijsbergen, C.J. van: The use of hierarchic clustering in information retrieval (1971) 0.02
    Source
    Information storage and retrieval. 7(1971), S.217-240
  18. Sparck Jones, K.; Jackson, D.M.: The use of automatically obtained keyword classification for information retrieval (1970) 0.02
    Source
    Information storage and retrieval. 5(1970), S.175-201
  19. Kantor, P.B.; Voorhees, E.: Information retrieval with scanned texts (2000) 0.02
    Source
    Information retrieval. 2(2000), S.165-176
  20. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.02
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
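    As one concrete instance of the "gold standard" approach the framework reviews, automatic subject terms can be scored against manual indexing with a collection-level, micro-averaged F1 (an illustrative sketch, not the article's framework itself):

```python
def gold_standard_f1(auto, gold):
    """Micro-averaged F1 of automatically assigned subject terms against
    a manually indexed gold standard; `auto` and `gold` map document ids
    to sets of terms."""
    tp = fp = fn = 0
    for doc_id, gold_terms in gold.items():
        auto_terms = auto.get(doc_id, set())
        tp += len(auto_terms & gold_terms)   # terms both agree on
        fp += len(auto_terms - gold_terms)   # spurious automatic terms
        fn += len(gold_terms - auto_terms)   # missed gold terms
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```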
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.3-16

Types

  • a 251
  • x 16
  • el 15
  • m 14
  • s 8
  • d 2
  • r 1