Search (231 results, page 1 of 12)

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.04

0.043877795 = product of:
  0.08775559 = sum of:
    0.03775026 = weight(_text_:von in 4157) [ClassicSimilarity], result of:
      0.03775026 = score(doc=4157,freq=2.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.29476947 = fieldWeight in 4157, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.078125 = fieldNorm(doc=4157)
    0.05000533 = product of:
      0.075008 = sum of:
        0.0099718105 = weight(_text_:a in 4157) [ClassicSimilarity], result of:
          0.0099718105 = score(doc=4157,freq=4.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.18016359 = fieldWeight in 4157, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.065036185 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.065036185 = score(doc=4157,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.6666667 = coord(2/3)
  0.5 = coord(2/4)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
Type: a
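
The score tree for this record can be re-derived by hand. The sketch below assumes the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and plugs in the values listed for doc 4157. The function names are illustrative only and are not part of Lucene's API.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency (assumed definition).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 4157.
MAX_DOCS = 44218
QUERY_NORM = 0.04800207
FIELD_NORM = 0.078125

von = term_score(2.0, 8340, MAX_DOCS, QUERY_NORM, FIELD_NORM)
a = term_score(4.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
t22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Inner clause: 2 of 3 optional terms matched -> coord(2/3).
inner = (a + t22) * (2 / 3)
# Outer query: 2 of 4 clauses matched -> coord(2/4).
total = (von + inner) * (2 / 4)
print(f"{total:.6f}")
```

The printed value should agree with the listed 0.043877795 up to floating-point rounding, since Lucene computes these factors in 32-bit floats.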

Keitz, W. von: Automatic indexing and the dissemination of information (1986) 0.03

0.03208051 = product of:
  0.06416102 = sum of:
    0.060400415 = weight(_text_:von in 2390) [ClassicSimilarity], result of:
      0.060400415 = score(doc=2390,freq=2.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.47163114 = fieldWeight in 2390, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.125 = fieldNorm(doc=2390)
    0.003760605 = product of:
      0.011281814 = sum of:
        0.011281814 = weight(_text_:a in 2390) [ClassicSimilarity], result of:
          0.011281814 = score(doc=2390,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.20383182 = fieldWeight in 2390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=2390)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Type: a

Salton, G.: Future prospects for text-based information retrieval (1990) 0.02

0.024060382 = product of:
  0.048120763 = sum of:
    0.04530031 = weight(_text_:von in 2327) [ClassicSimilarity], result of:
      0.04530031 = score(doc=2327,freq=2.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.35372335 = fieldWeight in 2327, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.09375 = fieldNorm(doc=2327)
    0.002820454 = product of:
      0.008461362 = sum of:
        0.008461362 = weight(_text_:a in 2327) [ClassicSimilarity], result of:
          0.008461362 = score(doc=2327,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.15287387 = fieldWeight in 2327, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=2327)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Source: Pragmatische Aspekte beim Entwurf und Betrieb von Informationssystemen: Proc. des 1. Int. Symposiums für Informationswissenschaft, Universität Konstanz, 17.-19.10.1990. Hrsg.: J. Herget u. R. Kuhlen
Type: a

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02

0.019223286 = product of:
  0.07689314 = sum of:
    0.07689314 = product of:
      0.11533971 = sum of:
        0.011281814 = weight(_text_:a in 402) [ClassicSimilarity], result of:
          0.011281814 = score(doc=402,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.20383182 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.10405789 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.10405789 = score(doc=402,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Source: Information processing and management. 22(1986) no.6, S.465-476
Type: a

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

0.016820375 = product of:
  0.0672815 = sum of:
    0.0672815 = product of:
      0.10092224 = sum of:
        0.009871588 = weight(_text_:a in 6265) [ClassicSimilarity], result of:
          0.009871588 = score(doc=6265,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.17835285 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
        0.091050655 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.091050655 = score(doc=6265,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Source: Information outlook. 9(2005) no.8, S.22-23
Type: a

Koryconski, C.; Newell, A.F.: Natural-language processing and automatic indexing (1990) 0.02

0.016728494 = product of:
  0.03345699 = sum of:
    0.030200208 = weight(_text_:von in 2313) [ClassicSimilarity], result of:
      0.030200208 = score(doc=2313,freq=2.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.23581557 = fieldWeight in 2313, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.0625 = fieldNorm(doc=2313)
    0.0032567796 = product of:
      0.009770338 = sum of:
        0.009770338 = weight(_text_:a in 2313) [ClassicSimilarity], result of:
          0.009770338 = score(doc=2313,freq=6.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.17652355 = fieldWeight in 2313, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2313)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Abstract: The task of producing satisfactory indexes by automatic means has been tackled on two fronts: by statistical analysis of text and by attempting content analysis of the text in much the same way as a human indexer does. Though statistical techniques have a lot to offer for free-text database systems, neither method has had much success with back-of-the-book indexing. This review examines some problems associated with the application of natural-language processing techniques to book texts. See also the reply by K.P. Jones.
Type: a

Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.02
```
0.016684802 = product of:
  0.06673921 = sum of:
    0.06673921 = product of:
      0.10010881 = sum of:
        0.00946009 = weight(_text_:a in 1384) [ClassicSimilarity], result of:
          0.00946009 = score(doc=1384,freq=10.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.1709182 = fieldWeight in 1384, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1384)
        0.09064872 = weight(_text_:z in 1384) [ClassicSimilarity], result of:
          0.09064872 = score(doc=1384,freq=2.0), product of:
            0.2562021 = queryWeight, product of:
              5.337313 = idf(docFreq=577, maxDocs=44218)
              0.04800207 = queryNorm
            0.35381722 = fieldWeight in 1384, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.337313 = idf(docFreq=577, maxDocs=44218)
              0.046875 = fieldNorm(doc=1384)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)
```
Content: Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases

Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion (2018) 0.01

0.014035223 = product of:
  0.028070446 = sum of:
    0.026425181 = weight(_text_:von in 4310) [ClassicSimilarity], result of:
      0.026425181 = score(doc=4310,freq=2.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.20633863 = fieldWeight in 4310, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4310)
    0.0016452647 = product of:
      0.004935794 = sum of:
        0.004935794 = weight(_text_:a in 4310) [ClassicSimilarity], result of:
          0.004935794 = score(doc=4310,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.089176424 = fieldWeight in 4310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Abstract: Since autumn 2017, the subject indexing of certain media works at the Deutsche Nationalbibliothek has been carried out purely by machine. The quality of this procedure, which may significantly shape the process organization of libraries, is a matter of controversy among experts. Their positions are first explained in sufficient detail, and then the need for a quality assessment of the procedure and its foundations are set out. A central component of any future assessment is a test collection. Its creation and documentation are the focus of this work. In this context, the history of test collections and the requirements for successful ones are also discussed. Finally, a retrieval test is carried out that demonstrates the usability of the test collection developed. Its results serve solely as a functional check. A quality assessment of machine subject indexing, whether specifically or in general, is not carried out and is not the aim of the work.
Type: a

Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.01
```
0.013904001 = product of:
  0.055616003 = sum of:
    0.055616003 = product of:
      0.083424 = sum of:
        0.007883408 = weight(_text_:a in 5769) [ClassicSimilarity], result of:
          0.007883408 = score(doc=5769,freq=10.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.14243183 = fieldWeight in 5769, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.075540595 = weight(_text_:z in 5769) [ClassicSimilarity], result of:
          0.075540595 = score(doc=5769,freq=2.0), product of:
            0.2562021 = queryWeight, product of:
              5.337313 = idf(docFreq=577, maxDocs=44218)
              0.04800207 = queryNorm
            0.29484767 = fieldWeight in 5769, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.337313 = idf(docFreq=577, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)
```
Abstract: Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as X**2 statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X**2 statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
Type: a
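
The six association measures named in the abstract can be illustrated from a 2x2 co-occurrence table for a term pair. The sketch below uses common textbook formulations, which may differ from the exact variants evaluated in the study. The function name, the cell labels (a = documents containing both terms, b and c = documents containing only one, d = documents containing neither), and the sample counts are all hypothetical.

```python
import math

def association_measures(a: int, b: int, c: int, d: int) -> dict:
    """Textbook association measures for one term pair.

    Assumes every row and column total of the 2x2 table is non-zero.
    """
    n = a + b + c + d
    x_total, not_x_total = a + b, c + d   # documents with / without term x
    y_total, not_y_total = a + c, b + d   # documents with / without term y

    observed = [a, b, c, d]
    expected = [
        x_total * y_total / n,
        x_total * not_y_total / n,
        not_x_total * y_total / n,
        not_x_total * not_y_total / n,
    ]

    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    return {
        "cosine": a / math.sqrt(x_total * y_total),
        "jaccard": a / (a + b + c),
        # Pointwise mutual information, base 2.
        "mutual_information": math.log2(a * n / (x_total * y_total)),
        # Yule's coefficient of colligation Y.
        "yule_y": (root_ad - root_bc) / (root_ad + root_bc),
        "chi_square": n * (a * d - b * c) ** 2
        / (x_total * not_x_total * y_total * not_y_total),
        # Log-likelihood ratio (G^2); empty cells contribute nothing.
        "likelihood_ratio": 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        ),
    }

# Hypothetical counts: 1,000 documents, 30 of which contain both terms.
measures = association_measures(a=30, b=20, c=10, d=940)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Varying the counts while holding the ratio a / (a + b + c) fixed gives a quick feel for how differently the measures react to term frequency, which is the kind of comparison the abstract describes.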

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01

0.012501333 = product of:
  0.05000533 = sum of:
    0.05000533 = product of:
      0.075008 = sum of:
        0.0099718105 = weight(_text_:a in 2759) [ClassicSimilarity], result of:
          0.0099718105 = score(doc=2759,freq=4.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.18016359 = fieldWeight in 2759, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.065036185 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.065036185 = score(doc=2759,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Date: 1. 2.2016 18:25:22
Type: a

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.01

0.012014553 = product of:
  0.04805821 = sum of:
    0.04805821 = product of:
      0.07208732 = sum of:
        0.007051134 = weight(_text_:a in 1952) [ClassicSimilarity], result of:
          0.007051134 = score(doc=1952,freq=2.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.12739488 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.065036185 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.065036185 = score(doc=1952,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Date: 16. 8.1998 12:51:22
Type: a

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

0.010299881 = product of:
  0.041199524 = sum of:
    0.041199524 = product of:
      0.061799284 = sum of:
        0.009770338 = weight(_text_:a in 4709) [ClassicSimilarity], result of:
          0.009770338 = score(doc=4709,freq=6.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.17652355 = fieldWeight in 4709, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
        0.052028947 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.052028947 = score(doc=4709,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus based techniques. The algorithm involves a gradual approximation which works to converge linguistic knowledge gradually towards desirable results. The 1st experiment revealed the characteristic of this algorithm and the others proved the effectiveness of this algorithm for a real corpus
Date: 31. 7.1996 9:22:19
Type: a

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01

0.010001066 = product of:
  0.040004265 = sum of:
    0.040004265 = product of:
      0.060006395 = sum of:
        0.007977448 = weight(_text_:a in 6752) [ClassicSimilarity], result of:
          0.007977448 = score(doc=6752,freq=4.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.14413087 = fieldWeight in 6752, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
        0.052028947 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
          0.052028947 = score(doc=6752,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.30952093 = fieldWeight in 6752, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
Date: 6. 3.1997 16:22:15
Type: a

Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.01
```
0.009987781 = product of:
  0.039951123 = sum of:
    0.039951123 = weight(_text_:von in 3081) [ClassicSimilarity], result of:
      0.039951123 = score(doc=3081,freq=14.0), product of:
        0.12806706 = queryWeight, product of:
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.04800207 = queryNorm
        0.3119547 = fieldWeight in 3081, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          2.6679487 = idf(docFreq=8340, maxDocs=44218)
          0.03125 = fieldNorm(doc=3081)
  0.25 = coord(1/4)
```
Abstract: The work analyzes the dynamic development and use of thesaurus terms. It also concentrates on the factors that influence the number of index terms per document or journal. MeSH and the corresponding database MEDLINE served as the objects of study. The most important findings are: 1. The MeSH thesaurus has developed logarithmically through three distinct phases. Such a thesaurus should follow the equation "T = 3,076.6 ln(d) - 22,695 + 0.0039d" (T = terms, ln = natural logarithm, d = documents). To construct such a thesaurus, one therefore needs about 1,600 documents on different topics within the domain of the thesaurus. The dynamic development of thesauri such as MeSH requires the introduction of one new term for every 256 newly indexed documents. 2. The distribution of thesaurus terms yielded three categories: heavily, normally and rarely used headings. The last group is in a test phase, while in the first and second categories the newly added descriptors lead to thesaurus growth. 3. There is a logarithmic relationship between the number of index terms per article and its page count for articles of between one and twenty-one pages. 4. Journal articles that appear in MEDLINE with abstracts receive almost two more descriptors. 5. The findability of non-English-language documents in MEDLINE is lower than that of English-language documents. 6. Articles in journals with an impact factor of 0 to fifteen do not receive more index terms than those in the other journals covered by MEDLINE. 7. In an indexing system, different journals carry more or less weight in their findability. The distribution of index terms per page has shown that there are three categories of publications in MEDLINE. In addition, there are a few strongly favored journals.
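
The growth equation quoted in the abstract can be checked numerically. The sketch below assumes the German number formatting is read as T = 3076.6 ln(d) - 22695 + 0.0039 d. The function names are hypothetical, and reading the "256 documents" figure as the large-d limit of the derivative is an interpretation, not something the abstract states.

```python
import math

def thesaurus_terms(d: float) -> float:
    # T(d) as quoted in the abstract, with decimal commas converted.
    return 3076.6 * math.log(d) - 22695 + 0.0039 * d

def docs_per_new_term(d: float) -> float:
    # Reciprocal of dT/dd = 3076.6 / d + 0.0039.
    return 1.0 / (3076.6 / d + 0.0039)

# T changes sign between 1,500 and 1,600 documents, which fits the
# "about 1,600 documents" needed to start such a thesaurus.
print(thesaurus_terms(1500) < 0, thesaurus_terms(1600) > 0)

# For very large collections the logarithmic part stops contributing,
# leaving roughly one new term per 1 / 0.0039 documents.
print(round(1 / 0.0039, 1))
print(round(docs_per_new_term(1e9), 1))
```

Under this reading, the linear coefficient 0.0039 alone accounts for the rate of one new term per roughly 256 documents.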

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

0.009232819 = product of:
  0.036931276 = sum of:
    0.036931276 = product of:
      0.055396914 = sum of:
        0.009871588 = weight(_text_:a in 5291) [ClassicSimilarity], result of:
          0.009871588 = score(doc=5291,freq=8.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.17835285 = fieldWeight in 5291, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
        0.045525327 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.045525327 = score(doc=5291,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
Date: 22. 7.2006 17:32:00
Type: a

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01

0.0090123955 = product of:
  0.036049582 = sum of:
    0.036049582 = product of:
      0.054074373 = sum of:
        0.008549047 = weight(_text_:a in 530) [ClassicSimilarity], result of:
          0.008549047 = score(doc=530,freq=6.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.1544581 = fieldWeight in 530, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.045525327 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.045525327 = score(doc=530,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
Source: International forum on information and documentation. 22(1997) no.1, S.17-28
Type: a

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.01

0.0090123955 = product of:
  0.036049582 = sum of:
    0.036049582 = product of:
      0.054074373 = sum of:
        0.008549047 = weight(_text_:a in 2673) [ClassicSimilarity], result of:
          0.008549047 = score(doc=2673,freq=6.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.1544581 = fieldWeight in 2673, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.045525327 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.045525327 = score(doc=2673,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
Type: a
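
The score breakdown shown for this entry (doc 2673) can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF arithmetic that the listed values are consistent with (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and the two coord factors applied last). The function names are hypothetical; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    # idf form inferred from the listed values (an assumption):
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight, as in the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the listing for doc 2673
QUERY_NORM = 0.04800207
MAX_DOCS = 44218
FIELD_NORM = 0.0546875

weight_a = term_weight(6.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
weight_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Sum the matching clauses, then apply coord(2/3) and coord(1/4)
score = (weight_a + weight_22) * (2 / 3) * (1 / 4)
print(score)
```

The result should agree with the listed 0.0090123955 up to single-precision rounding, since the listing was presumably computed with 32-bit floats while Python uses doubles.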

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

0.008750932 = product of:
  0.03500373 = sum of:
    0.03500373 = product of:
      0.052505594 = sum of:
        0.0069802674 = weight(_text_:a in 5001) [ClassicSimilarity], result of:
          0.0069802674 = score(doc=5001,freq=4.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.12611452 = fieldWeight in 5001, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.045525327 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.045525327 = score(doc=5001,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
Date: 14. 3.1996 13:22:21
Type: a

Ward, M.L.: ¬The future of the human indexer (1996) 0.01

0.0077249105 = product of:
  0.030899642 = sum of:
    0.030899642 = product of:
      0.046349462 = sum of:
        0.007327754 = weight(_text_:a in 7244) [ClassicSimilarity], result of:
          0.007327754 = score(doc=7244,freq=6.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.13239266 = fieldWeight in 7244, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
        0.039021708 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
          0.039021708 = score(doc=7244,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.23214069 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Abstract: Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and to what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database)
Date: 9. 2.1997 18:44:22
Type: a

Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.01
```
0.0073685125 = product of:
  0.02947405 = sum of:
    0.02947405 = product of:
      0.044211075 = sum of:
        0.011692984 = weight(_text_:a in 1794) [ClassicSimilarity], result of:
          0.011692984 = score(doc=1794,freq=22.0), product of:
            0.055348642 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.04800207 = queryNorm
            0.21126054 = fieldWeight in 1794, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.032518093 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
          0.032518093 = score(doc=1794,freq=2.0), product of:
            0.16809508 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04800207 = queryNorm
            0.19345059 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)
```
Abstract: In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document
Date: 11. 9.2000 19:53:22
Type: a
