Search (30 results, page 1 of 2)

  • × language_ss:"e"
  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[1990 TO 2000}
  1. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.03
    0.029954214 = product of:
      0.1283752 = sum of:
        0.014048031 = weight(_text_:und in 942) [ClassicSimilarity], result of:
          0.014048031 = score(doc=942,freq=8.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.29385152 = fieldWeight in 942, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.007483202 = weight(_text_:in in 942) [ClassicSimilarity], result of:
          0.007483202 = score(doc=942,freq=16.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.25504774 = fieldWeight in 942, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.014048031 = weight(_text_:und in 942) [ClassicSimilarity], result of:
          0.014048031 = score(doc=942,freq=8.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.29385152 = fieldWeight in 942, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.020302603 = weight(_text_:bibliotheken in 942) [ClassicSimilarity], result of:
          0.020302603 = score(doc=942,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.24979347 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.03188814 = weight(_text_:deutsche in 942) [ClassicSimilarity], result of:
          0.03188814 = score(doc=942,freq=2.0), product of:
            0.10186133 = queryWeight, product of:
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.021569785 = queryNorm
            0.3130544 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.020302603 = weight(_text_:bibliotheken in 942) [ClassicSimilarity], result of:
          0.020302603 = score(doc=942,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.24979347 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.020302603 = weight(_text_:bibliotheken in 942) [ClassicSimilarity], result of:
          0.020302603 = score(doc=942,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.24979347 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.23333333 = coord(7/30)
    
    Abstract
    Der Workshop gibt einen Einblick in die aktuelle Forschung und Entwicklung zur Wissensorganisation in digitalen Bibliotheken. Diane Vizine-Goetz vom OCLC Office of Research in Dublin, Ohio, stellt die Forschungsprojekte von OCLC zur Anpassung und Weiterentwicklung der Dewey Decimal Classification als Wissensorganisationsinstrument fuer grosse digitale Dokumentensammlungen vor. Traugott Koch, NetLab, Universität Lund in Schweden, demonstriert die Ansätze und Lösungen des EU-Projekts DESIRE zum Einsatz von intellektueller und vor allem automatischer Klassifikation in Fachinformationsdiensten im Internet.
    Content
    1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references
    Footnote
    Vortrag anläßlich des Workshops am 21.10.1999, Deutsche Bibliothek, Frankfurt/M.
  2. Dubin, D.: Dimensions and discriminability (1998) 0.00
    0.0016565587 = product of:
      0.016565587 = sum of:
        0.004365201 = weight(_text_:in in 2338) [ClassicSimilarity], result of:
          0.004365201 = score(doc=2338,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.0019719584 = weight(_text_:s in 2338) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=2338,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.010228428 = product of:
          0.020456856 = sum of:
            0.020456856 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.020456856 = score(doc=2338,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
    Date
    22. 9.1997 19:16:05
    Pages
    S.39-44
  3. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00
    0.0016565587 = product of:
      0.016565587 = sum of:
        0.004365201 = weight(_text_:in in 1673) [ClassicSimilarity], result of:
          0.004365201 = score(doc=1673,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 1673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.0019719584 = weight(_text_:s in 1673) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=1673,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.010228428 = product of:
          0.020456856 = sum of:
            0.020456856 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.020456856 = score(doc=1673,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK based information. The experimental version developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
    Date
    1. 8.1996 22:08:06
    Source
    Computer networks and ISDN systems. 30(1998) nos.1/7, S.646-648
  4. Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.00
    0.0012487139 = product of:
      0.018730707 = sum of:
        0.009365354 = weight(_text_:und in 162) [ClassicSimilarity], result of:
          0.009365354 = score(doc=162,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.19590102 = fieldWeight in 162, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=162)
        0.009365354 = weight(_text_:und in 162) [ClassicSimilarity], result of:
          0.009365354 = score(doc=162,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.19590102 = fieldWeight in 162, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=162)
      0.06666667 = coord(2/30)
    
    Abstract
    Zu den großen Herausforderungen einer sinnvollen Suche im WWW gehören die riesige Menge des Verfügbaren und die Sparchbarrieren. Verfahren, die die Web-Ressourcen im Hinblick auf ein effizienteres Retrieval inhaltlich strukturieren, werden daher ebenso dringend benötigt wie Programme, die mit der Sprachvielfalt umgehen können. Im folgenden Vortrag werden wir einige Ansätze diskutieren, die zur Bewältigung der beiden Probleme derzeit unternommen werden
  5. Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.00
    6.783625E-4 = product of:
      0.010175437 = sum of:
        0.0026457112 = weight(_text_:in in 1668) [ClassicSimilarity], result of:
          0.0026457112 = score(doc=1668,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.09017298 = fieldWeight in 1668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1668)
        0.007529726 = product of:
          0.022589177 = sum of:
            0.022589177 = weight(_text_:l in 1668) [ClassicSimilarity], result of:
              0.022589177 = score(doc=1668,freq=2.0), product of:
                0.0857324 = queryWeight, product of:
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.021569785 = queryNorm
                0.26348472 = fieldWeight in 1668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1668)
          0.33333334 = coord(1/3)
      0.06666667 = coord(2/30)
    
    Abstract
    This working paper describes the creation of a test database to carry out the automatic classification tasks of the DESIRE II work package D3.6a on. It is an improved version of NetLab's existing "All" Engineering database created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies to build robot-generated subject indices, presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot and subsequently an even more surprisingly low overlap between the resources collected by the two different approaches. That inspite of using basically the same services to start the harvesting process from. A intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between those aproaches was the coverage of the resulting database.
  6. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.00
    6.7611027E-4 = product of:
      0.010141654 = sum of:
        0.007887987 = weight(_text_:in in 7695) [ClassicSimilarity], result of:
          0.007887987 = score(doc=7695,freq=10.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.26884392 = fieldWeight in 7695, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=7695)
        0.002253667 = weight(_text_:s in 7695) [ClassicSimilarity], result of:
          0.002253667 = score(doc=7695,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.09609913 = fieldWeight in 7695, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=7695)
      0.06666667 = coord(2/30)
    
    Abstract
    Examnines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future developments of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar
    Source
    Libri. 42(1992) no.3, S.184-201
  7. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.00
    5.781282E-4 = product of:
      0.008671923 = sum of:
        0.0052914224 = weight(_text_:in in 382) [ClassicSimilarity], result of:
          0.0052914224 = score(doc=382,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.18034597 = fieldWeight in 382, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
        0.0033805002 = weight(_text_:s in 382) [ClassicSimilarity], result of:
          0.0033805002 = score(doc=382,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.14414869 = fieldWeight in 382, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
      0.06666667 = coord(2/30)
    
    Pages
    S.239-246
  8. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.00
    5.43019E-4 = product of:
      0.008145284 = sum of:
        0.0061733257 = weight(_text_:in in 6962) [ClassicSimilarity], result of:
          0.0061733257 = score(doc=6962,freq=8.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.21040362 = fieldWeight in 6962, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6962)
        0.0019719584 = weight(_text_:s in 6962) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=6962,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 6962, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6962)
      0.06666667 = coord(2/30)
    
    Abstract
    Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may provide helpful for the automatic classification of documents, as well as for information retrieval and filtering applications
    Source
    Information processing and management. 32(1996) no.6, S.747-767
  9. Mostafa, J.; Quiroga, L.M.; Palakal, M.: Filtering medical documents using automated and human classification methods (1998) 0.00
    5.0708273E-4 = product of:
      0.007606241 = sum of:
        0.005915991 = weight(_text_:in in 2326) [ClassicSimilarity], result of:
          0.005915991 = score(doc=2326,freq=10.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.20163295 = fieldWeight in 2326, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2326)
        0.0016902501 = weight(_text_:s in 2326) [ClassicSimilarity], result of:
          0.0016902501 = score(doc=2326,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.072074346 = fieldWeight in 2326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=2326)
      0.06666667 = coord(2/30)
    
    Abstract
    The goal of this research is to clarify the role of document classification in information filtering. An important function of classification, in managing computational complexity, is described and illustrated in the context of an existing filtering system. A parameter called classification homogeneity is presented for analyzing unsupervised automated classification by employing human classification as a control. 2 significant components of the automated classification approach, vocabulary discovery and classification scheme generation, are described in detail. Results of classification performance revealed considerable variability in the homogeneity of automatically produced classes. Based on the classification performance, different types of interest profiles were created. Subsequently, these profiles were used to perform filtering sessions. The filtering results showed that with increasing homogeneity, filtering performance improves, and, conversely, with decreasing homogeneity, filtering performance degrades
    Source
    Journal of the American Society for Information Science. 49(1998) no.14, S.1304-1318
  10. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    4.8283124E-4 = product of:
      0.007242468 = sum of:
        0.004988801 = weight(_text_:in in 2655) [ClassicSimilarity], result of:
          0.004988801 = score(doc=2655,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.17003182 = fieldWeight in 2655, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
        0.002253667 = weight(_text_:s in 2655) [ClassicSimilarity], result of:
          0.002253667 = score(doc=2655,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.09609913 = fieldWeight in 2655, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
      0.06666667 = coord(2/30)
    
    Abstract
    Can records management benefit from artificial intelligence technology, in particular from expert systems? Gives an answer to this question by showing an example of a small scale prototype project in automatic classification of office documents. Project methodology and basic elements of an expert system's approach are elaborated to give guidelines to potential users of this promising technology
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
  11. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.00
    4.8283124E-4 = product of:
      0.007242468 = sum of:
        0.004988801 = weight(_text_:in in 2188) [ClassicSimilarity], result of:
          0.004988801 = score(doc=2188,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.17003182 = fieldWeight in 2188, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=2188)
        0.002253667 = weight(_text_:s in 2188) [ClassicSimilarity], result of:
          0.002253667 = score(doc=2188,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.09609913 = fieldWeight in 2188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=2188)
      0.06666667 = coord(2/30)
    
    Abstract
    In this paper, we introduce ACS, an automatic classification system for school libraries. First, various approaches towards automatic classification, namely (i) rule-based, (ii) browse and search, and (iii) partial match, are critically reviewed. The central issues of scheme selection, text analysis and similarity measures are discussed. A novel approach towards detecting book-class similarity with Modified Overlap Coefficient (MOC) is also proposed. Finally, the design and implementation of ACS is presented. The test result of over 80% correctness in automatic classification and a cost reduction of 75% compared to manual classification suggest that ACS is highly adoptable
    Source
    Journal of information science. 21(1995) no.4, S.289-299
  12. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.00
    4.2788376E-4 = product of:
      0.006418256 = sum of:
        0.0052914224 = weight(_text_:in in 1669) [ClassicSimilarity], result of:
          0.0052914224 = score(doc=1669,freq=18.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.18034597 = fieldWeight in 1669, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
        0.0011268335 = weight(_text_:s in 1669) [ClassicSimilarity], result of:
          0.0011268335 = score(doc=1669,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.048049565 = fieldWeight in 1669, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=1669)
      0.06666667 = coord(2/30)
    
    Abstract
    After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval and several important services on the net have taken this fact into consideration, the emphasis of this report lays on the general retrieval tools for the whole of Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services it is necessary to begin with organizing the existing varieties in a typological/ taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues and simultaneous or collected search services respectively. It will however for different reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common for all attempts of indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software is thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
    Pages
    116 S
  13. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.00
    4.2247731E-4 = product of:
      0.0063371593 = sum of:
        0.004365201 = weight(_text_:in in 7696) [ClassicSimilarity], result of:
          0.004365201 = score(doc=7696,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 7696, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
        0.0019719584 = weight(_text_:s in 7696) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=7696,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.06666667 = coord(2/30)
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide an efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem
    Source
    Journal of chemical information and computer sciences. 34(1994) no.1, S.74-90
  14. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00
    4.2247731E-4 = product of:
      0.0063371593 = sum of:
        0.004365201 = weight(_text_:in in 1661) [ClassicSimilarity], result of:
          0.004365201 = score(doc=1661,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 1661, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
        0.0019719584 = weight(_text_:s in 1661) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=1661,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 1661, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
      0.06666667 = coord(2/30)
    
    Abstract
    Proposes the use of parallel computing systems to overcome the computationally intense clustering process. Examines 2 operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The time performance of the parallel systems was an improvement over sequential systems times, and produced the same clustering and classification scheme. Results show near linear speed up in higher threshold clustering applications
    Source
    Journal of the American Society for Information Science. 48(1997) no.10, S.932-943
  15. Orwig, R.E.; Chen, H.; Nunamaker, J.F.: ¬A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.00
    4.2247731E-4 = product of:
      0.0063371593 = sum of:
        0.004365201 = weight(_text_:in in 6928) [ClassicSimilarity], result of:
          0.004365201 = score(doc=6928,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 6928, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6928)
        0.0019719584 = weight(_text_:s in 6928) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=6928,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 6928, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6928)
      0.06666667 = coord(2/30)
    
    Abstract
    Describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classification of electronic brainstorming output and an evaluation of the results. Describes an electronic meeting system and describes the classification problem that exists in the group problem solving process. Surveys the literature concerning classification. Describes the application of the Kohonen SOM to the meeting output classification problem. Describes an experiment that evaluated the classification performed by the Kohonen SOM by comparing it with those of a human expert and a Hopfield neural network. Discusses conclusions and directions for future research
    Source
    Journal of the American Society for Information Science. 48(1997) no.2, S.157-170
  16. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.00
    4.181838E-4 = product of:
      0.0062727565 = sum of:
        0.0045825066 = weight(_text_:in in 1054) [ClassicSimilarity], result of:
          0.0045825066 = score(doc=1054,freq=6.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.1561842 = fieldWeight in 1054, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
        0.0016902501 = weight(_text_:s in 1054) [ClassicSimilarity], result of:
          0.0016902501 = score(doc=1054,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.072074346 = fieldWeight in 1054, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=1054)
      0.06666667 = coord(2/30)
    
    Abstract
    This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new recors (i.e., those to be classified) as "queries", and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
  17. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.00
    3.854188E-4 = product of:
      0.0057812817 = sum of:
        0.003527615 = weight(_text_:in in 2650) [ClassicSimilarity], result of:
          0.003527615 = score(doc=2650,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.120230645 = fieldWeight in 2650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
        0.002253667 = weight(_text_:s in 2650) [ClassicSimilarity], result of:
          0.002253667 = score(doc=2650,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.09609913 = fieldWeight in 2650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.06666667 = coord(2/30)
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Object methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that condsiders the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems
    Source
    Journal of the American Society for Information Science. 46(1995) no.7, S.519-529
  18. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    3.6212345E-4 = product of:
      0.0054318514 = sum of:
        0.003741601 = weight(_text_:in in 316) [ClassicSimilarity], result of:
          0.003741601 = score(doc=316,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.12752387 = fieldWeight in 316, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
        0.0016902501 = weight(_text_:s in 316) [ClassicSimilarity], result of:
          0.0016902501 = score(doc=316,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.072074346 = fieldWeight in 316, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.06666667 = coord(2/30)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
    Pages
    11 S.
  19. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00
    3.3724142E-4 = product of:
      0.005058621 = sum of:
        0.0030866629 = weight(_text_:in in 2219) [ClassicSimilarity], result of:
          0.0030866629 = score(doc=2219,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.10520181 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
        0.0019719584 = weight(_text_:s in 2219) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=2219,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.06666667 = coord(2/30)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing of this increasing amount of information coming and out going mail, in particular its classification, is time consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus id also placed on the application of artificial intelligence
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  20. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.00
    3.3724142E-4 = product of:
      0.005058621 = sum of:
        0.0030866629 = weight(_text_:in in 6493) [ClassicSimilarity], result of:
          0.0030866629 = score(doc=6493,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.10520181 = fieldWeight in 6493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6493)
        0.0019719584 = weight(_text_:s in 6493) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=6493,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 6493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6493)
      0.06666667 = coord(2/30)
    
    Abstract
    This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of 4 classes: questions, responses, announcement or administartive. A total of 1.372 messages were analyzed. The automatic classification of a message was based on string matching between a message text and predefined string sets for each of the massage types. The system's automated ability to accurately classify a message was compared against manually assigned codes. The Cohen's Kappa of .55 suggested that there was a statistical agreement between the automatic and manually assigned codes
    Source
    Journal of the American Society for Information Science. 48(1997) no.1, S.32-39