Search (31 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.24
    0.23600978 = product of:
      0.3146797 = sum of:
        0.073939405 = product of:
          0.22181821 = sum of:
            0.22181821 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.22181821 = score(doc=562,freq=2.0), product of:
                0.39468166 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046553567 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.22181821 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.22181821 = score(doc=562,freq=2.0), product of:
            0.39468166 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046553567 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.018922098 = product of:
          0.037844196 = sum of:
            0.037844196 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.037844196 = score(doc=562,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
     Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
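     The relevance figures above are Lucene ClassicSimilarity explain output. The following short Python sketch reconstructs the arithmetic (it is not Lucene code; the variable names are ours) and reproduces the first "_text_:3a" weight from the factors listed in the tree:

       # Reproduce one ClassicSimilarity weight from the explain tree above.
       import math

       idf = 8.478011          # idf(docFreq=24, maxDocs=44218) = 1 + ln(44218 / (24 + 1))
       query_norm = 0.046553567
       field_norm = 0.046875   # fieldNorm(doc=562)
       freq = 2.0              # termFreq

       tf = math.sqrt(freq)                  # 1.4142135 = tf(freq=2.0)
       query_weight = idf * query_norm       # 0.39468166 = queryWeight
       field_weight = tf * idf * field_norm  # 0.56201804 = fieldWeight
       print(query_weight * field_weight)    # 0.22181821, the weight shown above

       # The per-term weights are then summed and scaled by the coord factors
       # (0.33333334 = coord(1/3), 0.5 = coord(1/2), 0.75 = coord(3/4)) to give
       # the document scores (0.24, 0.03, ...) displayed next to each hit.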
  2. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.03
    0.029153839 = product of:
      0.116615355 = sum of:
        0.116615355 = weight(_text_:open in 977) [ClassicSimilarity], result of:
          0.116615355 = score(doc=977,freq=10.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.5562646 = fieldWeight in 977, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.25 = coord(1/4)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
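     The workflow described above - training a TF-IDF backend on catalogue records carrying 650$a descriptors and then asking it for suggested headings - can be illustrated with a small scikit-learn sketch. This is not Annif's actual code or API; the toy records, the nearest-neighbour heading pooling and all names are assumptions made for illustration only.

       # Illustrative TF-IDF subject suggestion (not Annif's real implementation).
       from sklearn.feature_extraction.text import TfidfVectorizer
       from sklearn.metrics.pairwise import cosine_similarity

       # Toy training set: record text plus its manually assigned 650$a headings.
       records = [
           ("neural networks for image recognition", ["Machine learning", "Computer vision"]),
           ("cataloguing rules for printed books", ["Cataloging", "Bibliography"]),
           ("deep learning applied to subject indexing", ["Machine learning", "Subject indexing"]),
       ]
       texts = [text for text, _ in records]

       vectorizer = TfidfVectorizer()
       doc_matrix = vectorizer.fit_transform(texts)

       def suggest(new_text, top_n=3):
           """Score each training record by cosine similarity and pool its headings."""
           sims = cosine_similarity(vectorizer.transform([new_text]), doc_matrix)[0]
           scores = {}
           for sim, (_, headings) in zip(sims, records):
               for h in headings:
                   scores[h] = scores.get(h, 0.0) + sim
           return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

       print(suggest("automatic subject indexing with machine learning"))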
  3. Schaalje, G.B.; Blades, N.J.; Funai, T.: An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.03
    0.02709896 = product of:
      0.10839584 = sum of:
        0.10839584 = weight(_text_:open in 1041) [ClassicSimilarity], result of:
          0.10839584 = score(doc=1041,freq=6.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.5170568 = fieldWeight in 1041, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=1041)
      0.25 = coord(1/4)
    
    Abstract
    Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open-set classification and account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates the relationship of extra-binomial variation to text size, and uses Markov Chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.
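     As a rough illustration of the beta-binomial idea described above - not the authors' actual model, which is a logit-normal-beta-binomial fitted by MCMC with size-adjusted dispersion - one can compare candidate authors by the beta-binomial likelihood of a marker word's count in a disputed text. The counts and per-author parameters below are invented.

       # Toy beta-binomial comparison of candidate authors (illustrative sketch only).
       from scipy.stats import betabinom

       n = 2000   # tokens in the disputed text
       k = 18     # occurrences of a marker word in it

       # Hypothetical per-author beta parameters, as if estimated from undisputed texts.
       authors = {"Hamilton": (2.0, 400.0), "Madison": (6.0, 400.0), "Jay": (1.0, 600.0)}

       likelihoods = {name: betabinom.pmf(k, n, a, b) for name, (a, b) in authors.items()}
       total = sum(likelihoods.values())
       posteriors = {name: lik / total for name, lik in likelihoods.items()}  # flat prior
       print(posteriors)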
  4. Bianchini, C.; Bargioni, S.: Automated classification using linked open data : a case study on faceted classification and Wikidata (2021) 0.03
    0.025813911 = product of:
      0.103255644 = sum of:
        0.103255644 = weight(_text_:open in 724) [ClassicSimilarity], result of:
          0.103255644 = score(doc=724,freq=4.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.49253768 = fieldWeight in 724, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0546875 = fieldNorm(doc=724)
      0.25 = coord(1/4)
    
    Abstract
     CCLitBox, a Wikidata gadget for the automated classification of literary authors and works with a faceted classification using Linked Open Data (LOD), is presented. The tool reproduces the classification algorithm of class O Literature of the Colon Classification and uses data freely available in Wikidata to create Colon Classification class numbers. CCLitBox is totally free and enables any user to classify literary authors and their works; it is easily accessible to everybody; it uses LOD from Wikidata, but missing data for classification can be freely added if necessary; it is ready-made for any cooperative and networked project.
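     To give a sense of the kind of LOD the gadget draws on, the sketch below queries the Wikidata SPARQL endpoint for facets a faceted classification needs (author, genre, language). It is not CCLitBox itself, and the property choices (P50 author, P136 genre, P407 language of the work) and the example label are our assumptions for illustration.

       # Illustrative Wikidata LOD query for classification facets (not CCLitBox).
       import requests

       SPARQL = """
       SELECT ?workLabel ?genreLabel ?languageLabel WHERE {
         ?author rdfs:label "Dante Alighieri"@en .
         ?work wdt:P50 ?author .                   # P50 = author
         OPTIONAL { ?work wdt:P136 ?genre . }      # P136 = genre
         OPTIONAL { ?work wdt:P407 ?language . }   # P407 = language of the work
         SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
       }
       LIMIT 5
       """

       r = requests.get("https://query.wikidata.org/sparql",
                        params={"query": SPARQL, "format": "json"},
                        headers={"User-Agent": "facet-demo/0.1"})
       for row in r.json()["results"]["bindings"]:
           print({k: v["value"] for k, v in row.items()})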
  5. Dubin, D.: Dimensions and discriminability (1998) 0.03
    0.025661811 = product of:
      0.102647245 = sum of:
        0.102647245 = sum of:
          0.058495685 = weight(_text_:access in 2338) [ClassicSimilarity], result of:
            0.058495685 = score(doc=2338,freq=4.0), product of:
              0.15778996 = queryWeight, product of:
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.046553567 = queryNorm
              0.3707187 = fieldWeight in 2338, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
          0.04415156 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
            0.04415156 = score(doc=2338,freq=2.0), product of:
              0.16302267 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046553567 = queryNorm
              0.2708308 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
      0.25 = coord(1/4)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  6. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02
    0.021378566 = product of:
      0.08551426 = sum of:
        0.08551426 = sum of:
          0.0413627 = weight(_text_:access in 1673) [ClassicSimilarity], result of:
            0.0413627 = score(doc=1673,freq=2.0), product of:
              0.15778996 = queryWeight, product of:
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.046553567 = queryNorm
              0.2621377 = fieldWeight in 1673, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1673)
          0.04415156 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
            0.04415156 = score(doc=1673,freq=2.0), product of:
              0.16302267 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046553567 = queryNorm
              0.2708308 = fieldWeight in 1673, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1673)
      0.25 = coord(1/4)
    
    Abstract
     The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
    Date
    1. 8.1996 22:08:06
  7. Wille, J.: Automatisches Klassifizieren bibliographischer Beschreibungsdaten : Vorgehensweise und Ergebnisse (2006) 0.02
    0.01825319 = product of:
      0.07301276 = sum of:
        0.07301276 = weight(_text_:open in 6090) [ClassicSimilarity], result of:
          0.07301276 = score(doc=6090,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.3482767 = fieldWeight in 6090, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6090)
      0.25 = coord(1/4)
    
    Abstract
     This thesis deals with the practical aspects of the automatic classification of bibliographic reference data. The focus is on the concrete procedure, illustrated with the open-source program COBRA "Classification Of Bibliographic Records, Automatic", developed specifically for this purpose. The framework conditions and parameters for its use in a library setting are clarified. Finally, classification results are evaluated using social-science data from the SOLIS database as an example.
  8. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.01
    0.013037993 = product of:
      0.05215197 = sum of:
        0.05215197 = weight(_text_:open in 5172) [ClassicSimilarity], result of:
          0.05215197 = score(doc=5172,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.24876907 = fieldWeight in 5172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5172)
      0.25 = coord(1/4)
    
    Abstract
     In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers which learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as a test set. The dictionary approach is outperformed by 18% for RAINBOW and by 17% for BSVM, while the standard deviation of the results is reduced by 28% and 34%, respectively, compared with the dictionary approach.
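     A minimal sketch of the supervised approach just described, assuming a handful of manually coded answers as training data; it uses scikit-learn's multinomial naive Bayes rather than the RAINBOW or BSVM packages named in the abstract, and all answers and codes are invented.

       # Toy survey coding as multiclass text categorization (illustrative only;
       # the study used the RAINBOW naive Bayes learner and the BSVM SVM learner).
       from sklearn.feature_extraction.text import CountVectorizer
       from sklearn.naive_bayes import MultinomialNB
       from sklearn.pipeline import make_pipeline

       # Manually coded open-ended answers (training set) and their category codes.
       answers = ["I work as a nurse in a hospital",
                  "retired, used to teach high school",
                  "I drive a delivery truck",
                  "teaching mathematics at a university"]
       codes = ["health", "education", "transport", "education"]

       clf = make_pipeline(CountVectorizer(), MultinomialNB())
       clf.fit(answers, codes)

       print(clf.predict(["I am a primary school teacher"]))   # -> ['education']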
  9. Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.01
    0.012216322 = product of:
      0.04886529 = sum of:
        0.04886529 = sum of:
          0.023635827 = weight(_text_:access in 2741) [ClassicSimilarity], result of:
            0.023635827 = score(doc=2741,freq=2.0), product of:
              0.15778996 = queryWeight, product of:
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.046553567 = queryNorm
              0.14979297 = fieldWeight in 2741, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.03125 = fieldNorm(doc=2741)
          0.025229463 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
            0.025229463 = score(doc=2741,freq=2.0), product of:
              0.16302267 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046553567 = queryNorm
              0.15476047 = fieldWeight in 2741, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=2741)
      0.25 = coord(1/4)
    
    Abstract
     This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.
     1. Introduction: The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.
    Date
    12. 9.2004 9:56:22
  10. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.009461049 = product of:
      0.037844196 = sum of:
        0.037844196 = product of:
          0.07568839 = sum of:
            0.07568839 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.07568839 = score(doc=1046,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  11. Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.01
    0.008863435 = product of:
      0.03545374 = sum of:
        0.03545374 = product of:
          0.07090748 = sum of:
            0.07090748 = weight(_text_:access in 507) [ClassicSimilarity], result of:
              0.07090748 = score(doc=507,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.4493789 = fieldWeight in 507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.09375 = fieldNorm(doc=507)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  12. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    0.007884207 = product of:
      0.03153683 = sum of:
        0.03153683 = product of:
          0.06307366 = sum of:
            0.06307366 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.06307366 = score(doc=611,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 8.2009 12:54:24
  13. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.007884207 = product of:
      0.03153683 = sum of:
        0.03153683 = product of:
          0.06307366 = sum of:
            0.06307366 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06307366 = score(doc=2748,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 2.2016 18:25:22
  14. Oberhauser, O.: Automatisches Klassifizieren : Entwicklungsstand - Methodik - Anwendungsbereiche (2005) 0.01
    0.0065189963 = product of:
      0.026075985 = sum of:
        0.026075985 = weight(_text_:open in 38) [ClassicSimilarity], result of:
          0.026075985 = score(doc=38,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.12438454 = fieldWeight in 38, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.01953125 = fieldNorm(doc=38)
      0.25 = coord(1/4)
    
    Footnote
     On the content: A short introductory section is followed by an introduction to the basic methodology of automatic classification. Here Oberhauser explains terms such as single and multiple classification and class- versus document-centred approaches, and then turns to the main applications of the automatic classification of text documents, machine-learning methods, and techniques of dimensionality reduction in indexing. Two further subsections are devoted to the construction of classifiers and to the methods for their evaluation. The chapter is rounded off by a short list of software products for automatic classification, covering both commercial software and open-source projects. The main part of the book is devoted to the large projects on the automatic subject indexing of web documents carried out by OCLC (Scorpion) and at the universities of Lund (Nordic WAIS/WWW, DESIRE II), Wolverhampton (WWLib-TOS, WWLib-TNG, Old ACE, ACE) and Oldenburg (GERHARD, GERHARD II). The author describes in great detail - the level of detail varying with what can be inferred from the project documentation - each project's objectives, the classification scheme used, the methodological approach, and the evaluation methods and results. Where cross-references to other projects exist, these are discussed as well. The author examines important aspects such as vocabulary construction, text preparation and weighting very closely, so that the reader gains a good idea of the approaches and of how each project might be developed further. A further chapter discusses several smaller projects devoted to the automatic classification of books - a topic of particular interest to libraries - as well as to patent literature, media documentation and use in information services. The presentation is complemented by a bibliography of more than 250 titles on the individual projects as well as lists of abbreviations and figures. The concluding discussion of the projects addresses, on the one hand, the significance of the individual projects for methodological progress, but on the other hand also voices some criticism, above all regarding the insufficient evaluation of project results and the lack of usable documentation. The project pages of GERHARD (www.gerhard.de/), for example, were frozen at their 1998 state and are currently [11.07.06] no longer reachable at all. With some astonishment, Oberhauser also notes that - apart from Larsen's nearly 15-year-old study - "no significant studies or applications from the library domain are available" (p. 139). As the author himself adds, however, this is probably because bibliographic metadata, owing to their small amount of text, are poorly suited to automatic classification, and because - as earlier results have shown - the usual TF/IDF approach is not suitable for catalogue records (ibid.).
  15. Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.01
    0.0059089568 = product of:
      0.023635827 = sum of:
        0.023635827 = product of:
          0.047271654 = sum of:
            0.047271654 = weight(_text_:access in 162) [ClassicSimilarity], result of:
              0.047271654 = score(doc=162,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.29958594 = fieldWeight in 162, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0625 = fieldNorm(doc=162)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  16. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.04415156 = score(doc=141,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Pages
    S.1-22
  17. Automatic classification research at OCLC (2002) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.04415156 = score(doc=1563,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 9:22:09
  18. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04415156 = score(doc=5273,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 16:24:52
  19. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.04415156 = score(doc=2560,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.2008 18:31:54
  20. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.01
    0.0051703374 = product of:
      0.02068135 = sum of:
        0.02068135 = product of:
          0.0413627 = sum of:
            0.0413627 = weight(_text_:access in 7696) [ClassicSimilarity], result of:
              0.0413627 = score(doc=7696,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2621377 = fieldWeight in 7696, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7696)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem