Search (33 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.29
    0.29060683 = coord(5/9) × [w("3a")/3 + 3 × w("2f") + w("22")/3], where each clause weight w = queryWeight × fieldWeight = (idf × queryNorm) × (√tf × idf × fieldNorm). The rare tokens "3a" and "2f" (idf 8.478011, w = 0.15429527) come from the percent-encoded CiteSeerX URL in the record; "22" (idf 3.5018296, w = 0.026324168) matches the date field.
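    The arithmetic above is Lucene's ClassicSimilarity (TF-IDF) scoring. As a sanity check, here is a minimal Python sketch that recombines the constants from the explain output; it is a reconstruction from Lucene's documented formula, not code from the search service itself.

    ```python
    import math

    def clause_weight(freq, idf, query_norm, field_norm):
        # One weight(_text_:term) clause: queryWeight * fieldWeight
        query_weight = idf * query_norm                    # 8.478011 * 0.03238235 ~ 0.27453792
        field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(2.0); ~ 0.56201804
        return query_weight * field_weight                 # ~ 0.15429527

    QUERY_NORM, FIELD_NORM = 0.03238235, 0.046875
    w_url  = clause_weight(2.0, 8.478011, QUERY_NORM, FIELD_NORM)   # "3a"/"2f" clauses
    w_date = clause_weight(2.0, 3.5018296, QUERY_NORM, FIELD_NORM)  # "22" clause

    # "3a" and "22" each sit under an inner coord(1/3); "2f" occurs as three
    # full clauses; the outer coord(5/9) reflects 5 of 9 query clauses matching.
    score = (w_url / 3 + 3 * w_url + w_date / 3) * (5 / 9)
    print(f"{score:.8f}")  # ~ 0.29060683, the score shown for hit 1
    ```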
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
    
    RSWK
    Datenanalyse / Kongress / Passau <2000>
    Automatische Klassifikation / Kongress / Passau <2000>
    Data Mining / Kongress / Passau <2000>
    World Wide Web / Wissensorganisation / Kongress / Passau <2000>
  3. Dubin, D.: Dimensions and discriminability (1998) 0.01
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
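    As a minimal illustration of such groupings: each controlled heading induces one explicit grouping over the retrieved set. The document ids and headings below are invented.

    ```python
    from collections import defaultdict

    # Retrieved documents with their assigned subject headings (invented data).
    docs = {
        "d1": {"Indexing", "Visualization"},
        "d2": {"Visualization", "Interfaces"},
        "d3": {"Indexing", "Classification"},
    }

    groups = defaultdict(set)
    for doc, headings in docs.items():
        for h in headings:
            groups[h].add(doc)  # every heading makes one grouping explicit

    print(dict(groups))  # e.g. {'Indexing': {'d1', 'd3'}, 'Visualization': {'d1', 'd2'}, ...}
    ```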
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  4. Pfister, J.: Clustering von Patent-Dokumenten am Beispiel der Datenbanken des Fachinformationszentrums Karlsruhe (2006) 0.01
    
    Imprint
    Konstanz : UVK Verlagsgesellschaft
  5. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI/ML-based subject indexing system for libraries (2023) 0.01
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
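    Annif does expose a REST interface; the sketch below assumes a locally running instance and a hypothetical project id ("lcsh-omikuji"), with response field names taken from Annif's published API, so details may need adjusting to a real deployment.

    ```python
    import requests

    # Hypothetical host and project id; a real instance would substitute its own.
    ANNIF_URL = "http://localhost:5000/v1/projects/lcsh-omikuji/suggest"

    def suggest_lcsh(text, limit=5):
        """Ask a trained Annif project for ranked LCSH suggestions."""
        resp = requests.post(ANNIF_URL, data={"text": text, "limit": limit})
        resp.raise_for_status()
        # Each hit carries uri, label and score (field names per Annif's API docs).
        return [(hit["label"], round(hit["score"], 3)) for hit in resp.json()["results"]]

    print(suggest_lcsh("Machine learning based subject indexing for library catalogues"))
    ```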
  6. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version developed in 1995 was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
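    For flavour, a toy Python version of caption-keyword matching against DDC classes; the actual WWLib classifier, written in Java, is considerably more sophisticated.

    ```python
    # Invented caption keywords for three DDC classes.
    DDC_KEYWORDS = {
        "004": {"computers", "internet", "web", "software"},
        "020": {"library", "libraries", "cataloguing", "classification"},
        "540": {"chemistry", "chemical", "reactions"},
    }

    def classify(text: str) -> str:
        words = set(text.lower().split())
        # Pick the class whose keywords overlap the document vocabulary most.
        return max(DDC_KEYWORDS, key=lambda c: len(DDC_KEYWORDS[c] & words))

    print(classify("automatic classification of library resources"))  # -> "020"
    ```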
    Date
    1. 8.1996 22:08:06
  7. Schaalje, G.B.; Blades, N.J.; Funai, T.: An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.01
    
    Abstract
    Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open-set classification and account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates the relationship of extra-binomial variation to text size, and uses Markov Chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.
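    As a flavour of the approach, a heavily simplified sketch using SciPy's beta-binomial distribution: a single (a, b) pair per author stands in for the paper's per-word parameters, MCMC estimation and explicit size adjustment, all omitted here; the open-set behaviour is reduced to a rejection threshold.

    ```python
    from scipy.stats import betabinom

    # Invented per-author beta-binomial parameters for one marker word's rate;
    # the paper estimates such parameters from undisputed training texts by MCMC.
    AUTHORS = {"Hamilton": (2.0, 600.0), "Madison": (1.0, 900.0), "Jay": (0.5, 1200.0)}

    def open_set_attribution(count, n_words, reject_below=-8.0):
        """Score each candidate; return 'unknown' when even the best fit is poor."""
        scores = {a: betabinom.logpmf(count, n_words, *p) for a, p in AUTHORS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > reject_below else "unknown"

    print(open_set_attribution(count=7, n_words=2000))   # -> 'Hamilton'
    print(open_set_attribution(count=90, n_words=2000))  # fits nobody -> 'unknown'
    ```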
  8. Bianchini, C.; Bargioni, S.: Automated classification using linked open data : a case study on faceted classification and Wikidata (2021) 0.01
    
    Abstract
    The Wikidata gadget, CCLitBox, for the automated classification of literary authors and works by a faceted classification and using Linked Open Data (LOD) is presented. The tool reproduces the classification algorithm of class O Literature of the Colon Classification and uses data freely available in Wikidata to create Colon Classification class numbers. CCLitBox is totally free and enables any user to classify literary authors and their works; it is easily accessible to everybody; it uses LOD from Wikidata, but missing data for classification can be freely added if necessary; and it is ready-made for any cooperative and networked project.
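    To illustrate the LOD ingredient, a sketch that pulls facet raw material for one author from the public Wikidata SPARQL endpoint; the example item (Dante, Q1067) and property choices (P6886 writing language, P569 date of birth) are my assumptions, not CCLitBox's actual query.

    ```python
    import requests

    QUERY = """
    SELECT ?langLabel ?birth WHERE {
      wd:Q1067 wdt:P6886 ?lang ;   # writing language -> language facet
               wdt:P569  ?birth .  # date of birth    -> chronological facet
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }"""

    resp = requests.get("https://query.wikidata.org/sparql",
                        params={"query": QUERY, "format": "json"},
                        headers={"User-Agent": "cc-facet-sketch/0.1"})
    for row in resp.json()["results"]["bindings"]:
        print(row["langLabel"]["value"], row["birth"]["value"])
    ```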
  9. Wille, J.: Automatisches Klassifizieren bibliographischer Beschreibungsdaten : Vorgehensweise und Ergebnisse (2006) 0.01
    
    Abstract
    This thesis deals with the practical aspects of the automatic classification of bibliographic reference data. The focus is on the concrete procedure, using the open-source program COBRA ("Classification Of Bibliographic Records, Automatic") developed specifically for this purpose. The framework conditions and parameters for deployment in a library setting are clarified. Finally, classification results are evaluated using the example of social-science data from the SOLIS database.
  10. Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.01
    
  11. Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.00
    
    Abstract
    This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.
    1. Introduction: The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.
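    To make the 0.27 figure concrete: agreement between two subjects' sortings can be computed from the co-membership of page pairs, for instance with the adjusted Rand index below; the study's exact similarity measure may differ.

    ```python
    from sklearn.metrics import adjusted_rand_score

    # Two subjects' sortings of the same eight pages into personal categories;
    # labels are arbitrary per subject, only shared category membership matters.
    subject_a = [0, 0, 1, 1, 2, 2, 3, 3]
    subject_b = [0, 1, 1, 0, 2, 2, 2, 3]

    print(adjusted_rand_score(subject_a, subject_b))
    # Averaging such pairwise scores over all subject pairs and queries gives an
    # overall inter-subject similarity in the spirit of the reported 0.27.
    ```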
    Date
    12. 9.2004 9:56:22
  12. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.00
    
    Abstract
    In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers which learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as a test set. The dictionary approach is outperformed by 18% for RAINBOW and by 17% for BSVM, while the standard deviation of the results is reduced by 28% and 34% respectively over the dictionary approach.
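    The same multiclass setup is easy to reproduce with current tooling; the sketch below stands in scikit-learn's MultinomialNB and LinearSVC for RAINBOW and BSVM, trained on a few mock manually coded answers.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Mock open-ended survey answers with their manually assigned codes.
    answers = ["money is tight this year", "cannot afford the rent",
               "the schools here are great", "teachers really care",
               "crime is rising nearby", "I feel unsafe at night"]
    codes = ["economic", "economic", "education", "education", "safety", "safety"]

    for learner in (MultinomialNB(), LinearSVC()):
        model = make_pipeline(TfidfVectorizer(), learner).fit(answers, codes)
        print(type(learner).__name__, model.predict(["the rent is too high"]))
        # both typically print ['economic']
    ```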
  13. Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.00
    
  14. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.00
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for two data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information, indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem.
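    A minimal sketch of the underlying idea, hierarchical clustering over numeric reaction descriptors; the feature values are invented, and HORACE's actual attribute-and-condition machinery is far richer.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    # Each row describes one reaction by invented physicochemical descriptors
    # (the real system combines these with topological features).
    reactions = np.array([
        [0.9, 0.1, 2.1],   # two ester hydrolyses...
        [0.8, 0.2, 2.0],
        [0.1, 0.9, 5.5],   # ...and two cycloadditions
        [0.2, 0.8, 5.3],
    ])

    tree = linkage(reactions, method="average")        # reaction hierarchy
    print(fcluster(tree, t=2, criterion="maxclust"))   # -> [1 1 2 2], two types
    ```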
  15. Godby, C.J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.00
    
  16. Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.00
    
    Abstract
    The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
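    A small sketch of the clustering step, using a plain LDA topic model as a stand-in for the study's Topic Modeling algorithm; the records and configuration are invented.

    ```python
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Toy OAIster-like metadata records.
    records = ["civil war letters from union soldiers",
               "war diaries and battlefield memoirs",
               "protein folding molecular simulation",
               "molecular dynamics of membrane proteins"]

    X = CountVectorizer(stop_words="english").fit_transform(records)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    print(lda.transform(X).argmax(axis=1))  # dominant topic per record, e.g. [0 0 1 1]
    ```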
  17. Oberhauser, O.: Automatisches Klassifizieren : Entwicklungsstand - Methodik - Anwendungsbereiche (2005) 0.00
    
    Footnote
    On the content: a short introductory section is followed by an introduction to the basic methodology of automatic classification. Oberhauser explains concepts such as single and multiple class assignment and class-centred versus document-centred approaches, and then turns to the main applications of automatic classification of text documents, machine-learning methods, and dimensionality-reduction techniques for indexing. Two further subchapters are devoted to building classifiers and to the methods for evaluating them. The chapter is rounded off by a short list of software products for automatic classification, covering both commercial packages and open-source projects.
    The main part of the book is devoted to the large projects on automatic subject indexing of web documents carried out by OCLC (Scorpion) and at the universities of Lund (Nordic WAIS/WWW, DESIRE II), Wolverhampton (WWLib-TOS, WWLib-TNG, Old ACE, ACE) and Oldenburg (GERHARD, GERHARD II). The author describes in great detail (the level of detail varying with what can be gleaned from the project documentation) each project's objectives, the classification scheme used, the methodological approach, and the evaluation methods and results; where cross-references to other projects exist, these are discussed as well. He pays close attention to important aspects such as vocabulary building, text preparation and weighting, so the reader gains a good picture of each project's approach and its potential further development.
    A further chapter deals with several smaller projects on the automatic classification of books, a topic of particular interest to libraries, and on patent literature, media documentation, and use in information services. The account is complemented by a bibliography of over 250 titles on the individual projects and by lists of abbreviations and figures.
    The concluding discussion addresses the significance of the individual projects for methodological progress, but also voices some criticism, above all of the inadequate evaluation of project results and the lack of usable documentation. The project pages of GERHARD (www.gerhard.de/), for instance, had been frozen at their 1998 state and are at present [11.07.06] no longer reachable at all. With some astonishment, Oberhauser also notes that, apart from Larsen's nearly fifteen-year-old study, "no significant studies or applications from the library field exist" (p. 139). As the author himself adds, this is probably because bibliographic metadata offer too little text to be well suited to automatic classification, and because, as earlier results have shown, the usual TF-IDF approach is not appropriate for catalogue records (ibid.).
  18. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00
    
    Date
    5. 5.2003 14:17:22
  19. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
    - Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
    - Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
    - Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
    - Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: the knowledge impact statement
    - Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
    - Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
    - Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
    - Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
    - Ev Brenner (New York, NY), chairman
    - James Callan (University of Massachusetts, MA)
    - Marc Krellenstein (Northern Light Technology, Cambridge, MA)
    - Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
    - Steve Arnold (AIT, Harrods Creek, KY): Review: the leading edge in search and retrieval software
    - Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
    - Intelligent agents: John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
    - Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
    - Cross-language searching: Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
    - Video search and retrieval: Arnon Amir (IBM, Almaden, CA): CueVideo: modular system for automatic indexing and browsing of video/audio
    - Speech recognition: Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
    - Visualization: James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: emerging science or passing fashion?
    - Text mining: David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  20. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00
    
    Abstract
    Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control to help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations are stored in an intermediate format (in this case, XML) by automatic means, without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data-exchange formats, or into simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers, which can then be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of the software as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without significant programming effort.
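    A toy sketch of the lossless intermediate-XML idea, covering only "+" coordination plus two of UDC's many auxiliary types (place in parentheses, time in quotation marks); the actual interpreter handles the full grammar.

    ```python
    import re
    import xml.etree.ElementTree as ET

    def udc_to_xml(notation):
        root = ET.Element("udc", number=notation)
        for part in notation.split("+"):                 # '+' coordinates numbers
            m = re.match(r'(?P<main>[\d.]+)(?P<rest>.*)', part)
            num = ET.SubElement(root, "number", main=m.group("main"))
            for place in re.findall(r'\(([\d.]+)\)', m.group("rest")):
                ET.SubElement(num, "place").text = place  # common auxiliary of place
            for time in re.findall(r'"([^"]+)"', m.group("rest")):
                ET.SubElement(num, "time").text = time    # common auxiliary of time
        return ET.tostring(root, encoding="unicode")

    # German literature of the 20th century + history of Germany
    print(udc_to_xml('821.112.2"19"+94(430)'))
    ```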