Search (57 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.072704256 = sum of:
      0.054207634 = product of:
        0.21683054 = sum of:
          0.21683054 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.21683054 = score(doc=562,freq=2.0), product of:
              0.38580707 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.04550679 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.25 = coord(1/4)
      0.018496625 = product of:
        0.03699325 = sum of:
          0.03699325 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.03699325 = score(doc=562,freq=2.0), product of:
              0.15935703 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04550679 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
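    The score breakdown attached to each result is a Lucene ClassicSimilarity (TF-IDF) explain tree. As a minimal sketch of the arithmetic, assuming the classic formulas tf = sqrt(freq) and idf = ln(maxDocs/(docFreq + 1)) + 1, and copying queryNorm, fieldNorm and the coord factors verbatim from the tree above, the total for this first result can be reproduced as follows:

```python
import math

# Reproduces the ClassicSimilarity explain tree for result 1 (doc 562).
# tf and idf follow the classic Lucene formulas; queryNorm, fieldNorm and
# coord are copied from the explain output rather than recomputed.

MAX_DOCS, QUERY_NORM = 44218, 0.04550679

def tf(freq):
    return math.sqrt(freq)

def idf(doc_freq):
    return math.log(MAX_DOCS / (doc_freq + 1)) + 1

def term_score(freq, doc_freq, field_norm):
    query_weight = idf(doc_freq) * QUERY_NORM            # idf * queryNorm
    field_weight = tf(freq) * idf(doc_freq) * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

score = (term_score(2.0, 24, 0.046875) * 0.25      # "_text_:3a", coord(1/4)
         + term_score(2.0, 3622, 0.046875) * 0.5)  # "_text_:22", coord(1/2)
print(score)  # ~0.0727043, matching the reported 0.072704256
```

    The same decomposition applies to every score tree in this list; only freq, docFreq, fieldNorm and the coord factors differ.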
  2. Dubin, D.: Dimensions and discriminability (1998) 0.06
    0.06056908 = product of:
      0.12113816 = sum of:
        0.12113816 = sum of:
          0.07797937 = weight(_text_:subject in 2338) [ClassicSimilarity], result of:
            0.07797937 = score(doc=2338,freq=6.0), product of:
              0.16275941 = queryWeight, product of:
                3.576596 = idf(docFreq=3361, maxDocs=44218)
                0.04550679 = queryNorm
              0.4791082 = fieldWeight in 2338, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.576596 = idf(docFreq=3361, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
          0.043158792 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
            0.043158792 = score(doc=2338,freq=2.0), product of:
              0.15935703 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04550679 = queryNorm
              0.2708308 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al.
  3. Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.03
    0.030522479 = product of:
      0.061044957 = sum of:
        0.061044957 = sum of:
          0.03638279 = weight(_text_:subject in 2741) [ClassicSimilarity], result of:
            0.03638279 = score(doc=2741,freq=4.0), product of:
              0.16275941 = queryWeight, product of:
                3.576596 = idf(docFreq=3361, maxDocs=44218)
                0.04550679 = queryNorm
              0.22353725 = fieldWeight in 2741, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.576596 = idf(docFreq=3361, maxDocs=44218)
                0.03125 = fieldNorm(doc=2741)
          0.024662167 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
            0.024662167 = score(doc=2741,freq=2.0), product of:
              0.15935703 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04550679 = queryNorm
              0.15476047 = fieldWeight in 2741, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
    
    Abstract
    This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.
    1. Introduction
    The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.
    Date
    12. 9.2004 9:56:22
  4. Wu, M.; Liu, Y.-H.; Brownlee, R.; Zhang, X.: Evaluating utility and automatic classification of subject metadata from Research Data Australia (2021) 0.03
    0.027287092 = product of:
      0.054574184 = sum of:
        0.054574184 = product of:
          0.10914837 = sum of:
            0.10914837 = weight(_text_:subject in 453) [ClassicSimilarity], result of:
              0.10914837 = score(doc=453,freq=16.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.67061174 = fieldWeight in 453, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=453)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise the subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches in order to enrich these records and to provide additional support for user search and data discovery. Our results show that automatic methods work well for well represented categories of subject metadata, and these categories tend to have features that can distinguish themselves from the other categories. Our findings raise implications for data catalogue providers; they should invest more effort to enhance the quality of data records by providing an adequate description of these records for under-represented subject categories.
  5. Shafer, K.E.: Evaluating Scorpion results (1998) 0.02
    0.022739245 = product of:
      0.04547849 = sum of:
        0.04547849 = product of:
          0.09095698 = sum of:
            0.09095698 = weight(_text_:subject in 1569) [ClassicSimilarity], result of:
              0.09095698 = score(doc=1569,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.55884314 = fieldWeight in 1569, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1569)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.
  6. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.02
    0.022739245 = product of:
      0.04547849 = sum of:
        0.04547849 = product of:
          0.09095698 = sum of:
            0.09095698 = weight(_text_:subject in 2194) [ClassicSimilarity], result of:
              0.09095698 = score(doc=2194,freq=16.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.55884314 = fieldWeight in 2194, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2194)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
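    The counting step described in the abstract above amounts to a majority vote over the Web of Science categories attached to an article's references. A minimal sketch of that idea, with invented category data (function and variable names are illustrative, not from the paper):

```python
from collections import Counter

def classify_by_references(reference_categories, top_n=1):
    """Assign an article the most frequent WoS subject categories
    among its references (each reference may carry several)."""
    counts = Counter(cat for ref in reference_categories for cat in ref)
    return [cat for cat, _ in counts.most_common(top_n)]

# Invented example: three references with their WoS subject categories.
refs = [["Biochemistry & Molecular Biology", "Cell Biology"],
        ["Cell Biology"],
        ["Cell Biology", "Genetics & Heredity"]]
print(classify_by_references(refs))  # ['Cell Biology']
```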
  7. Godby, C.J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization : subject access issues (2003) 0.02
    0.022510704 = product of:
      0.045021407 = sum of:
        0.045021407 = product of:
          0.090042815 = sum of:
            0.090042815 = weight(_text_:subject in 3962) [ClassicSimilarity], result of:
              0.090042815 = score(doc=3962,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5532265 = fieldWeight in 3962, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3962)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
    Source
    Subject retrieval in a networked environment: Proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001 and sponsored by the IFLA Classification and Indexing Section, the IFLA Information Technology Section and OCLC. Ed.: I.C. McIlwaine
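    The log-likelihood statistic mentioned in the abstract above is, in term-association work, usually Dunning's log-likelihood ratio (G²) over a 2×2 contingency table; the sketch below assumes that variant and uses invented counts:

```python
import math

def log_likelihood_g2(o11, o12, o21, o22):
    """Dunning's G2 for a 2x2 contingency table, e.g. occurrences of a
    subject heading inside vs. outside a candidate class."""
    n = o11 + o12 + o21 + o22
    row = (o11 + o12, o21 + o22)
    col = (o11 + o21, o12 + o22)
    g2 = 0.0
    for obs, r, c in ((o11, 0, 0), (o12, 0, 1), (o21, 1, 0), (o22, 1, 1)):
        expected = row[r] * col[c] / n
        if obs > 0:  # a zero cell contributes nothing to G2
            g2 += 2.0 * obs * math.log(obs / expected)
    return g2

# Invented counts: heading seen 30x with the class, 70x without it,
# against 170 / 9730 occurrences of all other headings.
print(round(log_likelihood_g2(30, 70, 170, 9730), 2))
```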
  8. Godby, C. J.; Stuler, J.: ¬The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.02
    0.02227982 = product of:
      0.04455964 = sum of:
        0.04455964 = product of:
          0.08911928 = sum of:
            0.08911928 = weight(_text_:subject in 1567) [ClassicSimilarity], result of:
              0.08911928 = score(doc=1567,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.5475522 = fieldWeight in 1567, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1567)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
    Footnote
    Paper, IFLA Preconference "Subject Retrieval in a Networked Environment", Dublin, OH, August 2001.
  9. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.02
    0.019692764 = product of:
      0.039385527 = sum of:
        0.039385527 = product of:
          0.078771055 = sum of:
            0.078771055 = weight(_text_:subject in 2300) [ClassicSimilarity], result of:
              0.078771055 = score(doc=2300,freq=12.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.48397237 = fieldWeight in 2300, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources, and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC), recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built based on input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
  10. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.02
    0.019294888 = product of:
      0.038589776 = sum of:
        0.038589776 = product of:
          0.07717955 = sum of:
            0.07717955 = weight(_text_:subject in 382) [ClassicSimilarity], result of:
              0.07717955 = score(doc=382,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4741941 = fieldWeight in 382, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.09375 = fieldNorm(doc=382)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  11. Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.02
    0.019294888 = product of:
      0.038589776 = sum of:
        0.038589776 = product of:
          0.07717955 = sum of:
            0.07717955 = weight(_text_:subject in 507) [ClassicSimilarity], result of:
              0.07717955 = score(doc=507,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4741941 = fieldWeight in 507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.09375 = fieldNorm(doc=507)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  12. Shafer, K.E.: Automatic Subject Assignment via the Scorpion System (2001) 0.02
    0.019294888 = product of:
      0.038589776 = sum of:
        0.038589776 = product of:
          0.07717955 = sum of:
            0.07717955 = weight(_text_:subject in 1043) [ClassicSimilarity], result of:
              0.07717955 = score(doc=1043,freq=2.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4741941 = fieldWeight in 1043, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1043)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  13. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.02
    0.019294888 = product of:
      0.038589776 = sum of:
        0.038589776 = product of:
          0.07717955 = sum of:
            0.07717955 = weight(_text_:subject in 1566) [ClassicSimilarity], result of:
              0.07717955 = score(doc=1566,freq=8.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4741941 = fieldWeight in 1566, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1566)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77 and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
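    The kNN stage of this hybrid method can be illustrated generically with a TF-IDF vectorizer feeding a k-nearest-neighbours classifier; the sketch below uses scikit-learn, is not the authors' implementation, and its documents and DDC labels are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Invented stand-ins for DDC-labelled economics Web pages.
docs = ["monetary policy and interest rates",
        "labour market wage statistics",
        "international trade tariff agreements",
        "central bank inflation targets"]
labels = ["332", "331", "382", "332"]   # hypothetical DDC class numbers

# TF-IDF features feeding a kNN classifier, as in the third experiment.
knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
knn.fit(docs, labels)
print(knn.predict(["exchange rates and the central bank"]))  # e.g. ['332']
```

    In the hybrid setup described above, the representative term dictionary would presumably shortlist candidate categories first, with the kNN classifier deciding among them.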
  14. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02
    0.018496625 = product of:
      0.03699325 = sum of:
        0.03699325 = product of:
          0.0739865 = sum of:
            0.0739865 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.0739865 = score(doc=1046,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    5. 5.2003 14:17:22
  15. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.02
    0.018191395 = product of:
      0.03638279 = sum of:
        0.03638279 = product of:
          0.07276558 = sum of:
            0.07276558 = weight(_text_:subject in 1667) [ClassicSimilarity], result of:
              0.07276558 = score(doc=1667,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.4470745 = fieldWeight in 1667, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1667)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.
  16. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.06165542 = score(doc=611,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 8.2009 12:54:24
  17. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.015413855 = product of:
      0.03082771 = sum of:
        0.03082771 = product of:
          0.06165542 = sum of:
            0.06165542 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06165542 = score(doc=2748,freq=2.0), product of:
                0.15935703 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04550679 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  18. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.01
    0.0139248865 = product of:
      0.027849773 = sum of:
        0.027849773 = product of:
          0.055699546 = sum of:
            0.055699546 = weight(_text_:subject in 472) [ClassicSimilarity], result of:
              0.055699546 = score(doc=472,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.34222013 = fieldWeight in 472, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=472)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with a broad coverage for classification of German scientific articles.
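    The word-level derivation described here can be sketched as a lookup-and-vote over a subject-heading-to-DDC mapping. The toy mapping below is invented; the real one would come from the DDC-enriched SWD served by the DNB linked data service:

```python
from collections import Counter

# Toy subject-heading -> DDC mapping; a real mapping would be derived
# from the DDC-enriched SWD via the DNB linked data service.
swd_to_ddc = {"informatik": "004", "bibliothek": "020", "klassifikation": "025.4"}

def classify_text(text, mapping, top_n=3):
    """Rank DDC notations by how many words of the text map to them."""
    votes = Counter(mapping[w] for w in text.lower().split() if w in mapping)
    return votes.most_common(top_n)

print(classify_text("Klassifikation in der Bibliothek", swd_to_ddc))
# -> [('025.4', 1), ('020', 1)]
```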
  19. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.01
    0.0139248865 = product of:
      0.027849773 = sum of:
        0.027849773 = product of:
          0.055699546 = sum of:
            0.055699546 = weight(_text_:subject in 977) [ClassicSimilarity], result of:
              0.055699546 = score(doc=977,freq=6.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.34222013 = fieldWeight in 977, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=977)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open-source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
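    As an illustration of the corpus-preparation step, the following sketch uses pymarc to pull MARC 650$a descriptors (paired with 245$a titles) from a record file; the file name is a placeholder, and this stands in for, rather than reproduces, the MarcEdit/OpenRefine pipeline described above:

```python
from pymarc import MARCReader

# Sketch: extract 650$a subject descriptors (with 245$a titles) from a
# binary MARC file as raw material for an Annif-style training corpus.
# "records.mrc" is a placeholder file name.
with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:          # skip records pymarc could not parse
            continue
        title = " ".join(s for f in record.get_fields("245")
                         for s in f.get_subfields("a"))
        subjects = [s for f in record.get_fields("650")
                    for s in f.get_subfields("a")]
        if subjects:                # one "title <TAB> headings" line per record
            print(f"{title}\t{'; '.join(subjects)}")
```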
  20. Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.01
    0.013643546 = product of:
      0.027287092 = sum of:
        0.027287092 = product of:
          0.054574184 = sum of:
            0.054574184 = weight(_text_:subject in 1668) [ClassicSimilarity], result of:
              0.054574184 = score(doc=1668,freq=4.0), product of:
                0.16275941 = queryWeight, product of:
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.04550679 = queryNorm
                0.33530587 = fieldWeight in 1668, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.576596 = idf(docFreq=3361, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1668)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies for building robot-generated subject indices, presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two different approaches, in spite of using basically the same services to start the harvesting process from. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the approaches was the coverage of the resulting database.

Languages

  • e 53
  • d 4

Types

  • a 46
  • el 14
  • s 2
  • m 1