Search (54 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07

0.06799838 = product of:
  0.10199757 = sum of:
    0.08121385 = product of:
      0.24364153 = sum of:
        0.24364153 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.24364153 = score(doc=562,freq=2.0), product of:
            0.43351194 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051133685 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.020783724 = product of:
      0.04156745 = sum of:
        0.04156745 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04156745 = score(doc=562,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
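
The explain tree for doc 562 above can be reproduced by hand. The following is a minimal Python sketch, assuming the scoring follows Lucene's classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The `queryNorm` and `fieldNorm` values are copied from the output rather than derived, and the function names are illustrative, not part of any Lucene API.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.051133685  # copied from the explain output, not derived here


def idf(doc_freq, max_docs=MAX_DOCS):
    # Classic TF-IDF inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight, as in the tree above.
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Leaf values for doc 562: terms "3a" and "22", both with freq=2.0.
score_3a = term_score(freq=2.0, doc_freq=24, field_norm=0.046875)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.046875)

# Inner coord factors (1/3 and 1/2), then the outer coord(2/3).
total = (2 / 3) * (score_3a * (1 / 3) + score_22 * (1 / 2))

print(f"3a: {score_3a:.8f}  22: {score_22:.8f}  total: {total:.8f}")
```

The printed values should agree with the tree (0.24364153, 0.04156745, 0.06799838) up to small floating-point differences, since the original output uses 32-bit floats.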

Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.06

0.055421554 = product of:
  0.08313233 = sum of:
    0.063876994 = weight(_text_:resources in 2100) [ClassicSimilarity], result of:
      0.063876994 = score(doc=2100,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.34221917 = fieldWeight in 2100, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
    0.01925533 = product of:
      0.03851066 = sum of:
        0.03851066 = weight(_text_:management in 2100) [ClassicSimilarity], result of:
          0.03851066 = score(doc=2100,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.22344214 = fieldWeight in 2100, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2100)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; (c) and the way they are used in combination with each other. Further observations concern the way the participant assesses quality of web-based resources, and his information behavior as a software engineer.
Source: Information processing and management. 44(2008) no.4, pp.1410-1430

Dubin, D.: Dimensions and discriminability (1998) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 2338) [ClassicSimilarity], result of:
      0.052695833 = score(doc=2338,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 2338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.04849536 = score(doc=2338,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 1673) [ClassicSimilarity], result of:
      0.052695833 = score(doc=1673,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.04849536 = score(doc=1673,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 1. 8.1996 22:08:06

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 2560) [ClassicSimilarity], result of:
      0.052695833 = score(doc=2560,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 2560, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2560)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.04849536 = score(doc=2560,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
Date: 22. 9.2008 18:31:54

Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.04
```
0.035790663 = product of:
  0.053685993 = sum of:
    0.037639882 = weight(_text_:resources in 5041) [ClassicSimilarity], result of:
      0.037639882 = score(doc=5041,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.20165458 = fieldWeight in 5041, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
    0.016046109 = product of:
      0.032092217 = sum of:
        0.032092217 = weight(_text_:management in 5041) [ClassicSimilarity], result of:
          0.032092217 = score(doc=5041,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.18620178 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
Source: Information processing and management. 56(2019) no.1, pp.228-246

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.04

0.035487223 = product of:
  0.10646167 = sum of:
    0.10646167 = weight(_text_:resources in 2533) [ClassicSimilarity], result of:
      0.10646167 = score(doc=2533,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.5703653 = fieldWeight in 2533, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
  0.33333334 = coord(1/3)

Abstract: Profiles several representative current efforts that apply established as well as more innovative methods of automated classification, organization or other method of categorisation of WWW resources

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.03
```
0.030111905 = product of:
  0.09033571 = sum of:
    0.09033571 = weight(_text_:resources in 1168) [ClassicSimilarity], result of:
      0.09033571 = score(doc=1168,freq=8.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.483971 = fieldWeight in 1168, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.046875 = fieldNorm(doc=1168)
  0.33333334 = coord(1/3)
```
Abstract: The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
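
Docs 2533 and 1168 both match only `resources`, yet the record with freq=8.0 ranks below the one with freq=4.0. A short Python sketch, assuming the classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and assuming that `fieldNorm` acts as a length normalization as it commonly does in this similarity, shows the smaller norm outweighing the higher term frequency. All names are illustrative.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.051133685   # copied from the explain output
DOC_FREQ_RESOURCES = 3122  # docFreq of "resources" in the output


def single_term_score(freq, field_norm, coord=1 / 3):
    # One matching term out of three query clauses, hence coord(1/3).
    idf = 1.0 + math.log(MAX_DOCS / (DOC_FREQ_RESOURCES + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return coord * query_weight * field_weight


# Doc 2533: fewer occurrences, larger fieldNorm.
score_2533 = single_term_score(freq=4.0, field_norm=0.078125)
# Doc 1168: twice the occurrences, smaller fieldNorm.
score_1168 = single_term_score(freq=8.0, field_norm=0.046875)

print(f"doc 2533: {score_2533:.8f}  doc 1168: {score_1168:.8f}")
```

Because tf grows only with the square root of the frequency, doubling the occurrences multiplies the score by about 1.41, which does not make up for a norm that is 0.6 times as large.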

Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.03

0.028389778 = product of:
  0.08516933 = sum of:
    0.08516933 = weight(_text_:resources in 4088) [ClassicSimilarity], result of:
      0.08516933 = score(doc=4088,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.45629224 = fieldWeight in 4088, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0625 = fieldNorm(doc=4088)
  0.33333334 = coord(1/3)

Abstract: Authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a mean for the selection of resources in Engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed quality-controlled subject gateway are also discussed

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.03
```
0.026692703 = product of:
  0.08007811 = sum of:
    0.08007811 = sum of:
      0.03851066 = weight(_text_:management in 2760) [ClassicSimilarity], result of:
        0.03851066 = score(doc=2760,freq=2.0), product of:
          0.17235184 = queryWeight, product of:
            3.3706124 = idf(docFreq=4130, maxDocs=44218)
            0.051133685 = queryNorm
          0.22344214 = fieldWeight in 2760, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.3706124 = idf(docFreq=4130, maxDocs=44218)
            0.046875 = fieldNorm(doc=2760)
      0.04156745 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
        0.04156745 = score(doc=2760,freq=2.0), product of:
          0.17906146 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.051133685 = queryNorm
          0.23214069 = fieldWeight in 2760, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2760)
  0.33333334 = coord(1/3)
```
Abstract: Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
Date: 22. 3.2009 19:11:54

Shafer, K.E.: Evaluating Scorpion Results (2001) 0.03

0.025093256 = product of:
  0.075279765 = sum of:
    0.075279765 = weight(_text_:resources in 4085) [ClassicSimilarity], result of:
      0.075279765 = score(doc=4085,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.40330917 = fieldWeight in 4085, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.078125 = fieldNorm(doc=4085)
  0.33333334 = coord(1/3)

Abstract: Using DDC for automatic indexing and classifying of Internet resources

Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.02
```
0.017743612 = product of:
  0.053230833 = sum of:
    0.053230833 = weight(_text_:resources in 967) [ClassicSimilarity], result of:
      0.053230833 = score(doc=967,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28518265 = fieldWeight in 967, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
  0.33333334 = coord(1/3)
```
Abstract: Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in-depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio-ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.02
```
0.017565278 = product of:
  0.052695833 = sum of:
    0.052695833 = weight(_text_:resources in 7209) [ClassicSimilarity], result of:
      0.052695833 = score(doc=7209,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.33333334 = coord(1/3)
```
Abstract: The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources

Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.01

0.014976369 = product of:
  0.044929106 = sum of:
    0.044929106 = product of:
      0.08985821 = sum of:
        0.08985821 = weight(_text_:management in 4347) [ClassicSimilarity], result of:
          0.08985821 = score(doc=4347,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.521365 = fieldWeight in 4347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=4347)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 11(1975), pp.201-206

Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01

0.014976369 = product of:
  0.044929106 = sum of:
    0.044929106 = product of:
      0.08985821 = sum of:
        0.08985821 = weight(_text_:management in 2666) [ClassicSimilarity], result of:
          0.08985821 = score(doc=2666,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.521365 = fieldWeight in 2666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=2666)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 37(2001) no.3, pp.459-484

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.013855817 = product of:
  0.04156745 = sum of:
    0.04156745 = product of:
      0.0831349 = sum of:
        0.0831349 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.0831349 = score(doc=1046,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 5. 5.2003 14:17:22

Major, R.L.; Ragsdale, C.T.: An aggregation approach to the classification problem using multiple prediction experts (2000) 0.01

0.0128368875 = product of:
  0.03851066 = sum of:
    0.03851066 = product of:
      0.07702132 = sum of:
        0.07702132 = weight(_text_:management in 3789) [ClassicSimilarity], result of:
          0.07702132 = score(doc=3789,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.44688427 = fieldWeight in 3789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.09375 = fieldNorm(doc=3789)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 36(2000) no.4, pp.683-696

Krellenstein, M.: Document classification at Northern Light (1999) 0.01

0.0128368875 = product of:
  0.03851066 = sum of:
    0.03851066 = product of:
      0.07702132 = sum of:
        0.07702132 = weight(_text_:management in 4435) [ClassicSimilarity], result of:
          0.07702132 = score(doc=4435,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.44688427 = fieldWeight in 4435, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.09375 = fieldNorm(doc=4435)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Footnote: Paper presented at: Search engines and beyond: developing efficient knowledge management systems; 1999 Search engine Meeting, Boston, MA, April 19-20 1999

Golub, K.: Automated subject classification of textual web documents (2006) 0.01
```
0.012546628 = product of:
  0.037639882 = sum of:
    0.037639882 = weight(_text_:resources in 5600) [ClassicSimilarity], result of:
      0.037639882 = score(doc=5600,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.20165458 = fieldWeight in 5600, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5600)
  0.33333334 = coord(1/3)
```
Abstract: Purpose - To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01
```
0.012546628 = product of:
  0.037639882 = sum of:
    0.037639882 = weight(_text_:resources in 2300) [ClassicSimilarity], result of:
      0.037639882 = score(doc=2300,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.20165458 = fieldWeight in 2300, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2300)
  0.33333334 = coord(1/3)
```
Abstract: Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
