Search (61 results, page 1 of 4)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07

0.06799838 = product of:
  0.10199757 = sum of:
    0.08121385 = product of:
      0.24364153 = sum of:
        0.24364153 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.24364153 = score(doc=562,freq=2.0), product of:
            0.43351194 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051133685 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.020783724 = product of:
      0.04156745 = sum of:
        0.04156745 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04156745 = score(doc=562,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
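
The score breakdown for this first hit can be reproduced by hand. The sketch below is an illustration rather than the search engine's own code: it assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and takes queryNorm, fieldNorm, and the coord factors directly from the listing. The function name `classic_term_score` and the variable names are hypothetical.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                              # tf(freq=2.0) -> 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # assumed ClassicSimilarity idf
    query_weight = idf * query_norm                   # "queryWeight" in the listing
    field_weight = tf * idf * field_norm              # "fieldWeight" in the listing
    return query_weight * field_weight

# Values copied from the explain output for doc=562.
QUERY_NORM = 0.051133685
MAX_DOCS = 44218
FIELD_NORM = 0.046875

w_3a = classic_term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_22 = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Each term sits in a nested clause with its own coord factor (1/3 and 1/2);
# the outer coord(2/3) scales the sum of both clauses.
total = (w_3a * (1 / 3) + w_22 * (1 / 2)) * (2 / 3)

print(f"weight(_text_:3a) = {w_3a:.8f}")   # listing reports 0.24364153
print(f"weight(_text_:22) = {w_22:.8f}")   # listing reports 0.04156745
print(f"document score    = {total:.8f}")  # listing reports 0.06799838
```

Small differences in the last digits are expected, since Lucene computes these values in 32-bit floats. The same function applies to the other hits once their freq, docFreq, fieldNorm, and coord values are substituted.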

Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.06

0.055421554 = product of:
  0.08313233 = sum of:
    0.063876994 = weight(_text_:resources in 2100) [ClassicSimilarity], result of:
      0.063876994 = score(doc=2100,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.34221917 = fieldWeight in 2100, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
    0.01925533 = product of:
      0.03851066 = sum of:
        0.03851066 = weight(_text_:management in 2100) [ClassicSimilarity], result of:
          0.03851066 = score(doc=2100,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.22344214 = fieldWeight in 2100, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2100)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; (c) and the way they are used in combination with each other. Further observations concern the way the participant assesses quality of web-based resources, and his information behavior as a software engineer.
Source: Information processing and management. 44(2008) no.4, S.1410-1430

Dubin, D.: Dimensions and discriminability (1998) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 2338) [ClassicSimilarity], result of:
      0.052695833 = score(doc=2338,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 2338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.04849536 = score(doc=2338,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 1673) [ClassicSimilarity], result of:
      0.052695833 = score(doc=1673,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.04849536 = score(doc=1673,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Date: 1. 8.1996 22:08:06

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.05

0.05129568 = product of:
  0.07694352 = sum of:
    0.052695833 = weight(_text_:resources in 2560) [ClassicSimilarity], result of:
      0.052695833 = score(doc=2560,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 2560, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2560)
    0.02424768 = product of:
      0.04849536 = sum of:
        0.04849536 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.04849536 = score(doc=2560,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
Date: 22. 9.2008 18:31:54

Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.04
```
0.035790663 = product of:
  0.053685993 = sum of:
    0.037639882 = weight(_text_:resources in 1665) [ClassicSimilarity], result of:
      0.037639882 = score(doc=1665,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.20165458 = fieldWeight in 1665, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
    0.016046109 = product of:
      0.032092217 = sum of:
        0.032092217 = weight(_text_:management in 1665) [ClassicSimilarity], result of:
          0.032092217 = score(doc=1665,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.18620178 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.

Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.04
```
0.035790663 = product of:
  0.053685993 = sum of:
    0.037639882 = weight(_text_:resources in 5041) [ClassicSimilarity], result of:
      0.037639882 = score(doc=5041,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.20165458 = fieldWeight in 5041, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
    0.016046109 = product of:
      0.032092217 = sum of:
        0.032092217 = weight(_text_:management in 5041) [ClassicSimilarity], result of:
          0.032092217 = score(doc=5041,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.18620178 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend is social community question answering websites, which are the second choice for students who try to get answers from other peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
Source: Information processing and management. 56(2019) no.1, S.228-246

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.04

0.035487223 = product of:
  0.10646167 = sum of:
    0.10646167 = weight(_text_:resources in 2533) [ClassicSimilarity], result of:
      0.10646167 = score(doc=2533,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.5703653 = fieldWeight in 2533, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
  0.33333334 = coord(1/3)

Abstract: Profiles several representative current efforts that apply established as well as more innovative methods of automated classification, organization or other method of categorisation of WWW resources

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.03
```
0.030111905 = product of:
  0.09033571 = sum of:
    0.09033571 = weight(_text_:resources in 1168) [ClassicSimilarity], result of:
      0.09033571 = score(doc=1168,freq=8.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.483971 = fieldWeight in 1168, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.046875 = fieldNorm(doc=1168)
  0.33333334 = coord(1/3)
```
Abstract: The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.

Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.03

0.028389778 = product of:
  0.08516933 = sum of:
    0.08516933 = weight(_text_:resources in 4088) [ClassicSimilarity], result of:
      0.08516933 = score(doc=4088,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.45629224 = fieldWeight in 4088, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0625 = fieldNorm(doc=4088)
  0.33333334 = coord(1/3)

Abstract: Authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a means for the selection of resources in Engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed quality-controlled subject gateway are also discussed

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.03
```
0.026692703 = product of:
  0.08007811 = sum of:
    0.08007811 = sum of:
      0.03851066 = weight(_text_:management in 2760) [ClassicSimilarity], result of:
        0.03851066 = score(doc=2760,freq=2.0), product of:
          0.17235184 = queryWeight, product of:
            3.3706124 = idf(docFreq=4130, maxDocs=44218)
            0.051133685 = queryNorm
          0.22344214 = fieldWeight in 2760, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.3706124 = idf(docFreq=4130, maxDocs=44218)
            0.046875 = fieldNorm(doc=2760)
      0.04156745 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
        0.04156745 = score(doc=2760,freq=2.0), product of:
          0.17906146 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.051133685 = queryNorm
          0.23214069 = fieldWeight in 2760, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2760)
  0.33333334 = coord(1/3)
```
Abstract: Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
Date: 22. 3.2009 19:11:54

Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.03

0.025093256 = product of:
  0.075279765 = sum of:
    0.075279765 = weight(_text_:resources in 1103) [ClassicSimilarity], result of:
      0.075279765 = score(doc=1103,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.40330917 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.078125 = fieldNorm(doc=1103)
  0.33333334 = coord(1/3)

Abstract: This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of the Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>

Shafer, K.E.: Evaluating Scorpion Results (2001) 0.03

0.025093256 = product of:
  0.075279765 = sum of:
    0.075279765 = weight(_text_:resources in 4085) [ClassicSimilarity], result of:
      0.075279765 = score(doc=4085,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.40330917 = fieldWeight in 4085, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.078125 = fieldNorm(doc=4085)
  0.33333334 = coord(1/3)

Abstract: Using DDC for automatic indexing and classifying of Internet resources

Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: ¬An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.02
```
0.017743612 = product of:
  0.053230833 = sum of:
    0.053230833 = weight(_text_:resources in 967) [ClassicSimilarity], result of:
      0.053230833 = score(doc=967,freq=4.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28518265 = fieldWeight in 967, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
  0.33333334 = coord(1/3)
```
Abstract: Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in-depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio-ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.02
```
0.017565278 = product of:
  0.052695833 = sum of:
    0.052695833 = weight(_text_:resources in 7209) [ClassicSimilarity], result of:
      0.052695833 = score(doc=7209,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.28231642 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.33333334 = coord(1/3)
```
Abstract: The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources

Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.02
```
0.015055953 = product of:
  0.045167856 = sum of:
    0.045167856 = weight(_text_:resources in 1668) [ClassicSimilarity], result of:
      0.045167856 = score(doc=1668,freq=2.0), product of:
        0.18665522 = queryWeight, product of:
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.051133685 = queryNorm
        0.2419855 = fieldWeight in 1668, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.650338 = idf(docFreq=3122, maxDocs=44218)
          0.046875 = fieldNorm(doc=1668)
  0.33333334 = coord(1/3)
```
Abstract: This working paper describes the creation of a test database on which to carry out the automatic classification tasks of the DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies to build robot-generated subject indices, presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot and subsequently an even more surprisingly low overlap between the resources collected by the two different approaches. This was in spite of using basically the same services to start the harvesting process. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between those approaches was the coverage of the resulting database.

Kwok, K.L.: ¬The use of titles and cited titles as document representations for automatic classification (1975) 0.01

0.014976369 = product of:
  0.044929106 = sum of:
    0.044929106 = product of:
      0.08985821 = sum of:
        0.08985821 = weight(_text_:management in 4347) [ClassicSimilarity], result of:
          0.08985821 = score(doc=4347,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.521365 = fieldWeight in 4347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=4347)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 11(1975), S.201-206

Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01

0.014976369 = product of:
  0.044929106 = sum of:
    0.044929106 = product of:
      0.08985821 = sum of:
        0.08985821 = weight(_text_:management in 2666) [ClassicSimilarity], result of:
          0.08985821 = score(doc=2666,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.521365 = fieldWeight in 2666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=2666)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 37(2001) no.3, S.459-484

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.013855817 = product of:
  0.04156745 = sum of:
    0.04156745 = product of:
      0.0831349 = sum of:
        0.0831349 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.0831349 = score(doc=1046,freq=2.0), product of:
            0.17906146 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051133685 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 5. 5.2003 14:17:22

Major, R.L.; Ragsdale, C.T.: ¬An aggregation approach to the classification problem using multiple prediction experts (2000) 0.01

0.0128368875 = product of:
  0.03851066 = sum of:
    0.03851066 = product of:
      0.07702132 = sum of:
        0.07702132 = weight(_text_:management in 3789) [ClassicSimilarity], result of:
          0.07702132 = score(doc=3789,freq=2.0), product of:
            0.17235184 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051133685 = queryNorm
            0.44688427 = fieldWeight in 3789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.09375 = fieldNorm(doc=3789)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 36(2000) no.4, S.683-696
