-
Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015)
0.01
0.013946643 = product of:
0.041839927 = sum of:
0.031420145 = weight(_text_:internet in 2158) [ClassicSimilarity], result of:
0.031420145 = score(doc=2158,freq=4.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.27677247 = fieldWeight in 2158, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.046875 = fieldNorm(doc=2158)
0.010419784 = product of:
0.03125935 = sum of:
0.03125935 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
0.03125935 = score(doc=2158,freq=2.0), product of:
0.13465692 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.038453303 = queryNorm
0.23214069 = fieldWeight in 2158, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.046875 = fieldNorm(doc=2158)
0.33333334 = coord(1/3)
0.33333334 = coord(2/6)
- Abstract
- This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
- Date
- 4. 8.2015 19:22:04
- Theme
- Internet
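The score tree shown for this record (doc=2158) can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and variable names are illustrative and are not part of the listing. All numeric inputs are copied from the explain output above.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values taken from the explain output for doc=2158.
MAX_DOCS = 44218
QUERY_NORM = 0.038453303
FIELD_NORM = 0.046875

internet = term_score(4.0, 6276, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "22" clause sits in a nested query with coord(1/3);
# the outer query applies coord(2/6) to the sum of both clauses.
total = (internet + term_22 * (1 / 3)) * (2 / 6)
print(f"{total:.9f}")
```

Small differences in the last digits are expected, because Lucene computes these values in 32-bit floats while Python uses 64-bit floats.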
-
Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003)
0.01
0.010910588 = product of:
0.032731764 = sum of:
0.022217397 = weight(_text_:internet in 1566) [ClassicSimilarity], result of:
0.022217397 = score(doc=1566,freq=2.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.1957077 = fieldWeight in 1566, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.046875 = fieldNorm(doc=1566)
0.010514366 = product of:
0.0315431 = sum of:
0.0315431 = weight(_text_:29 in 1566) [ClassicSimilarity], result of:
0.0315431 = score(doc=1566,freq=2.0), product of:
0.13526669 = queryWeight, product of:
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.038453303 = queryNorm
0.23319192 = fieldWeight in 1566, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.046875 = fieldNorm(doc=1566)
0.33333334 = coord(1/3)
0.33333334 = coord(2/6)
- Source
- Journal of information science. 29(2003) no.2, pp. 117-126
- Theme
- Internet
-
Shafer, K.E.: Evaluating Scorpion Results (2001)
0.01
0.010689351 = product of:
0.0641361 = sum of:
0.0641361 = weight(_text_:internet in 4085) [ClassicSimilarity], result of:
0.0641361 = score(doc=4085,freq=6.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.56495947 = fieldWeight in 4085, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.078125 = fieldNorm(doc=4085)
0.16666667 = coord(1/6)
- Abstract
- Using DDC for automatic indexing and classifying of Internet resources
- Footnote
- Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part II
- Theme
- Internet
-
Shafer, K.E.: Automatic Subject Assignment via the Scorpion System (2001)
0.01
0.010473382 = product of:
0.06284029 = sum of:
0.06284029 = weight(_text_:internet in 1043) [ClassicSimilarity], result of:
0.06284029 = score(doc=1043,freq=4.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.55354494 = fieldWeight in 1043, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.09375 = fieldNorm(doc=1043)
0.16666667 = coord(1/6)
- Footnote
- Part of a special issue: OCLC and the Internet: An Historical Overview of Research Activities, 1990-1999 - Part I
- Theme
- Internet
-
Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999)
0.01
0.008727819 = product of:
0.05236691 = sum of:
0.05236691 = weight(_text_:internet in 4180) [ClassicSimilarity], result of:
0.05236691 = score(doc=4180,freq=4.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.46128747 = fieldWeight in 4180, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.078125 = fieldNorm(doc=4180)
0.16666667 = coord(1/6)
- Abstract
- Presentation of various projects for improving the subject indexing of Internet resources with the help of the DDC
- Theme
- Internet
-
Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000)
0.01
0.007405799 = product of:
0.044434793 = sum of:
0.044434793 = weight(_text_:internet in 507) [ClassicSimilarity], result of:
0.044434793 = score(doc=507,freq=2.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.3914154 = fieldWeight in 507, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.09375 = fieldNorm(doc=507)
0.16666667 = coord(1/6)
- Theme
- Internet
-
McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996)
0.01
0.0061715 = product of:
0.037028998 = sum of:
0.037028998 = weight(_text_:internet in 2533) [ClassicSimilarity], result of:
0.037028998 = score(doc=2533,freq=2.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.3261795 = fieldWeight in 2533, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.078125 = fieldNorm(doc=2533)
0.16666667 = coord(1/6)
- Theme
- Internet
-
Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999)
0.01
0.0061715 = product of:
0.037028998 = sum of:
0.037028998 = weight(_text_:internet in 494) [ClassicSimilarity], result of:
0.037028998 = score(doc=494,freq=2.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.3261795 = fieldWeight in 494, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.078125 = fieldNorm(doc=494)
0.16666667 = coord(1/6)
- Theme
- Internet
-
Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994)
0.01
0.0061094724 = product of:
0.036656834 = sum of:
0.036656834 = weight(_text_:internet in 7209) [ClassicSimilarity], result of:
0.036656834 = score(doc=7209,freq=4.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.32290122 = fieldWeight in 7209, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.0546875 = fieldNorm(doc=7209)
0.16666667 = coord(1/6)
- Source
- Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994
- Theme
- Internet
-
Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004)
0.01
0.005236691 = product of:
0.031420145 = sum of:
0.031420145 = weight(_text_:internet in 2555) [ClassicSimilarity], result of:
0.031420145 = score(doc=2555,freq=4.0), product of:
0.11352337 = queryWeight, product of:
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.038453303 = queryNorm
0.27677247 = fieldWeight in 2555, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.9522398 = idf(docFreq=6276, maxDocs=44218)
0.046875 = fieldNorm(doc=2555)
0.16666667 = coord(1/6)
- Abstract
- Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from Θ(n) to Θ(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
- Theme
- Internet
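The single-path search described in the Choi and Peng abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Category` class, the Jaccard-overlap `similarity` function, and the toy category tree are all assumptions made for illustration, and the dynamic addition of new categories is left out. The sketch only shows the idea of descending one branch per level instead of comparing a page against every category.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    # Hypothetical category node: a name, a keyword set, and child categories.
    name: str
    keywords: set
    children: list = field(default_factory=list)

def similarity(page_terms: set, category: Category) -> float:
    # Jaccard overlap, used here as a stand-in for the paper's measure.
    union = page_terms | category.keywords
    return len(page_terms & category.keywords) / len(union) if union else 0.0

def classify(page_terms: set, root: Category) -> list:
    # Follow only the best-matching child at each level (single-path search).
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: similarity(page_terms, c))
        path.append(node.name)
    return path

# Toy hierarchy (invented for this example).
tree = Category("Top", set(), [
    Category("Science", {"physics", "biology", "research", "cell"}, [
        Category("Physics", {"physics", "quantum", "energy"}),
        Category("Biology", {"biology", "cell", "gene"}),
    ]),
    Category("Sports", {"football", "tennis", "match", "team"}),
])

result = classify({"cell", "gene", "research"}, tree)
print(result)
```

With a roughly balanced tree and a bounded number of children per node, the number of comparisons grows with the depth of the tree rather than with the total number of categories, which is the source of the logarithmic bound quoted in the abstract.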