Search (13 results, page 1 of 1)

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.07

```
0.07319553 = product of:
  0.14639106 = sum of:
    0.14639106 = sum of:
      0.10403923 = weight(_text_:web in 2158) [ClassicSimilarity], result of:
        0.10403923 = score(doc=2158,freq=16.0), product of:
          0.17002425 = queryWeight, product of:
            3.2635105 = idf(docFreq=4597, maxDocs=44218)
            0.052098576 = queryNorm
          0.6119082 = fieldWeight in 2158, product of:
            4.0 = tf(freq=16.0), with freq of:
              16.0 = termFreq=16.0
            3.2635105 = idf(docFreq=4597, maxDocs=44218)
            0.046875 = fieldNorm(doc=2158)
      0.042351827 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
        0.042351827 = score(doc=2158,freq=2.0), product of:
          0.18244034 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052098576 = queryNorm
          0.23214069 = fieldWeight in 2158, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2158)
  0.5 = coord(1/2)
```

Abstract: This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and to apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to use a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
Date: 4. 8.2015 19:22:04
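
The ClassicSimilarity explanations in these results all follow one fixed TF-IDF formula. The minimal Python sketch below recomputes the scores of documents 2158 and 2555 from the numbers shown, assuming Lucene's classic definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). queryNorm and fieldNorm are copied from the explanations rather than recomputed, because the full query and the index-time length norms are not shown; the function name `classic_term_weight` is hypothetical.

```python
import math

def classic_term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(...) node of a ClassicSimilarity explanation."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # idf(docFreq, maxDocs)
    tf = math.sqrt(freq)                                # tf(freq)
    query_weight = idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm                # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight                  # score(doc, freq)

QUERY_NORM = 0.052098576  # copied from the explanations, not recomputed
MAX_DOCS = 44218

# doc 2158: two matching terms are summed, then multiplied by coord(1/2)
web_2158 = classic_term_weight(16, 4597, MAX_DOCS, QUERY_NORM, 0.046875)
t22_2158 = classic_term_weight(2, 3622, MAX_DOCS, QUERY_NORM, 0.046875)
score_2158 = (web_2158 + t22_2158) * 0.5

# doc 2555: one matching term inside a nested clause, so coord(1/2) applies twice
web_2555 = classic_term_weight(10, 4597, MAX_DOCS, QUERY_NORM, 0.046875)
score_2555 = web_2555 * 0.5 * 0.5

print(f"{score_2158:.8f} {score_2555:.8f}")
```

Lucene computes in single-precision floats, so the recomputed values match the explanations only up to small rounding differences. The other entries can be checked the same way by substituting their freq and fieldNorm values.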

Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.02
```
0.02056256 = product of:
  0.04112512 = sum of:
    0.04112512 = product of:
      0.08225024 = sum of:
        0.08225024 = weight(_text_:web in 2555) [ClassicSimilarity], result of:
          0.08225024 = score(doc=2555,freq=10.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.48375595 = fieldWeight in 2555, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log n). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
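
The single-path search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: representing each category by a term-weight centroid and choosing the child with the highest cosine similarity are assumptions, the category tree and page vector are toy data, and the dynamic addition of new categories is not shown. The names `Category`, `cosine`, and `classify_single_path` are hypothetical. At each level the page is compared only with the children of the current node, so in a balanced tree with branching factor b and n leaf categories roughly b·log_b(n) comparisons are needed instead of n.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    centroid: dict[str, float]  # hypothetical term-weight profile of the category
    children: list["Category"] = field(default_factory=list)

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_single_path(root: Category, doc: dict[str, float]) -> list[str]:
    """Descend from the root, comparing the page only with the current node's children."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: cosine(doc, c.centroid))
        path.append(node.name)
    return path

# Toy category tree for illustration only.
tree = Category("root", {}, [
    Category("science", {"energy": 1.0, "cell": 1.0}, [
        Category("physics", {"energy": 1.0, "quantum": 1.0}),
        Category("biology", {"cell": 1.0, "gene": 1.0}),
    ]),
    Category("arts", {"music": 1.0, "painting": 1.0}, [
        Category("music", {"music": 1.0, "melody": 1.0}),
    ]),
])

page = {"quantum": 2.0, "energy": 1.0}
print(classify_single_path(tree, page))
```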

Chan, L.M.; Lin, X.; Zeng, M.L.: Structural and multilingual approaches to subject access on the Web (2000) 0.02

```
0.01839171 = product of:
  0.03678342 = sum of:
    0.03678342 = product of:
      0.07356684 = sum of:
        0.07356684 = weight(_text_:web in 507) [ClassicSimilarity], result of:
          0.07356684 = score(doc=507,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.43268442 = fieldWeight in 507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.09375 = fieldNorm(doc=507)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.02
```
0.01839171 = product of:
  0.03678342 = sum of:
    0.03678342 = product of:
      0.07356684 = sum of:
        0.07356684 = weight(_text_:web in 1566) [ClassicSimilarity], result of:
          0.07356684 = score(doc=1566,freq=8.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.43268442 = fieldWeight in 1566, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77 and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
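
The kNN step mentioned in this abstract can be sketched as follows. This is a generic k-nearest-neighbours classifier, not the authors' implementation: cosine similarity over sparse term-weight vectors, a simple majority vote, k = 3, and the toy training documents labelled with the DDC numbers 331 (labor economics) and 332 (financial economics) are all assumptions for illustration, and `knn_classify` is a hypothetical name. The dictionary-based stage of the hybrid method is not shown.

```python
import math
from collections import Counter

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(doc: dict[str, float], training, k: int = 3) -> str:
    """Return the majority label among the k training documents most similar to doc."""
    ranked = sorted(training, key=lambda item: cosine(doc, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set: (term vector, DDC class number) pairs invented for illustration.
training = [
    ({"wage": 1.0, "labor": 1.0, "union": 1.0}, "331"),
    ({"employment": 1.0, "labor": 1.0}, "331"),
    ({"bank": 1.0, "interest": 1.0, "credit": 1.0}, "332"),
    ({"stock": 1.0, "bank": 1.0}, "332"),
    ({"interest": 1.0, "bond": 1.0}, "332"),
]

print(knn_classify({"bank": 1.0, "credit": 1.0, "interest": 1.0}, training))
print(knn_classify({"labor": 1.0, "wage": 1.0}, training))
```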

Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.02

```
0.017339872 = product of:
  0.034679744 = sum of:
    0.034679744 = product of:
      0.06935949 = sum of:
        0.06935949 = weight(_text_:web in 162) [ClassicSimilarity], result of:
          0.06935949 = score(doc=162,freq=4.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.4079388 = fieldWeight in 162, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0625 = fieldNorm(doc=162)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Abstract: Among the major challenges of meaningful searching on the WWW are the sheer volume of available material and language barriers. Methods that structure Web resources by content for more efficient retrieval are therefore needed just as urgently as programs that can cope with linguistic diversity. In the following talk we discuss several approaches currently being taken to tackle both problems.

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.02

```
0.015326426 = product of:
  0.030652853 = sum of:
    0.030652853 = product of:
      0.061305705 = sum of:
        0.061305705 = weight(_text_:web in 2533) [ClassicSimilarity], result of:
          0.061305705 = score(doc=2533,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.36057037 = fieldWeight in 2533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Wätjen, H.-J.: Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web : das DFG-Projekt GERHARD (1998) 0.02

```
0.015326426 = product of:
  0.030652853 = sum of:
    0.030652853 = product of:
      0.061305705 = sum of:
        0.061305705 = weight(_text_:web in 3066) [ClassicSimilarity], result of:
          0.061305705 = score(doc=3066,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.36057037 = fieldWeight in 3066, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=3066)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.02

```
0.015326426 = product of:
  0.030652853 = sum of:
    0.030652853 = product of:
      0.061305705 = sum of:
        0.061305705 = weight(_text_:web in 4180) [ClassicSimilarity], result of:
          0.061305705 = score(doc=4180,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.36057037 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.02

```
0.015326426 = product of:
  0.030652853 = sum of:
    0.030652853 = product of:
      0.061305705 = sum of:
        0.061305705 = weight(_text_:web in 494) [ClassicSimilarity], result of:
          0.061305705 = score(doc=494,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.36057037 = fieldWeight in 494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.02

```
0.015326426 = product of:
  0.030652853 = sum of:
    0.030652853 = product of:
      0.061305705 = sum of:
        0.061305705 = weight(_text_:web in 1103) [ClassicSimilarity], result of:
          0.061305705 = score(doc=1103,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.36057037 = fieldWeight in 1103, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Abstract: This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>.

Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.02
```
0.015172388 = product of:
  0.030344777 = sum of:
    0.030344777 = product of:
      0.060689554 = sum of:
        0.060689554 = weight(_text_:web in 3064) [ClassicSimilarity], result of:
          0.060689554 = score(doc=3064,freq=4.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.35694647 = fieldWeight in 3064, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3064)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The intellectual (manual) subject indexing of the Internet is in crisis. Yahoo and other services cannot keep pace with the growth of the Web. GERHARD is currently the only search and navigation service worldwide that also classifies, fully automatically and using computational-linguistic and statistical methods, the Internet resources gathered by a robot. Well over one million HTML documents from scientifically relevant servers in Germany can be searched in the database as with other search engines, but can also be explored by navigating the trilingual Universal Decimal Classification (ETH-Bibliothek Zürich).

Walther, R.: Möglichkeiten und Grenzen automatischer Klassifikationen von Web-Dokumenten (2001) 0.02
```
0.015172388 = product of:
  0.030344777 = sum of:
    0.030344777 = product of:
      0.060689554 = sum of:
        0.060689554 = weight(_text_:web in 1562) [ClassicSimilarity], result of:
          0.060689554 = score(doc=1562,freq=4.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.35694647 = fieldWeight in 1562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1562)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic classification of Web and other text documents makes it possible to provide organised access to internal and external business information. Research on automatic classification has intensified in recent years. The result is a variety of methods that are used in practice today, individually or in combination, for classification. Alongside general principles, this licentiate thesis examines several methods of automatic classification in more detail and discusses their possibilities and limitations. It also presents the results of a survey of vendors of software solutions for the automatic classification of text documents. The discussion is intended to serve myax internet AG as a basis for developing its own classification product.

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.01
```
0.010728499 = product of:
  0.021456998 = sum of:
    0.021456998 = product of:
      0.042913996 = sum of:
        0.042913996 = weight(_text_:web in 7209) [ClassicSimilarity], result of:
          0.042913996 = score(doc=7209,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.25239927 = fieldWeight in 7209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7209)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources.
