Search (53 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23

0.23032402 = product of:
  0.34548602 = sum of:
    0.071776696 = product of:
      0.21533008 = sum of:
        0.21533008 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.21533008 = score(doc=562,freq=2.0), product of:
            0.38313732 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045191888 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.21533008 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.21533008 = score(doc=562,freq=2.0), product of:
        0.38313732 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.045191888 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.040010586 = weight(_text_:computer in 562) [ClassicSimilarity], result of:
      0.040010586 = score(doc=562,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.24226204 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.018368632 = product of:
      0.036737263 = sum of:
        0.036737263 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.036737263 = score(doc=562,freq=2.0), product of:
            0.1582543 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045191888 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.6666667 = coord(4/6)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Imprint: Washington, DC : IEEE Computer Society
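Each leaf of the score trees in this list follows the same recipe. As a reading aid, here is a minimal Python sketch that recomputes one leaf. It assumes Lucene's ClassicSimilarity as implemented in Lucene 4.x to 6.x (other versions may use a slightly different idf): tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the term score is queryWeight × fieldWeight. The queryNorm and fieldNorm values are taken as given from the output, and the function names are illustrative only.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), as in the explain output's idf(...) lines."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # the "queryWeight" node
    field_weight = classic_tf(freq) * idf * field_norm   # the "fieldWeight" node
    return query_weight * field_weight

# _text_:3a in doc 562 (freq=2, docFreq=24, maxDocs=44218)
print(term_score(2.0, 24, 44218, 0.045191888, 0.046875))
```

With the values for `_text_:3a` in doc 562, the call comes out at about 0.2153, matching the 0.21533008 shown above; passing docFreq=3109 and fieldNorm 0.0546875 reproduces the 0.046679016 for `computer` in doc 1673.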

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.05

0.04537056 = product of:
  0.13611168 = sum of:
    0.046679016 = weight(_text_:computer in 1673) [ClassicSimilarity], result of:
      0.046679016 = score(doc=1673,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.28263903 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.08943266 = sum of:
      0.046572514 = weight(_text_:resources in 1673) [ClassicSimilarity], result of:
        0.046572514 = score(doc=1673,freq=2.0), product of:
          0.16496566 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.045191888 = queryNorm
          0.28231642 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
      0.04286014 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
        0.04286014 = score(doc=1673,freq=2.0), product of:
          0.1582543 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045191888 = queryNorm
          0.2708308 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
  0.33333334 = coord(2/6)

Date: 1. 8.1996 22:08:06
Source: Computer networks and ISDN systems. 30(1998) nos.1/7, pp. 646-648
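The fieldNorm factors in these explanations take only a handful of coarse values (0.0546875, 0.046875, 0.078125, 0.0390625, ...). A plausible reason, assuming the index was built with Lucene's classic norms (Lucene 4.x to 6.x), is that the length norm 1/sqrt(numTerms) is stored in a single byte using SmallFloat's 3-bit-mantissa "315" encoding, so many field lengths collapse onto the same value. The sketch below emulates that round trip in Python; the helper names are hypothetical, and index-time boosts are assumed to be 1.

```python
import math
import struct

def float_to_byte315(f: float) -> int:
    """Lossy 8-bit float: 3 mantissa bits, 5 exponent bits, zero-exponent point 15."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 float32 bits
    small = bits >> (24 - 3)
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 0 if bits <= 0 else 1   # underflow: keep tiny positives distinct from 0
    if small >= fzero + 0x100:
        return 0xFF                    # overflow: clamp to the largest code
    return small - fzero

def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def field_norm(num_terms: int, boost: float = 1.0) -> float:
    """Classic length norm boost/sqrt(numTerms), as it reads back after byte encoding."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))

print(field_norm(300), field_norm(400))
```

For example, a 300-term field (raw norm about 0.0577) reads back as 0.0546875, the fieldNorm seen in the Jenkins entry, and a 400-term field (raw norm 0.05) reads back as 0.046875; the exact field lengths of these documents are not known from the output.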

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.04

0.03811548 = product of:
  0.11434643 = sum of:
    0.06730108 = weight(_text_:services in 2533) [ClassicSimilarity], result of:
      0.06730108 = score(doc=2533,freq=2.0), product of:
        0.16591617 = queryWeight, product of:
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.045191888 = queryNorm
        0.405633 = fieldWeight in 2533, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.078125 = fieldNorm(doc=2533)
    0.047045346 = product of:
      0.09409069 = sum of:
        0.09409069 = weight(_text_:resources in 2533) [ClassicSimilarity], result of:
          0.09409069 = score(doc=2533,freq=4.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.5703653 = fieldWeight in 2533, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Profiles several representative current efforts that apply established as well as more innovative methods of automated classification, organization or other categorisation of WWW resources.
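The non-leaf nodes of the score trees combine leaf scores in one way: a "sum of" the matching clauses, multiplied by coord(overlap/maxOverlap), the fraction of query clauses that matched. A small sketch under that reading (function names are illustrative), checked against the McKiernan entry, where the inner group matches one of two clauses and the outer query matches two of six:

```python
def coord(overlap: int, max_overlap: int) -> float:
    """Fraction of query clauses that matched, e.g. coord(2/6)."""
    return overlap / max_overlap

def combine(matching_scores, max_overlap):
    """'sum of' the matching clause scores, scaled by coord(len(matching)/max_overlap)."""
    return sum(matching_scores) * coord(len(matching_scores), max_overlap)

# McKiernan (doc 2533): inner group {resources, ...} matched 1 of 2 clauses
inner = combine([0.09409069], 2)
# outer query: 'services' leaf plus the inner group, 2 of 6 clauses matched
total = combine([0.06730108, inner], 6)
print(inner, total)
```

This reproduces 0.047045346 for the inner product and 0.03811548 for the document, which is why documents matching fewer query clauses (coord(1/6)) sink in the ranking even when a single term weight is high.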

Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.03

0.0333168 = product of:
  0.099950396 = sum of:
    0.06668431 = weight(_text_:computer in 1103) [ClassicSimilarity], result of:
      0.06668431 = score(doc=1103,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.40377006 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.078125 = fieldNorm(doc=1103)
    0.033266082 = product of:
      0.066532165 = sum of:
        0.066532165 = weight(_text_:resources in 1103) [ClassicSimilarity], result of:
          0.066532165 = score(doc=1103,freq=2.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.40330917 = fieldWeight in 1103, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.03

0.0324329 = product of:
  0.0972987 = sum of:
    0.06668431 = weight(_text_:computer in 2748) [ClassicSimilarity], result of:
      0.06668431 = score(doc=2748,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.40377006 = fieldWeight in 2748, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.078125 = fieldNorm(doc=2748)
    0.030614385 = product of:
      0.06122877 = sum of:
        0.06122877 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.06122877 = score(doc=2748,freq=2.0), product of:
            0.1582543 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045191888 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Date: 1. 2.2016 18:25:22
Series: Lecture notes in computer science ; 9398

Koch, T.; Ardö, A.; Noodén, L.: ¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.02

0.02011343 = product of:
  0.060340293 = sum of:
    0.040380646 = weight(_text_:services in 1668) [ClassicSimilarity], result of:
      0.040380646 = score(doc=1668,freq=2.0), product of:
        0.16591617 = queryWeight, product of:
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.045191888 = queryNorm
        0.2433798 = fieldWeight in 1668, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.046875 = fieldNorm(doc=1668)
    0.01995965 = product of:
      0.0399193 = sum of:
        0.0399193 = weight(_text_:resources in 1668) [ClassicSimilarity], result of:
          0.0399193 = score(doc=1668,freq=2.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.2419855 = fieldWeight in 1668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.046875 = fieldNorm(doc=1668)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies for building robot-generated subject indices, which are presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two approaches, in spite of using basically the same services as starting points for the harvesting process. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the approaches was the coverage of the resulting database.

Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.02
```
0.0166584 = product of:
  0.049975198 = sum of:
    0.033342157 = weight(_text_:computer in 1665) [ClassicSimilarity], result of:
      0.033342157 = score(doc=1665,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.20188503 = fieldWeight in 1665, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
    0.016633041 = product of:
      0.033266082 = sum of:
        0.033266082 = weight(_text_:resources in 1665) [ClassicSimilarity], result of:
          0.033266082 = score(doc=1665,freq=2.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.20165458 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology: rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.

Golub, K.: Automated subject classification of textual web documents (2006) 0.02
```
0.0166584 = product of:
  0.049975198 = sum of:
    0.033342157 = weight(_text_:computer in 5600) [ClassicSimilarity], result of:
      0.033342157 = score(doc=5600,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.20188503 = fieldWeight in 5600, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5600)
    0.016633041 = product of:
      0.033266082 = sum of:
        0.033266082 = weight(_text_:resources in 5600) [ClassicSimilarity], result of:
          0.033266082 = score(doc=5600,freq=2.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.20165458 = fieldWeight in 5600, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Identifies major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; major differences lie in the applied algorithms and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are identified. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community can learn how similar tasks are conducted in other communities. Originality/value - To the author's knowledge, no previous review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.02
```
0.0166584 = product of:
  0.049975198 = sum of:
    0.033342157 = weight(_text_:computer in 3311) [ClassicSimilarity], result of:
      0.033342157 = score(doc=3311,freq=2.0), product of:
        0.16515417 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.045191888 = queryNorm
        0.20188503 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.016633041 = product of:
      0.033266082 = sum of:
        0.033266082 = weight(_text_:resources in 3311) [ClassicSimilarity], result of:
          0.033266082 = score(doc=3311,freq=2.0), product of:
            0.16496566 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.045191888 = queryNorm
            0.20165458 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
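One of the direct evaluation routes named in this abstract, comparison with a gold standard, is commonly scored by set overlap between automatically assigned and reference subject terms. The sketch below assumes precision, recall and F1 as the measures; the abstract does not prescribe these particular metrics, and the function name and example terms are hypothetical. It also illustrates the abstract's caution: the numbers are only as trustworthy as the single gold standard they are computed against.

```python
def compare_with_gold(assigned, gold):
    """Precision, recall and F1 of assigned subject terms against a gold-standard set."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical example: 4 terms assigned by a tool, 3 terms in the gold standard
p, r, f = compare_with_gold(
    ["classification", "indexing", "retrieval", "metadata"],
    ["indexing", "classification", "evaluation"],
)
print(p, r, f)  # two of the four assigned terms appear in the gold set
```

Here 2 of 4 assigned terms are correct (precision 0.5) and 2 of 3 gold terms are found (recall 2/3), giving F1 = 4/7. A term the indexer omitted from the gold standard but a retrieval user would find relevant is counted as an error, which is exactly the aboutness problem the framework addresses.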

Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.02
```
0.016338257 = product of:
  0.09802954 = sum of:
    0.09802954 = weight(_text_:network in 7694) [ClassicSimilarity], result of:
      0.09802954 = score(doc=7694,freq=4.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.48708782 = fieldWeight in 7694, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7694)
  0.16666667 = coord(1/6)
```
Abstract

It is shown that the cognitive paradigm may be an orientation mark for automatic classification. On the basis of research in Artificial Intelligence, the cognitive paradigm, as opposed to the behavioristic paradigm, was developed as a multiplicity of competing world-views. This is the thesis of DeMey in his book "The cognitive paradigm". Multiplicity in a loosely-coupled network of cognitive knots is also the principle of dynamic restlessness. In competition with cognitive views, a classification system that follows various models may learn through concrete information retrieval. In the course of their actions, the user implicitly builds a new classification order.

Dubin, D.: Dimensions and discriminability (1998) 0.01

0.014905443 = product of:
  0.08943266 = sum of:
    0.08943266 = sum of:
      0.046572514 = weight(_text_:resources in 2338) [ClassicSimilarity], result of:
        0.046572514 = score(doc=2338,freq=2.0), product of:
          0.16496566 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.045191888 = queryNorm
          0.28231642 = fieldWeight in 2338, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2338)
      0.04286014 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
        0.04286014 = score(doc=2338,freq=2.0), product of:
          0.1582543 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045191888 = queryNorm
          0.2708308 = fieldWeight in 2338, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2338)
  0.16666667 = coord(1/6)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

0.014905443 = product of:
  0.08943266 = sum of:
    0.08943266 = sum of:
      0.046572514 = weight(_text_:resources in 2560) [ClassicSimilarity], result of:
        0.046572514 = score(doc=2560,freq=2.0), product of:
          0.16496566 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.045191888 = queryNorm
          0.28231642 = fieldWeight in 2560, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2560)
      0.04286014 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
        0.04286014 = score(doc=2560,freq=2.0), product of:
          0.1582543 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045191888 = queryNorm
          0.2708308 = fieldWeight in 2560, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2560)
  0.16666667 = coord(1/6)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field that develops tools, methods, and models to automate text classification. This article describes the currently popular approach to text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for meeting these challenges are examined.
Date: 22. 9.2008 18:31:54

Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.01
```
0.013599704 = product of:
  0.08159822 = sum of:
    0.08159822 = weight(_text_:services in 1568) [ClassicSimilarity], result of:
      0.08159822 = score(doc=1568,freq=6.0), product of:
        0.16591617 = queryWeight, product of:
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.045191888 = queryNorm
        0.4918039 = fieldWeight in 1568, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1568)
  0.16666667 = coord(1/6)
```
Abstract

Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.

Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
```
0.013203304 = product of:
  0.079219826 = sum of:
    0.079219826 = weight(_text_:network in 2564) [ClassicSimilarity], result of:
      0.079219826 = score(doc=2564,freq=2.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.3936264 = fieldWeight in 2564, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0625 = fieldNorm(doc=2564)
  0.16666667 = coord(1/6)
```
Abstract

The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described for vectorizing reference documents from LISA, which permits their topological organization using Kohonen's algorithm. As an example, a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
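Kohonen's self-organizing map (SOM), the technique applied here to LISA document vectors, can be sketched in a few lines. This is a minimal, assumption-laden version, not the authors' implementation: a 1-D chain of units rather than a 2-D map, Euclidean best-matching unit, a Gaussian neighbourhood, and linearly decaying learning rate and radius. The paper's vectorization, map size and training schedule are not given in this abstract, and all names and the toy vectors are hypothetical.

```python
import math
import random

def bmu(weights, x):
    """Index of the best-matching unit: smallest squared Euclidean distance to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_units, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D Kohonen map; returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    sigma0 = sigma0 if sigma0 is not None else n_units / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # visit samples in random order
            b = bmu(weights, x)
            for i in range(n_units):
                # Gaussian neighbourhood on the 1-D grid around the winner b
                h = math.exp(-((i - b) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights

# Toy term-weight vectors standing in for vectorized bibliographic records
docs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 0.8, 0.9]]
weights = train_som(docs, n_units=4)
positions = [bmu(weights, d) for d in docs]   # map position of each document
print(positions)
```

The map position returned by `bmu` is what makes graphical browsing possible: documents with similar vectors tend to land on the same or neighbouring units, because each update also pulls the winner's grid neighbours toward the sample.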

Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
```
0.012690413 = product of:
  0.076142475 = sum of:
    0.076142475 = weight(_text_:services in 1669) [ClassicSimilarity], result of:
      0.076142475 = score(doc=1669,freq=16.0), product of:
        0.16591617 = queryWeight, product of:
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.045191888 = queryNorm
        0.45892134 = fieldWeight in 1669, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.6713707 = idf(docFreq=3057, maxDocs=44218)
          0.03125 = fieldNorm(doc=1669)
  0.16666667 = coord(1/6)
```
Abstract

After a short outline of the problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of development efforts in this area, the terminology used in this report needs to be specified. Although retrieval is generally seen as an iterative process of browsing and information retrieval, and several important services on the net have taken this into consideration, the emphasis of this report lies on general retrieval tools covering the whole Internet. To evaluate the differences, possibilities and restrictions of the various services, it is necessary to begin by organizing the existing varieties in a typological/taxonomical survey. The possibilities and weaknesses of the most important services will be briefly compared and described in three categories: robot-based WWW catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services. For various reasons, however, it will not be possible to rank them as "best" services. Still more important are the weaknesses and problems common to all attempts at indexing the Internet. The problems of input quality, technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the various aspects of harvesting, indexing and information retrieval. Some of the attempts made to further develop retrieval services will be mentioned in relation to descriptions of document contents and standardization efforts. Internet harvesting and indexing technology and retrieval software are thoroughly reviewed. Details about all services and software are listed in analytical forms in Annexes 1-3.

Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.01
```
0.011670183 = product of:
  0.0700211 = sum of:
    0.0700211 = weight(_text_:network in 5055) [ClassicSimilarity], result of:
      0.0700211 = score(doc=5055,freq=4.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.34791988 = fieldWeight in 5055, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5055)
  0.16666667 = coord(1/6)
```
Abstract

Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, there are usually many wrong labels in the automatically labelled data in distant supervision (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce the wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence to filter the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which can capture features for relation classification and avoid the negative impact from irrelevant term sequences that previous neural network models of relation extraction often suffer. In the process of relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN) for relation classification. The experimental results show that compared with the methods using original DS data, the methods using filtered DS data performed much better in relation extraction. It indicates that the semantic similarity based method is effective in reducing wrong labels. The relation extraction performance of the CNN model using the core dependency phrases as input is the best of all, which indicates that using the core dependency phrases as input of CNN is enough to capture the features for relation classification and could avoid negative impact from irrelevant terms.
Orwig, R.E.; Chen, H.; Nunamaker, J.F.: A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.01
```
0.011552892 = product of:
  0.06931735 = sum of:
    0.06931735 = weight(_text_:network in 6928) [ClassicSimilarity], result of:
      0.06931735 = score(doc=6928,freq=2.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.3444231 = fieldWeight in 6928, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6928)
  0.16666667 = coord(1/6)
```
Abstract

Describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classifying electronic brainstorming output, and an evaluation of the results. Describes an electronic meeting system and the classification problem that exists in the group problem-solving process. Surveys the literature concerning classification. Describes the application of the Kohonen SOM to the meeting output classification problem. Describes an experiment that evaluated the classification performed by the Kohonen SOM by comparing it with the classifications produced by a human expert and by a Hopfield neural network. Discusses conclusions and directions for future research.
Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
```
0.011552892 = product of:
  0.06931735 = sum of:
    0.06931735 = weight(_text_:network in 1595) [ClassicSimilarity], result of:
      0.06931735 = score(doc=1595,freq=2.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.3444231 = fieldWeight in 1595, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1595)
  0.16666667 = coord(1/6)
```
Abstract

This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
Yang, Y.; Liu, X.: A re-examination of text categorization methods (1999) 0.01
```
0.011552892 = product of:
  0.06931735 = sum of:
    0.06931735 = weight(_text_:network in 3386) [ClassicSimilarity], result of:
      0.06931735 = score(doc=3386,freq=2.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.3444231 = fieldWeight in 3386, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3386)
  0.16666667 = coord(1/6)
```
Abstract

This paper reports a controlled study, with statistical significance tests, of five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least Squares Fit (LLSF) mapping, and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and on their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
```
0.009902478 = product of:
  0.059414867 = sum of:
    0.059414867 = weight(_text_:network in 995) [ClassicSimilarity], result of:
      0.059414867 = score(doc=995,freq=2.0), product of:
        0.2012564 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.045191888 = queryNorm
        0.29521978 = fieldWeight in 995, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.046875 = fieldNorm(doc=995)
  0.16666667 = coord(1/6)
```

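Each relevance score in these results is a Lucene ClassicSimilarity explain trace: a term's weight is queryWeight (idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the summed clause weights are scaled by coord, the fraction of query clauses that matched (here 1 of 6). The sketch below recomputes the `network` scores from the numbers printed in the traces. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names (`classic_idf`, `term_score`, `apply_coord`) are hypothetical and are not part of Lucene's API. queryNorm and fieldNorm are copied from the traces rather than recomputed.

```python
import math
from typing import Sequence


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Assumed ClassicSimilarity tf: square root of the term frequency."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Hypothetical helper: weight = queryWeight * fieldWeight, as in the trace."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def apply_coord(clause_scores: Sequence[float], matched: int, total: int) -> float:
    """Hypothetical helper: sum the matching clause scores, scale by coord(matched/total)."""
    return sum(clause_scores) * matched / total


# Values copied from the trace for "network" in doc 6928.
QUERY_NORM = 0.045191888
s = term_score(2.0, 1398, 44218, QUERY_NORM, 0.0546875)
print(round(s, 6))                       # 0.069317
print(round(apply_coord([s], 1, 6), 6))  # 0.011553
```

Lucene performs this arithmetic in 32-bit floats and stores fieldNorm in a lossy compressed form, so a double-precision recomputation like this one should agree with the traces only to about six or seven significant digits.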