Search (46 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.31

0.3101718 = product of:
  0.49627492 = sum of:
    0.06351632 = product of:
      0.19054894 = sum of:
        0.19054894 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.19054894 = score(doc=562,freq=2.0), product of:
            0.33904418 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.039991006 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.19054894 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.19054894 = score(doc=562,freq=2.0), product of:
        0.33904418 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.039991006 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.19054894 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.19054894 = score(doc=562,freq=2.0), product of:
        0.33904418 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.039991006 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.035405993 = weight(_text_:computer in 562) [ClassicSimilarity], result of:
      0.035405993 = score(doc=562,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.24226204 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.016254688 = product of:
      0.032509375 = sum of:
        0.032509375 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.032509375 = score(doc=562,freq=2.0), product of:
            0.1400417 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.039991006 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.625 = coord(5/8)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Imprint: Washington, DC : IEEE Computer Society
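
The score tree for this first hit can be checked by hand. The sketch below is a minimal reconstruction, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values displayed above. The function and variable names are illustrative, not part of the search system; the numeric inputs are copied from the explain output for doc 562. Results agree with the listing only up to rounding, since this uses double precision while the listing shows single-precision values.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf; matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)                       # tf(freq=2.0) = 1.4142135
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc 562.
MAX_DOCS = 44218
QUERY_NORM = 0.039991006
FIELD_NORM = 0.046875

s_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
s_2f = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
s_computer = term_score(2.0, 3109, MAX_DOCS, QUERY_NORM, FIELD_NORM)
s_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Nested clauses carry their own coord factors (1/3 and 1/2);
# the outer coord(5/8) scales the sum of all five matching clauses.
clause_sum = s_3a * (1 / 3) + s_2f + s_2f + s_computer + s_22 * (1 / 2)
total = clause_sum * (5 / 8)

print(f"3a={s_3a:.6f} computer={s_computer:.6f} 22={s_22:.6f} total={total:.6f}")
```

The same arithmetic applies to every other hit on the page; only freq, docFreq, fieldNorm and the coord fractions change.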

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.03

0.030111834 = product of:
  0.12044734 = sum of:
    0.041306987 = weight(_text_:computer in 1673) [ClassicSimilarity], result of:
      0.041306987 = score(doc=1673,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.28263903 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.07914035 = sum of:
      0.041212745 = weight(_text_:resources in 1673) [ClassicSimilarity], result of:
        0.041212745 = score(doc=1673,freq=2.0), product of:
          0.14598069 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.039991006 = queryNorm
          0.28231642 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
      0.037927605 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
        0.037927605 = score(doc=1673,freq=2.0), product of:
          0.1400417 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.039991006 = queryNorm
          0.2708308 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
  0.25 = coord(2/8)

Date: 1. 8.1996 22:08:06
Source: Computer networks and ISDN systems. 30(1998) nos.1/7, pp. 646-648

Subramanian, S.; Shafer, K.E.: Clustering (1998) 0.02

0.022111915 = product of:
  0.08844766 = sum of:
    0.059009988 = weight(_text_:computer in 1103) [ClassicSimilarity], result of:
      0.059009988 = score(doc=1103,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.40377006 = fieldWeight in 1103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.078125 = fieldNorm(doc=1103)
    0.029437674 = product of:
      0.05887535 = sum of:
        0.05887535 = weight(_text_:resources in 1103) [ClassicSimilarity], result of:
          0.05887535 = score(doc=1103,freq=2.0), product of:
            0.14598069 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.039991006 = queryNorm
            0.40330917 = fieldWeight in 1103, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.078125 = fieldNorm(doc=1103)
      0.5 = coord(1/2)
  0.25 = coord(2/8)

Abstract: This article presents our exploration of computer science clustering algorithms as they relate to the Scorpion system. Scorpion is a research project at OCLC that explores the indexing and cataloging of electronic resources. For a more complete description of the Scorpion, please visit the Scorpion Web site at <http://purl.oclc.org/scorpion>

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02

0.021525284 = product of:
  0.08610114 = sum of:
    0.059009988 = weight(_text_:computer in 2748) [ClassicSimilarity], result of:
      0.059009988 = score(doc=2748,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.40377006 = fieldWeight in 2748, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.078125 = fieldNorm(doc=2748)
    0.027091147 = product of:
      0.054182295 = sum of:
        0.054182295 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.054182295 = score(doc=2748,freq=2.0), product of:
            0.1400417 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.039991006 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.25 = coord(2/8)

Date: 1. 2.2016 18:25:22
Series: Lecture notes in computer science ; 9398

Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.01
```
0.0110559575 = product of:
  0.04422383 = sum of:
    0.029504994 = weight(_text_:computer in 1665) [ClassicSimilarity], result of:
      0.029504994 = score(doc=1665,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.20188503 = fieldWeight in 1665, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
    0.014718837 = product of:
      0.029437674 = sum of:
        0.029437674 = weight(_text_:resources in 1665) [ClassicSimilarity], result of:
          0.029437674 = score(doc=1665,freq=2.0), product of:
            0.14598069 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.039991006 = queryNorm
            0.20165458 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
Golub, K.: Automated subject classification of textual web documents (2006) 0.01
```
0.0110559575 = product of:
  0.04422383 = sum of:
    0.029504994 = weight(_text_:computer in 5600) [ClassicSimilarity], result of:
      0.029504994 = score(doc=5600,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.20188503 = fieldWeight in 5600, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5600)
    0.014718837 = product of:
      0.029437674 = sum of:
        0.029437674 = weight(_text_:resources in 5600) [ClassicSimilarity], result of:
          0.029437674 = score(doc=5600,freq=2.0), product of:
            0.14598069 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.039991006 = queryNorm
            0.20165458 = fieldWeight in 5600, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

Purpose - To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.
Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
```
0.0110559575 = product of:
  0.04422383 = sum of:
    0.029504994 = weight(_text_:computer in 3311) [ClassicSimilarity], result of:
      0.029504994 = score(doc=3311,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.20188503 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.014718837 = product of:
      0.029437674 = sum of:
        0.029437674 = weight(_text_:resources in 3311) [ClassicSimilarity], result of:
          0.029437674 = score(doc=3311,freq=2.0), product of:
            0.14598069 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.039991006 = queryNorm
            0.20165458 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.01
```
0.010843484 = product of:
  0.08674787 = sum of:
    0.08674787 = weight(_text_:network in 7694) [ClassicSimilarity], result of:
      0.08674787 = score(doc=7694,freq=4.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.48708782 = fieldWeight in 7694, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7694)
  0.125 = coord(1/8)
```
Abstract

It is shown that the cognitive paradigm may be an orientation mark for automatic classification. On the basis of research in Artificial Intelligence, the cognitive paradigm - as opposed to the behavioristic paradigm - was developed as a multiplicity of competitive world-views. This is the thesis of DeMey in his book "The cognitive paradigm". Multiplicity in a loosely-coupled network of cognitive knots is also the principle of dynamic restlessness. In competition with cognitive views, a classification system that follows various models may learn by concrete information retrieval. During his actions the user builds implicitly a new classification order

Dubin, D.: Dimensions and discriminability (1998) 0.01

0.009892544 = product of:
  0.07914035 = sum of:
    0.07914035 = sum of:
      0.041212745 = weight(_text_:resources in 2338) [ClassicSimilarity], result of:
        0.041212745 = score(doc=2338,freq=2.0), product of:
          0.14598069 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.039991006 = queryNorm
          0.28231642 = fieldWeight in 2338, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2338)
      0.037927605 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
        0.037927605 = score(doc=2338,freq=2.0), product of:
          0.1400417 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.039991006 = queryNorm
          0.2708308 = fieldWeight in 2338, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2338)
  0.125 = coord(1/8)

Date: 22. 9.1997 19:16:05
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

0.009892544 = product of:
  0.07914035 = sum of:
    0.07914035 = sum of:
      0.041212745 = weight(_text_:resources in 2560) [ClassicSimilarity], result of:
        0.041212745 = score(doc=2560,freq=2.0), product of:
          0.14598069 = queryWeight, product of:
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.039991006 = queryNorm
          0.28231642 = fieldWeight in 2560, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.650338 = idf(docFreq=3122, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2560)
      0.037927605 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
        0.037927605 = score(doc=2560,freq=2.0), product of:
          0.1400417 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.039991006 = queryNorm
          0.2708308 = fieldWeight in 2560, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2560)
  0.125 = coord(1/8)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
Date: 22. 9.2008 18:31:54

Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
```
0.008762858 = product of:
  0.07010286 = sum of:
    0.07010286 = weight(_text_:network in 2564) [ClassicSimilarity], result of:
      0.07010286 = score(doc=2564,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.3936264 = fieldWeight in 2564, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0625 = fieldNorm(doc=2564)
  0.125 = coord(1/8)
```
Abstract

The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.01
```
0.0077453456 = product of:
  0.061962765 = sum of:
    0.061962765 = weight(_text_:network in 5055) [ClassicSimilarity], result of:
      0.061962765 = score(doc=5055,freq=4.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.34791988 = fieldWeight in 5055, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5055)
  0.125 = coord(1/8)
```
Abstract

Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, there are usually many wrong labels in the automatically labelled data in distant supervision (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce the wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence to filter the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which can capture features for relation classification and avoid the negative impact from irrelevant term sequences that previous neural network models of relation extraction often suffer. In the process of relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN) for relation classification. The experimental results show that compared with the methods using original DS data, the methods using filtered DS data performed much better in relation extraction. It indicates that the semantic similarity based method is effective in reducing wrong labels. The relation extraction performance of the CNN model using the core dependency phrases as input is the best of all, which indicates that using the core dependency phrases as input of CNN is enough to capture the features for relation classification and could avoid negative impact from irrelevant terms.
Orwig, R.E.; Chen, H.; Nunamaker, J.F.: ¬A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.01
```
0.0076675005 = product of:
  0.061340004 = sum of:
    0.061340004 = weight(_text_:network in 6928) [ClassicSimilarity], result of:
      0.061340004 = score(doc=6928,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.3444231 = fieldWeight in 6928, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6928)
  0.125 = coord(1/8)
```
Abstract

Describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classification of electronic brainstorming output and an evaluation of the results. Describes an electronic meeting system and describes the classification problem that exists in the group problem solving process. Surveys the literature concerning classification. Describes the application of the Kohonen SOM to the meeting output classification problem. Describes an experiment that evaluated the classification performed by the Kohonen SOM by comparing it with those of a human expert and a Hopfield neural network. Discusses conclusions and directions for future research
Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
```
0.0076675005 = product of:
  0.061340004 = sum of:
    0.061340004 = weight(_text_:network in 1595) [ClassicSimilarity], result of:
      0.061340004 = score(doc=1595,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.3444231 = fieldWeight in 1595, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1595)
  0.125 = coord(1/8)
```
Abstract

This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm; these learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio's algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.01
```
0.0076675005 = product of:
  0.061340004 = sum of:
    0.061340004 = weight(_text_:network in 3386) [ClassicSimilarity], result of:
      0.061340004 = score(doc=3386,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.3444231 = fieldWeight in 3386, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3386)
  0.125 = coord(1/8)
```
Abstract

This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01

0.006572143 = product of:
  0.052577145 = sum of:
    0.052577145 = weight(_text_:network in 995) [ClassicSimilarity], result of:
      0.052577145 = score(doc=995,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.29521978 = fieldWeight in 995, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.046875 = fieldNorm(doc=995)
  0.125 = coord(1/8)

Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: ¬A link-bridged topic model for cross-domain document classification (2013) 0.01
```
0.0054767863 = product of:
  0.04381429 = sum of:
    0.04381429 = weight(_text_:network in 2706) [ClassicSimilarity], result of:
      0.04381429 = score(doc=2706,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.2460165 = fieldWeight in 2706, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2706)
  0.125 = coord(1/8)
```
Abstract

Transfer learning utilizes labeled data available from some related domain (source domain) for achieving effective knowledge transformation to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationship existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationship among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationship is leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on an assumption that the documents of source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains' data into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as content and link statistics. Then the shared topics act as the bridge to facilitate knowledge transfer from the source to the target domains. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
Chae, G.; Park, J.; Park, J.; Yeo, W.S.; Shi, C.: Linking and clustering artworks using social tags : revitalizing crowd-sourced information on cultural collections (2016) 0.01
```
0.0054767863 = product of:
  0.04381429 = sum of:
    0.04381429 = weight(_text_:network in 2852) [ClassicSimilarity], result of:
      0.04381429 = score(doc=2852,freq=2.0), product of:
        0.17809492 = queryWeight, product of:
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.039991006 = queryNorm
        0.2460165 = fieldWeight in 2852, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4533744 = idf(docFreq=1398, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2852)
  0.125 = coord(1/8)
```
Abstract

Social tagging is one of the most popular methods for collecting crowd-sourced information in galleries, libraries, archives, and museums (GLAMs). However, when the number of social tags grows rapidly, using them becomes problematic and, as a result, they are often left as simply big data that cannot be used for practical purposes. To revitalize the use of this crowd-sourced information, we propose using social tags to link and cluster artworks based on an experimental study using an online collection at the Gyeonggi Museum of Modern Art (GMoMA). We view social tagging as a folksonomy, where artworks are classified by keywords of the crowd's various interpretations and one artwork can belong to several different categories simultaneously. To leverage this strength of social tags, we used a clustering method called "link communities" to detect overlapping communities in a network of artworks constructed by computing similarities between all artwork pairs. We used this framework to identify semantic relationships and clusters of similar artworks. By comparing the clustering results with curators' manual classification results, we demonstrated the potential of social tagging data for automatically clustering artworks in a way that reflects the dynamic perspectives of crowds.
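
The score breakdown shown for this entry (doc 2852, term `network`) can be reproduced by hand. The sketch below assumes the standard Lucene `ClassicSimilarity` formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes `queryNorm`, `fieldNorm`, and `coord` directly from the explain output rather than deriving them. The function names are hypothetical and chosen for illustration only.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float = 1.0) -> float:
    """Score of one matching term, following the structure of the explain tree.

    query_norm, field_norm and coord are assumptions here: they are copied
    from the explain output instead of being recomputed from the index.
    """
    tf = math.sqrt(freq)                   # 1.4142135 for freq=2.0
    idf = classic_idf(doc_freq, max_docs)  # about 4.4533744 for docFreq=1398
    query_weight = idf * query_norm        # "queryWeight" in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" in the tree
    return coord * query_weight * field_weight


# Values copied from the explain output for doc 2852.
score = classic_term_score(
    freq=2.0,
    doc_freq=1398,
    max_docs=44218,
    query_norm=0.039991006,
    field_norm=0.0390625,
    coord=1 / 8,  # 1 of 8 query clauses matched
)
print(score)
```

Lucene computes these values in 32-bit floats, so the result agrees with the listed 0.0054767863 only up to small rounding differences.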

McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.01

```
0.005203895 = product of:
  0.04163116 = sum of:
    0.04163116 = product of:
      0.08326232 = sum of:
        0.08326232 = weight(_text_:resources in 2533) [ClassicSimilarity], result of:
          0.08326232 = score(doc=2533,freq=4.0), product of:
            0.14598069 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.039991006 = queryNorm
            0.5703653 = fieldWeight in 2533, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.078125 = fieldNorm(doc=2533)
      0.5 = coord(1/2)
  0.125 = coord(1/8)
```

Abstract

Profiles several representative current efforts that apply established as well as more innovative methods of automated classification, organization, or other methods of categorisation of WWW resources.

Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.01
```
0.0051633734 = product of:
  0.041306987 = sum of:
    0.041306987 = weight(_text_:computer in 2219) [ClassicSimilarity], result of:
      0.041306987 = score(doc=2219,freq=2.0), product of:
        0.1461475 = queryWeight, product of:
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.039991006 = queryNorm
        0.28263903 = fieldWeight in 2219, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6545093 = idf(docFreq=3109, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2219)
  0.125 = coord(1/8)
```
Abstract

Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing this increasing amount of incoming and outgoing mail, in particular its classification, is time-consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence.
