Search (205 results, page 2 of 11)

  • theme_ss:"Automatisches Klassifizieren"
  1. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    0.009248867 = product of:
      0.023122165 = sum of:
        0.012184162 = weight(_text_:a in 2564) [ClassicSimilarity], result of:
          0.012184162 = score(doc=2564,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22789092 = fieldWeight in 2564, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
        0.010938003 = product of:
          0.021876005 = sum of:
            0.021876005 = weight(_text_:information in 2564) [ClassicSimilarity], result of:
              0.021876005 = score(doc=2564,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.2687516 = fieldWeight in 2564, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The classification of documents from a bibliographic database is a task linked to processes of information retrieval based on partial matching. A method is described for vectorizing reference documents from LISA that permits their topological organization using Kohonen's algorithm. As an example, a map of 202 LISA documents is generated, and the possibilities of this type of neural network for developing information retrieval systems based on graphical browsing are analyzed.
    Source
    Information processing and management. 38(2002) no.1, S.79-89
    Type
    a
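
The nested figures in result 1 are Lucene's "explain" output for its ClassicSimilarity (tf-idf) scoring. As a sketch of how such a breakdown combines, here is the standard ClassicSimilarity term weight recomputed in Python; the idf and tf formulas are standard Lucene behavior, and the helper name is ours:

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """One term's contribution under Lucene's ClassicSimilarity (tf-idf)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # idf(docFreq, maxDocs)
    tf = math.sqrt(freq)                               # tf(freq)
    query_weight = idf * query_norm                    # queryWeight
    field_weight = tf * idf * field_norm               # fieldWeight
    return query_weight * field_weight                 # weight(_text_:term)

QUERY_NORM, FIELD_NORM = 0.046368346, 0.0625  # values shown in the breakdown

s_a = classic_term_score(10, 37942, 44218, QUERY_NORM, FIELD_NORM)    # _text_:a
s_info = classic_term_score(6, 20772, 44218, QUERY_NORM, FIELD_NORM)  # _text_:information

# coord(1/2) scales the inner "information" clause; coord(2/5) the whole query.
total = 0.4 * (s_a + 0.5 * s_info)
print(total)  # ~0.00925, matching the 0.009248867 above up to rounding
```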
  2. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen [Automatic assignment of RVK notations using case-based reasoning] (2009) 0.01
    Date
    22. 8.2009 19:51:28
    Type
    a
  3. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
    Type
    a
  4. Mukhopadhyay, S.; Peng, S.; Raje, R.; Palakal, M.; Mostafa, J.: Multi-agent information classification using dynamic acquaintance lists (2003) 0.01
    Abstract
    There has been considerable interest in recent years in providing automated information services, such as information classification, by means of a society of collaborative agents. These agents augment each other's knowledge structures (e.g., the vocabularies) and assist each other in providing efficient information services to a human user. However, when the number of agents present in the society increases, exhaustive communication and collaboration among agents result in a large communication overhead and increased delays in response time. This paper introduces a method to achieve selective interaction with a relatively small number of potentially useful agents, based on simple agent modeling and acquaintance lists. The key idea presented here is that the acquaintance list of an agent, representing a small number of other agents to be collaborated with, is dynamically adjusted. The best acquaintances are automatically discovered using a learning algorithm, based on the past history of collaboration. Experimental results are presented to demonstrate that such dynamically learned acquaintance lists can lead to high quality of classification, while significantly reducing the delay in response time.
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.10, S.966-975
    Type
    a
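
The dynamically adjusted acquaintance list described above lends itself to a very small sketch. The update rule, class, and names below are our illustration of the general idea, not the paper's actual algorithm:

```python
class Agent:
    """Keeps a short, dynamically adjusted list of useful collaborators."""

    def __init__(self, name, list_size=3, lr=0.2):
        self.name = name
        self.list_size = list_size   # acquaintance list capacity
        self.lr = lr                 # learning rate for usefulness updates
        self.usefulness = {}         # peer -> learned usefulness estimate

    def record_collaboration(self, peer, success):
        """Update a peer's usefulness from one collaboration outcome in [0, 1]."""
        old = self.usefulness.get(peer, 0.5)
        self.usefulness[peer] = (1 - self.lr) * old + self.lr * success

    def acquaintances(self):
        """The current list: the top-k peers by learned usefulness."""
        ranked = sorted(self.usefulness, key=self.usefulness.get, reverse=True)
        return ranked[: self.list_size]

a = Agent("classifier-1")
for peer, outcome in [("b", 1.0), ("c", 0.2), ("b", 0.9), ("d", 0.7)]:
    a.record_collaboration(peer, outcome)
print(a.acquaintances())  # -> ['b', 'd', 'c']
```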
  5. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.01
    Abstract
    Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to the current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1409-1419
    Type
    a
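
As a sketch of the vocabulary-based naïve Bayes idea described above; the word lists, difficulty labels, and Laplace smoothing are illustrative assumptions, not the authors' trained model:

```python
import math
from collections import Counter

# Tiny training corpus: (tokenized text, difficulty level). Illustrative only.
train = [
    (["the", "heart", "pumps", "blood"], "easy"),
    (["take", "medicine", "with", "food"], "easy"),
    (["myocardial", "infarction", "risk", "factors"], "hard"),
    (["pathophysiology", "of", "myocardial", "ischemia"], "hard"),
]

classes = {c for _, c in train}
priors = {c: sum(1 for _, y in train if y == c) / len(train) for c in classes}
counts = {c: Counter() for c in classes}
for words, c in train:
    counts[c].update(words)
vocab = {w for words, _ in train for w in words}

def classify(words):
    """Argmax of log P(class) + sum log P(word|class), Laplace-smoothed."""
    best, best_lp = None, -math.inf
    for c in classes:
        total = sum(counts[c].values())
        lp = math.log(priors[c])
        for w in words:
            lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(classify(["myocardial", "risk"]))  # -> 'hard'
```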
  6. Miyamoto, S.: Information clustering based on fuzzy multisets (2003) 0.01
    Abstract
    A fuzzy multiset model for information clustering is proposed, with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which here means both term clustering and document clustering, is considered. Three methods are proposed: hard c-means, fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied, and illustrative examples are given to show how the algorithms work.
    Source
    Information processing and management. 39(2003) no.2, S.195-213
    Type
    a
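
The fuzzy c-means method mentioned above alternates membership and center updates. Below is a minimal numeric sketch on points in the plane, using the standard FCM formulas with fuzzifier m=2; the toy data are ours, and the paper works with fuzzy multisets rather than plain vectors:

```python
import random

def dist(p, q):
    """Euclidean distance in the plane."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def fuzzy_c_means(points, c=2, m=2.0, iters=20):
    """Standard FCM: alternate membership and weighted-center updates."""
    random.seed(0)
    centers = random.sample(points, c)
    u = []
    for _ in range(iters):
        # Membership of point i in cluster j:
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for p in points:
            d = [max(dist(p, ctr), 1e-12) for ctr in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c))
                      for j in range(c)])
        # Center j = sum_i u_ij^m * x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / sum(w)
                for t in range(2)))
    return centers, u

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, memberships = fuzzy_c_means(pts)
print(centers)  # two centers, one near each group of points
```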
  7. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model, based on Bayesian networks, that is able to handle both structure and content. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended to deal with different types of content information (here, text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad hoc retrieval for XML documents, and a multimedia corpus of Web pages.
    Source
    Information processing and management. 40(2004) no.5, S.807-827
    Type
    a
  8. Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.01
    Source
    Information processing and management. 11(1975), S.201-206
    Type
    a
  9. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01
    Source
    Information processing and management. 37(2001) no.3, S.459-484
    Type
    a
  10. Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.01
    Imprint
    Hinksey Hill : Learned Information
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al
    Type
    a
  11. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.01
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also have some obstacles to overcome when they are applied to text classification. The target text collection generally has very high dimensionality, so the training process might take a very long time. We propose a feature selection method based on the mutual information between word and class variables to reduce the dimensionality of the associative classifiers' feature space. In addition, the training process of the associative classifier produces a huge number of classification rules, which makes prediction for a new document inefficient. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of associative classification in both training and prediction when applied to a real-world problem.
    Footnote
    Beitrag in: Special issue on AIRS2005: Information Retrieval Research in Asia
    Source
    Information processing and management. 43(2007) no.2, S.393-405
    Type
    a
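
The feature selection step described above scores each word by its mutual information with the class variable. A minimal sketch for binary word-presence features follows; the toy corpus and the smoothing scheme are our assumptions:

```python
import math

docs = [  # (set of words, class) - a toy illustrative corpus
    ({"goal", "match"}, "sport"), ({"match", "team"}, "sport"),
    ({"stock", "market"}, "econ"), ({"market", "bank"}, "econ"),
]

def mutual_information(word, docs):
    """MI between a binary word-presence feature and the class variable,
    with add-one smoothing over the (presence x class) contingency table."""
    n = len(docs)
    classes = sorted({c for _, c in docs})
    cells = 2 * len(classes)
    mi = 0.0
    for present in (False, True):
        for c in classes:
            n_xc = sum(1 for ws, y in docs if (word in ws) == present and y == c)
            n_x = sum(1 for ws, _ in docs if (word in ws) == present)
            n_c = sum(1 for _, y in docs if y == c)
            p_xc = (n_xc + 1) / (n + cells)           # smoothed joint
            p_x = (n_x + len(classes)) / (n + cells)  # consistent marginals
            p_c = (n_c + 2) / (n + cells)
            mi += p_xc * math.log(p_xc / (p_x * p_c))
    return mi

vocab = {w for ws, _ in docs for w in ws}
for w in sorted(vocab, key=lambda w: mutual_information(w, docs), reverse=True):
    print(w, round(mutual_information(w, docs), 4))
# 'match' and 'market' (perfect class indicators here) rank highest
```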
  12. Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.01
    Abstract
    A major library classification scheme has long been the standard classification framework for information sources in the traditional library environment, and text classification (TC) has become a popular and attractive tool for organizing digital information. This paper gives an overview of previous projects and studies on TC using major library classification schemes and summarizes TC research challenges.
    Language
    a
  13. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.01
    Abstract
    This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
    Source
    Information processing and management. 44(2008) no.4, S.1410-1430
    Type
    a
  14. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.01
    Abstract
    Describes an automatic classification system for chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information, indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant to a given problem.
    Source
    Journal of chemical information and computer sciences. 34(1994) no.1, S.74-90
    Type
    a
  15. Dang, E.K.F.; Luk, R.W.P.; Ho, K.S.; Chan, S.C.F.; Lee, D.L.: A new measure of clustering effectiveness : algorithms and experimental studies (2008) 0.01
    Abstract
    We propose a new optimal clustering effectiveness measure, called CS1, based on a combination of clusters rather than on selecting a single optimal cluster as in the traditional MK1 measure. For hierarchical clustering, we present an algorithm to compute CS1, defined by seeking the optimal combination of disjoint clusters obtained by cutting the hierarchical structure at a certain similarity level. By reformulating the optimization as a 0-1 linear fractional programming problem, we demonstrate that an exact solution can be obtained by a linear-time algorithm. We further discuss how our approach can be generalized to more general problems involving overlapping clusters, and we show how optimal estimates can be obtained by greedy algorithms.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, S.390-406
    Type
    a
  16. Malo, P.; Sinha, A.; Wallenius, J.; Korhonen, P.: Concept-based document classification using Wikipedia and value function (2011) 0.01
    Abstract
    In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with the words is drawn from Wikipedia. The purpose is to utilize the abundant semantic relatedness information available in Wikipedia in an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated using the training documents. The learning involves a step-wise iterative process that helps in generating a value function with an appropriate set of concepts (dimensions) chosen from a collection of concepts. Once the value function is formulated, it is utilized to make a decision between relevance and irrelevance. The value assigned to a particular document from the value function can be further used to rank the documents according to their relevance. Reuters newswire documents have been used to evaluate the efficacy of the procedure. An extensive comparison with other frameworks has been performed. The results are promising.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.12, S.2496-2511
    Type
    a
  17. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.01
    Abstract
    With the growth of information on the Web, it is difficult to quickly find the desired information among the documents retrieved by a search engine. One way to address this problem is to classify web documents according to various criteria. Most document classification has focused on the subject or topic of a document. A genre or style is another view of a document, distinct from its subject or topic, and is likewise a criterion for classifying documents. In this paper, we suggest multiple sets of features for classifying the genres of web documents. The basic set of features, proposed in previous studies, is acquired from the textual properties of documents, such as the number of sentences, the number of occurrences of a certain word, etc. However, web documents differ from plain textual documents in that they carry a URL and contain HTML tags within the pages. We introduce new sets of features specific to web documents, extracted from URLs and HTML tags. The present work evaluates the performance of the proposed sets of features and discusses their characteristics. Finally, we conclude which set of features is appropriate for automatic genre classification of web documents.
    Source
    Information processing and management. 41(2005) no.5, S.1263-1276
    Type
    a
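
A small sketch of what URL- and HTML-tag-based feature sets can look like; the specific features below are our illustration, and the paper's actual sets differ:

```python
import re

def web_genre_features(url, html):
    """Toy URL/HTML feature extractor in the spirit of the paper."""
    text = re.sub(r"<[^>]+>", " ", html)  # strip tags for the textual stats
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    return {
        # textual features
        "n_sentences": len(sentences),
        "n_words": len(text.split()),
        # URL features
        "url_depth": url.rstrip("/").count("/") - 2,  # path segments after host
        "url_has_tilde": "~" in url,                  # often a personal home page
        # HTML tag features
        "n_links": len(re.findall(r"<a\s", html, re.I)),
        "n_forms": len(re.findall(r"<form", html, re.I)),
        "n_images": len(re.findall(r"<img", html, re.I)),
    }

page = "<html><body><p>Contact us.</p><a href='/about'>About</a><form></form></body></html>"
print(web_genre_features("http://example.org/~jane/index.html", page))
```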
  18. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering in information retrieval systems.
    Source
    Journal of the American Society for Information Science. 46(1995) no.7, S.519-529
    Type
    a
  19. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.01
    Abstract
    This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77% and 60%, respectively. The third experiment, employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting, achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
    Source
    Journal of information science. 29(2003) no.2, S.117-126
    Type
    a
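
For the kNN step described above, here is a compact sketch using cosine similarity over term-frequency vectors; the data are toy examples, and the study's representative-term dictionary and thresholds are not reproduced:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[w] * b[w] for w in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn_classify(doc, training, k=3):
    """Majority vote among the k nearest (most cosine-similar) training docs."""
    vec = Counter(doc.lower().split())
    neighbors = sorted(training,
                       key=lambda t: cosine(vec, Counter(t[0].lower().split())),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [
    ("interest rates and monetary policy", "economics"),
    ("stock market inflation report", "economics"),
    ("bank lending and credit growth", "economics"),
    ("football match season results", "sport"),
]
print(knn_classify("inflation and interest rates outlook", training))  # economics
```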
  20. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.01
    Abstract
    This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of 4 classes: questions, responses, announcements, or administrative messages. A total of 1,372 messages were analyzed. The automatic classification of a message was based on string matching between a message text and predefined string sets for each of the message types. The system's automated ability to accurately classify a message was compared against manually assigned codes. A Cohen's kappa of .55 suggested statistical agreement between the automatically and manually assigned codes.
    Source
    Journal of the American Society for Information Science. 48(1997) no.1, S.32-39
    Type
    a
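
Cohen's kappa, used above to compare automatic and manual codes, corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, with illustrative labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    p_obs = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    labels = set(ca) | set(cb)
    p_chance = sum((ca[l] / n) * (cb[l] / n) for l in labels)
    return (p_obs - p_chance) / (1 - p_chance)

auto   = ["question", "response", "response", "announcement", "question", "response"]
manual = ["question", "response", "question", "announcement", "question", "admin"]
print(round(cohens_kappa(auto, manual), 2))  # ~0.54, comparable to the study's .55
```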

Languages

  • e 168
  • d 34
  • a 1
  • chi 1

Types

  • a 178
  • el 26
  • x 4
  • m 3
  • r 2
  • s 2
  • d 1