Search (137 results, page 2 of 7)

  • Filter: theme_ss:"Automatisches Klassifizieren"
  • Filter: type_ss:"a"
  1. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.011671156 = product of:
      0.035013467 = sum of:
        0.035013467 = product of:
          0.070026934 = sum of:
            0.070026934 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.070026934 = score(doc=2748,freq=2.0), product of:
                0.18099438 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05168566 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    1. 2.2016 18:25:22
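The indented tree under each hit is Lucene's explain output for ClassicSimilarity (TF-IDF) scoring: tf is the square root of the term frequency, idf is 1 + ln(maxDocs/(docFreq+1)), fieldNorm encodes field length, and coord scales for query-term coverage. A minimal Python sketch that reproduces the figures of entry 1 from these formulas, taking queryNorm and fieldNorm as given from the tree (they depend on the full query and the stored field, which are not shown here):

```python
import math

# Figures read off the explain tree for the term "22" in doc 2748.
freq, doc_freq, max_docs = 2.0, 3622, 44218
query_norm, field_norm = 0.05168566, 0.078125

tf = math.sqrt(freq)                             # 1.4142135
idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.5018296
query_weight = idf * query_norm                  # 0.18099438
field_weight = tf * idf * field_norm             # 0.38690117
raw_score = query_weight * field_weight          # 0.070026934

# coord(1/2) and coord(1/3) down-weight hits matching few query terms.
final = raw_score * 0.5 * (1.0 / 3.0)
print(f"{final:.9f}")                            # 0.011671156
```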
  2. Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.01
    0.010160271 = product of:
      0.030480811 = sum of:
        0.030480811 = weight(_text_:information in 494) [ClassicSimilarity], result of:
          0.030480811 = score(doc=494,freq=6.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.3359395 = fieldWeight in 494, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
      0.33333334 = coord(1/3)
    
    Imprint
    Hinksey Hill : Learned Information
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al.
  3. Miyamoto, S.: Information clustering based on fuzzy multisets (2003) 0.01
    0.010058153 = product of:
      0.03017446 = sum of:
        0.03017446 = weight(_text_:information in 1071) [ClassicSimilarity], result of:
          0.03017446 = score(doc=1071,freq=12.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.3325631 = fieldWeight in 1071, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1071)
      0.33333334 = coord(1/3)
    
    Abstract
    A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which means both term clustering and document clustering, is considered. Three methods are proposed: hard c-means, fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.
    Source
    Information processing and management. 39(2003) no.2, S.195-213
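Of the three methods named in the abstract above, fuzzy c-means is the most widely known. A minimal numpy sketch of the standard algorithm on document vectors follows; it is the crisp-vector version, not Miyamoto's fuzzy-multiset extension, and all names are illustrative:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, eps=1e-9):
    """Standard fuzzy c-means: U[i, k] is the degree to which row i of X
    belongs to cluster k; m > 1 controls the fuzziness."""
    rng = np.random.default_rng(0)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])    # rows sum to 1
    for _ in range(iters):
        W = U ** m                                    # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        inv = d ** (-2.0 / (m - 1))                   # membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers
```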
  4. Ko, Y.: A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    0.009954991 = product of:
      0.02986497 = sum of:
        0.02986497 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
          0.02986497 = score(doc=2339,freq=16.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.3291521 = fieldWeight in 2339, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.33333334 = coord(1/3)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval and has been applied in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain high TC performance. Although term weighting is one of the important modules for TC, and TC has peculiarities different from those of information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information via positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than other schemes using class information as well as traditional schemes such as tf-idf.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
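The abstract gives only the name log tf-TRR; below is one plausible reading of a class-odds weight, offered purely as an illustration. The exact TRR definition is in the paper, and the helper here is hypothetical:

```python
import math

def log_tf_trr(tf, term, pos_docs, neg_docs, smoothing=1.0):
    # Log-scaled term frequency times the smoothed odds of seeing the term
    # in positive-class vs. negative-class documents (docs = sets of terms).
    p_pos = (sum(term in d for d in pos_docs) + smoothing) / (len(pos_docs) + 2 * smoothing)
    p_neg = (sum(term in d for d in neg_docs) + smoothing) / (len(neg_docs) + 2 * smoothing)
    return math.log(1.0 + tf) * math.log(2.0 + p_pos / p_neg)
```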
  5. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.01
    0.009385655 = product of:
      0.028156964 = sum of:
        0.028156964 = weight(_text_:information in 2412) [ClassicSimilarity], result of:
          0.028156964 = score(doc=2412,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.3103276 = fieldWeight in 2412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=2412)
      0.33333334 = coord(1/3)
    
  6. Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.01
    0.008212449 = product of:
      0.024637345 = sum of:
        0.024637345 = weight(_text_:information in 4347) [ClassicSimilarity], result of:
          0.024637345 = score(doc=4347,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.27153665 = fieldWeight in 4347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=4347)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 11(1975), S.201-206
  7. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.01
    0.008212449 = product of:
      0.024637345 = sum of:
        0.024637345 = weight(_text_:information in 4846) [ClassicSimilarity], result of:
          0.024637345 = score(doc=4846,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.27153665 = fieldWeight in 4846, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=4846)
      0.33333334 = coord(1/3)
    
    Source
    Information storage and retrieval. 6(1971), S.417-435
  8. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01
    0.008212449 = product of:
      0.024637345 = sum of:
        0.024637345 = weight(_text_:information in 2666) [ClassicSimilarity], result of:
          0.024637345 = score(doc=2666,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.27153665 = fieldWeight in 2666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=2666)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 37(2001) no.3, S.459-484
  9. Panyr, J.: Vektorraum-Modell und Clusteranalyse in Information-Retrieval-Systemen (1987) 0.01
    0.0081282165 = product of:
      0.024384648 = sum of:
        0.024384648 = weight(_text_:information in 2322) [ClassicSimilarity], result of:
          0.024384648 = score(doc=2322,freq=6.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.2687516 = fieldWeight in 2322, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2322)
      0.33333334 = coord(1/3)
    
    Abstract
    Starting from theoretical indexing approaches, the classical vector space model for automatic indexing (together with the term discrimination model) is explained. Clustering in information retrieval systems is treated as a natural logical consequence of this model and is covered in all its forms (i.e., as document, term, or combined document-and-term classification). Search strategies in pre-classified document collections (cluster search) are then described in detail. Finally, the sensible application of cluster analysis in information retrieval systems is briefly discussed.
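The cluster search described above can be stated compactly: match the query against cluster centroids first, then rank only documents inside the best-matching clusters. A hedged sketch; the cosine measure and all names are assumptions, not Panyr's exact procedure:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def cluster_search(query, centroids, clusters, docs, top_clusters=2):
    # clusters[k] lists the indices of the documents assigned to centroid k.
    best = sorted(range(len(centroids)),
                  key=lambda k: -cosine(query, centroids[k]))[:top_clusters]
    candidates = [i for k in best for i in clusters[k]]
    return sorted(candidates, key=lambda i: -cosine(query, docs[i]))
```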
  10. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    0.0081282165 = product of:
      0.024384648 = sum of:
        0.024384648 = weight(_text_:information in 2564) [ClassicSimilarity], result of:
          0.024384648 = score(doc=2564,freq=6.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.2687516 = fieldWeight in 2564, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.33333334 = coord(1/3)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described for vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example, a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
    Source
    Information processing and management. 38(2002) no.1, S.79-89
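A minimal sketch of Kohonen's algorithm on document vectors: find the best-matching unit on a 2-D grid, then pull it and its grid neighbours toward the input. The grid size, decay schedules, and names are assumptions, not the authors' exact setup:

```python
import numpy as np

def train_som(X, rows=10, cols=10, iters=2000, lr0=0.5, sigma0=3.0):
    rng = np.random.default_rng(0)
    W = rng.normal(size=(rows * cols, X.shape[1]))   # one weight vector per unit
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        lr = lr0 * (1.0 - t / iters)
        sigma = sigma0 * (1.0 - t / iters) + 0.5
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)               # neighbourhood update
    return W
```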
  11. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    0.007870112 = product of:
      0.023610333 = sum of:
        0.023610333 = weight(_text_:information in 316) [ClassicSimilarity], result of:
          0.023610333 = score(doc=316,freq=10.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.2602176 = fieldWeight in 316, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.33333334 = coord(1/3)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
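One way to make the idea concrete: summarize each collection as a distribution over the classes of a scheme such as LCC, then rank collections by how much of their mass falls in the query's classes. A hedged sketch, not the authors' system; all names are illustrative:

```python
def select_collections(query_classes, profiles, top=3):
    # profiles: {collection name: {class notation: document count}}
    def match(profile):
        total = sum(profile.values()) or 1
        return sum(profile.get(c, 0) for c in query_classes) / total
    return sorted(profiles, key=lambda name: -match(profiles[name]))[:top]
```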
  12. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
    0.007870112 = product of:
      0.023610333 = sum of:
        0.023610333 = weight(_text_:information in 995) [ClassicSimilarity], result of:
          0.023610333 = score(doc=995,freq=10.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.2602176 = fieldWeight in 995, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=995)
      0.33333334 = coord(1/3)
    
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model able to handle both structure and content which is based on Bayesian networks. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended to deal with different types of content information (here text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus which has become a reference in the field of ad-hoc retrieval for XML documents, and a multimedia corpus of Web pages.
    Source
    Information processing and management. 40(2004) no.5, S.807-827
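The paper's Bayesian-network/Fisher-kernel model is beyond a short sketch, but the core idea of using structure and content together can be illustrated with the simplest possible stand-in: a naive Bayes over (element tag, word) pairs, so the same word counts differently in a title element than in a paragraph. Plainly not the authors' model:

```python
import math
from collections import Counter

class StructuredNB:
    def fit(self, docs, labels):
        # docs: lists of (tag, word) pairs, e.g. ("title", "network").
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        for pairs, c in zip(docs, labels):
            self.counts[c].update(pairs)
        self.vocab = {p for cnt in self.counts.values() for p in cnt}
        return self

    def predict(self, pairs, alpha=1.0):
        def log_post(c):
            total = sum(self.counts[c].values()) + alpha * len(self.vocab)
            return math.log(self.prior[c]) + sum(
                math.log((self.counts[c][p] + alpha) / total) for p in pairs)
        return max(self.prior, key=log_post)
```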
  13. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.01
    0.007870112 = product of:
      0.023610333 = sum of:
        0.023610333 = weight(_text_:information in 1998) [ClassicSimilarity], result of:
          0.023610333 = score(doc=1998,freq=10.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.2602176 = fieldWeight in 1998, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1998)
      0.33333334 = coord(1/3)
    
    Abstract
    Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to the current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1409-1419
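The abstract does not say which readability formulas were used for comparison; the Flesch-Kincaid grade level is a common choice and shows what such formulas measure, sentence length and syllables per word, with no notion of vocabulary difficulty:

```python
def flesch_kincaid_grade(words, sentences, syllables):
    # Grade level from average sentence length and average syllables per word.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# A 300-word page in 15 sentences with 450 syllables scores ~9.9,
# i.e. roughly the "10th grade" level mentioned above.
print(flesch_kincaid_grade(300, 15, 450))
```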
  14. Major, R.L.; Ragsdale, C.T.: ¬An aggregation approach to the classification problem using multiple prediction experts (2000) 0.01
    0.007039241 = product of:
      0.021117723 = sum of:
        0.021117723 = weight(_text_:information in 3789) [ClassicSimilarity], result of:
          0.021117723 = score(doc=3789,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.23274569 = fieldWeight in 3789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=3789)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 36(2000) no.4, S.683-696
  15. Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.01
    0.007039241 = product of:
      0.021117723 = sum of:
        0.021117723 = weight(_text_:information in 4084) [ClassicSimilarity], result of:
          0.021117723 = score(doc=4084,freq=2.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.23274569 = fieldWeight in 4084, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=4084)
      0.33333334 = coord(1/3)
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
  16. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.01
    0.0070026936 = product of:
      0.02100808 = sum of:
        0.02100808 = product of:
          0.04201616 = sum of:
            0.04201616 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
              0.04201616 = score(doc=3051,freq=2.0), product of:
                0.18099438 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05168566 = queryNorm
                0.23214069 = fieldWeight in 3051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3051)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 8.2009 19:51:28
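Pfeffer's title translates as "Automatic assignment of RVK notations by case-based reasoning". A hedged sketch of the case-based idea, retrieve similar already-classified records and reuse their notations; the similarity measure and vote threshold here are assumptions, not the paper's method:

```python
from collections import Counter

def suggest_notations(new_terms, classified, k=5, min_votes=2):
    # classified: records like {"terms": set(...), "notations": [...]}
    def jaccard(a, b):
        return len(a & b) / (len(a | b) or 1)
    nearest = sorted(classified,
                     key=lambda rec: -jaccard(new_terms, rec["terms"]))[:k]
    votes = Counter(n for rec in nearest for n in rec["notations"])
    return [n for n, v in votes.items() if v >= min_votes]
```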
  17. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
    0.00663666 = product of:
      0.01990998 = sum of:
        0.01990998 = weight(_text_:information in 2650) [ClassicSimilarity], result of:
          0.01990998 = score(doc=2650,freq=4.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.21943474 = fieldWeight in 2650, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.33333334 = coord(1/3)
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering in information retrieval systems.
    Source
    Journal of the American Society for Information Science. 46(1995) no.7, S.519-529
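A sketch of classification by term uniqueness in the spirit of the formula the abstract mentions: score each sublanguage by how many of an abstract's terms occur only in that sublanguage's dictionary. The actual formula is in the paper; this is an illustration with hypothetical names:

```python
def classify_by_uniqueness(abstract_terms, dictionaries):
    # dictionaries: {sublanguage name: set of terms}
    scores = {}
    for name, vocab in dictionaries.items():
        others = set().union(*(v for n, v in dictionaries.items() if n != name))
        unique_terms = vocab - others        # terms found in no other sublanguage
        scores[name] = sum(t in unique_terms for t in abstract_terms)
    return max(scores, key=scores.get)
```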
  18. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.01
    0.00663666 = product of:
      0.01990998 = sum of:
        0.01990998 = weight(_text_:information in 7695) [ClassicSimilarity], result of:
          0.01990998 = score(doc=7695,freq=4.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.21943474 = fieldWeight in 7695, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=7695)
      0.33333334 = coord(1/3)
    
    Abstract
    Examines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future developments of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar.
  19. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.01
    0.006096162 = product of:
      0.018288486 = sum of:
        0.018288486 = weight(_text_:information in 909) [ClassicSimilarity], result of:
          0.018288486 = score(doc=909,freq=6.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.20156369 = fieldWeight in 909, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=909)
      0.33333334 = coord(1/3)
    
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. Associative classifiers have many favorable characteristics such as rapid training, good classification accuracy, and excellent interpretation. However, associative classifiers also have some obstacles to overcome when they are applied in the area of text classification. The target text collection generally has a very high dimension, thus the training process might take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of the associative classifier produces a huge amount of classification rules, which makes prediction for a new document ineffective. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting when applied to a real world problem.
    Footnote
    Contribution in: Special issue on AIRS2005: Information Retrieval Research in Asia
    Source
    Information processing and management. 43(2007) no.2, S.393-405
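The mutual information between a word indicator and a class indicator, the selection criterion named in the abstract, can be computed from a 2x2 contingency table of document counts; a minimal sketch:

```python
import math

def mutual_information(n11, n10, n01, n00):
    # n11: docs in the class containing the word; n10: docs outside the
    # class containing it; n01/n00: the same for docs without the word.
    n = n11 + n10 + n01 + n00
    cells = [(n11, n11 + n10, n11 + n01), (n10, n11 + n10, n10 + n00),
             (n01, n01 + n00, n11 + n01), (n00, n01 + n00, n10 + n00)]
    return sum((k / n) * math.log2(k * n / (pw * pc))
               for k, pw, pc in cells if k)
```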
  20. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.01
    0.006096162 = product of:
      0.018288486 = sum of:
        0.018288486 = weight(_text_:information in 2100) [ClassicSimilarity], result of:
          0.018288486 = score(doc=2100,freq=6.0), product of:
            0.09073304 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.05168566 = queryNorm
            0.20156369 = fieldWeight in 2100, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2100)
      0.33333334 = coord(1/3)
    
    Abstract
    This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
    Source
    Information processing and management. 44(2008) no.4, S.1410-1430

Languages

  • e 129
  • d 7
  • chi 1