Search (144 results, page 2 of 8)

  • × language_ss:"e"
  • × theme_ss:"Automatisches Klassifizieren"
  1. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
    0.017121121 = product of:
      0.034242243 = sum of:
        0.017165681 = weight(_text_:information in 2765) [ClassicSimilarity], result of:
          0.017165681 = score(doc=2765,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.19395474 = fieldWeight in 2765, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
              0.03415312 = score(doc=2765,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 2765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2765)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
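    The score breakdowns shown for each hit follow Lucene's ClassicSimilarity (TF-IDF) explain format. As a minimal sketch (assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and taking queryNorm and fieldNorm as given above), the numbers for this first entry can be reproduced as follows:

    import math

    def explain_weight(freq, doc_freq, max_docs, field_norm, query_norm):
        # One weight(_text_:term) node: score = queryWeight * fieldWeight,
        # where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm.
        tf = math.sqrt(freq)                               # 2.828427 for freq=8
        idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 1.7554779 for docFreq=20772
        return (idf * query_norm) * (tf * idf * field_norm)

    w_info = explain_weight(8.0, 20772, 44218, 0.0390625, 0.050415643)     # 0.017165681
    w_22 = explain_weight(2.0, 3622, 44218, 0.0390625, 0.050415643) * 0.5  # coord(1/2) -> 0.01707656
    print((w_info + w_22) * 0.5)                                           # coord(2/4) -> 0.017121121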
    
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of a document. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly (99% confidence) outperforms the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
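    As an illustration of the splitting-and-classifying idea described in the abstract above, the sketch below splits a document into fixed-size overlapping word windows (a generic document-splitting baseline, not the paper's keyword-based dynamic passages) and classifies each window with an off-the-shelf scikit-learn pipeline; the training passages and category names are hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labelled passages used to train the passage classifier.
    train_passages = ["quarterly earnings and revenue forecast", "launch codes and guidance telemetry"]
    train_labels = ["benign", "restricted"]
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(train_passages, train_labels)

    def detect_passages(document, size=50, step=25):
        # Split into overlapping fixed-size word windows and classify each one;
        # a document containing any "restricted" passage is flagged as infected.
        words = document.split()
        passages = [" ".join(words[i:i + size])
                    for i in range(0, max(len(words) - size + 1, 1), step)]
        labels = clf.predict(passages)
        return list(zip(passages, labels)), any(label == "restricted" for label in labels)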
  2. Na, J.-C.; Sui, H.; Khoo, C.; Chan, S.; Zhou, Y.: Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews (2004) 0.02
    0.01680845 = product of:
      0.0336169 = sum of:
        0.008582841 = weight(_text_:information in 2624) [ClassicSimilarity], result of:
          0.008582841 = score(doc=2624,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.09697737 = fieldWeight in 2624, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
        0.025034059 = product of:
          0.050068118 = sum of:
            0.050068118 = weight(_text_:organization in 2624) [ClassicSimilarity], result of:
              0.050068118 = score(doc=2624,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.27854347 = fieldWeight in 2624, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2624)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Series
    Advances in knowledge organization; vol.9
    Source
    Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine
  3. AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.02
    0.016283836 = product of:
      0.032567672 = sum of:
        0.014865918 = weight(_text_:information in 2836) [ClassicSimilarity], result of:
          0.014865918 = score(doc=2836,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 2836, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
        0.017701752 = product of:
          0.035403505 = sum of:
            0.035403505 = weight(_text_:organization in 2836) [ClassicSimilarity], result of:
              0.035403505 = score(doc=2836,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19695997 = fieldWeight in 2836, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents classified by their authors as belonging to one of the groups of the ACM Computing Classification System class "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
    Source
    Knowledge organization. 43(2016) no.1, S.56-63
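    A sketch of the analysis pipeline described in the abstract above (weighted term-frequency matrix, 50-dimensional SVD representation, then a supervised decision tree and unsupervised k-means clustering), using scikit-learn; the function signature and the guard for small corpora are illustrative additions, not the authors' code.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cluster import KMeans

    def analyze_manual_classes(docs, labels, k=50, n_classes=5):
        # docs: the ~1,026 abstracts; labels: the authors' own H.3 class assignments.
        X = TfidfVectorizer().fit_transform(docs)           # weighted term-frequency matrix
        k = min(k, min(X.shape) - 1)                        # keep the SVD feasible on toy corpora
        Z = TruncatedSVD(n_components=k).fit_transform(X)   # 50-dimensional document vectors
        tree = DecisionTreeClassifier().fit(Z, labels)      # supervised view of the manual classes
        clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(Z)  # unsupervised view
        return Z, tree, clusters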
  4. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.02
    0.015770756 = product of:
      0.03154151 = sum of:
        0.01029941 = weight(_text_:information in 1461) [ClassicSimilarity], result of:
          0.01029941 = score(doc=1461,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 1461, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1461)
        0.021242103 = product of:
          0.042484205 = sum of:
            0.042484205 = weight(_text_:organization in 1461) [ClassicSimilarity], result of:
              0.042484205 = score(doc=1461,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23635197 = fieldWeight in 1461, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1461)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents - instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the titles and abstracts of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
    Source
    Knowledge organization. 34(2007) no.4, S.247-263
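    A minimal sketch of the string-matching approach described in the abstract above, assuming a toy controlled vocabulary that maps thesaurus terms to class codes; the term weights, title boost, and cut-off are illustrative placeholders rather than the tuned values reported in the paper.

    import re
    from collections import Counter

    def classify_by_vocabulary(title, abstract, vocabulary, title_boost=2.0, cutoff=1.0):
        # vocabulary maps a controlled term to (class_code, weight); every
        # occurrence of a term in the title or abstract adds to its class score.
        scores = Counter()
        for field, boost in ((title, title_boost), (abstract, 1.0)):
            text = field.lower()
            for term, (class_code, weight) in vocabulary.items():
                hits = len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", text))
                scores[class_code] += hits * weight * boost
        return [c for c, s in scores.most_common() if s >= cutoff]

    toy_vocab = {"heat transfer": ("C1", 1.0), "finite element method": ("C2", 1.0)}
    print(classify_by_vocabulary("Finite element method for transient heat transfer", "...", toy_vocab))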
  5. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.02
    0.015395639 = product of:
      0.030791279 = sum of:
        0.01029941 = weight(_text_:information in 690) [ClassicSimilarity], result of:
          0.01029941 = score(doc=690,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.04098374 = score(doc=690,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    23. 3.2013 13:22:36
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, S.844-860
  6. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.02
    0.015395639 = product of:
      0.030791279 = sum of:
        0.01029941 = weight(_text_:information in 2158) [ClassicSimilarity], result of:
          0.01029941 = score(doc=2158,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
        0.02049187 = product of:
          0.04098374 = sum of:
            0.04098374 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.04098374 = score(doc=2158,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    4. 8.2015 19:22:04
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1817-1831
  7. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.01
    0.014919861 = product of:
      0.029839722 = sum of:
        0.01213797 = weight(_text_:information in 5997) [ClassicSimilarity], result of:
          0.01213797 = score(doc=5997,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 5997, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
        0.017701752 = product of:
          0.035403505 = sum of:
            0.035403505 = weight(_text_:organization in 5997) [ClassicSimilarity], result of:
              0.035403505 = score(doc=5997,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19695997 = fieldWeight in 5997, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5997)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Given the huge amount of information on the Internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
    Series
    Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V.; 24
    Studies in classification, data analysis, and knowledge organization
  8. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.01
    0.013142297 = product of:
      0.026284594 = sum of:
        0.008582841 = weight(_text_:information in 2194) [ClassicSimilarity], result of:
          0.008582841 = score(doc=2194,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.09697737 = fieldWeight in 2194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
        0.017701752 = product of:
          0.035403505 = sum of:
            0.035403505 = weight(_text_:organization in 2194) [ClassicSimilarity], result of:
              0.035403505 = score(doc=2194,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19695997 = fieldWeight in 2194, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2194)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
    Source
    Knowledge organization. 42(2015) no.3, S.139-153
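    A minimal sketch of the reference-counting method described in the abstract above; the category names are hypothetical, and returning every category tied with the top count is one plausible reading of "uses the most numerous subject categories of the references".

    from collections import Counter

    def classify_by_references(reference_categories):
        # reference_categories: for each indexed reference, the list of Web of
        # Science subject categories attached to it. The article is assigned the
        # category (or categories, in case of a tie) occurring most often.
        counts = Counter(cat for ref in reference_categories for cat in ref)
        if not counts:
            return []
        top = counts.most_common(1)[0][1]
        return [cat for cat, n in counts.items() if n == top]

    refs = [["Cell Biology"], ["Cell Biology", "Biochemistry & Molecular Biology"], ["Neurosciences"]]
    print(classify_by_references(refs))   # -> ['Cell Biology']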
  9. Xu, Y.; Bernard, A.: Knowledge organization through statistical computation : a new approach (2009) 0.01
    0.010621051 = product of:
      0.042484205 = sum of:
        0.042484205 = product of:
          0.08496841 = sum of:
            0.08496841 = weight(_text_:organization in 3252) [ClassicSimilarity], result of:
              0.08496841 = score(doc=3252,freq=8.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.47270393 = fieldWeight in 3252, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3252)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Knowledge organization (KO) is an interdisciplinary issue which includes some problems in knowledge classification such as how to classify newly emerged knowledge. Given the great complexity and ambiguity of knowledge, it sometimes becomes inefficient to classify knowledge by logical reasoning. This paper attempts to propose a statistical approach to knowledge organization in order to resolve the problems in classifying complex and mass knowledge. By integrating the classification process into a mathematical model, a knowledge classifier, based on the maximum entropy theory, is constructed, and the experimental results show that the classification results acquired from the classifier are reliable. The approach proposed in this paper is quite formal and is not dependent on specific contexts, so it could easily be adapted for knowledge classification in other domains within KO.
    Source
    Knowledge organization. 36(2009) no.4, S.227-239
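    A sketch of a maximum-entropy classifier in the sense used in the abstract above, here realised as multinomial logistic regression over bag-of-words features with scikit-learn; the training items and class labels are hypothetical, and this is a generic maximum-entropy formulation rather than the authors' specific model.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical knowledge items and the KO classes they belong to.
    items = ["gear design and tolerance analysis",
             "supplier contract negotiation",
             "surface hardening of steel components"]
    classes = ["engineering", "procurement", "engineering"]

    # Logistic regression is the standard discriminative form of a maximum-entropy model.
    maxent = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    maxent.fit(items, classes)
    print(maxent.predict(["heat treatment of alloy steel"]))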
  10. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.010245935 = product of:
      0.04098374 = sum of:
        0.04098374 = product of:
          0.08196748 = sum of:
            0.08196748 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.08196748 = score(doc=1046,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  11. Borko, H.: Research in computer based classification systems (1985) 0.01
    0.009199608 = product of:
      0.018399216 = sum of:
        0.006007989 = weight(_text_:information in 3647) [ClassicSimilarity], result of:
          0.006007989 = score(doc=3647,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.06788416 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
        0.012391226 = product of:
          0.024782453 = sum of:
            0.024782453 = weight(_text_:organization in 3647) [ClassicSimilarity], result of:
              0.024782453 = score(doc=3647,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.13787198 = fieldWeight in 3647, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3647)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California, and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
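    A sketch of the two steps attributed to Borko above: term-term correlations computed from co-occurrence across documents, followed by factor analysis to group terms into broad categories (his study derived eleven). NumPy and scikit-learn are stand-ins for the original statistical software, and assigning each term to its strongest factor is an illustrative simplification.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import FactorAnalysis

    def derive_term_categories(docs, n_factors=11):
        vec = CountVectorizer(binary=True)
        X = vec.fit_transform(docs).toarray()           # document x term incidence matrix
        corr = np.corrcoef(X.T)                         # step 1: term-term correlations
        fa = FactorAnalysis(n_components=min(n_factors, X.shape[1])).fit(X)
        loadings = fa.components_                       # step 2: factor loadings per term
        categories = np.abs(loadings).argmax(axis=0)    # each term -> its strongest factor
        return dict(zip(vec.get_feature_names_out(), categories)), corr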
  12. Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.01
    0.009198101 = product of:
      0.036792405 = sum of:
        0.036792405 = product of:
          0.07358481 = sum of:
            0.07358481 = weight(_text_:organization in 942) [ClassicSimilarity], result of:
              0.07358481 = score(doc=942,freq=6.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.40937364 = fieldWeight in 942, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.046875 = fieldNorm(doc=942)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Content
    1. Increased Importance of Knowledge Organization in Internet Services - 2. Quality Subject Service and the role of classification - 3. Developing the DDC into a knowledge organization instrument for the digital library. OCLC site - 4. DESIRE's Barefoot Solutions of Automatic Classification - 5. Advanced Classification Solutions in DESIRE and CORC - 6. Future directions of research and development - 7. General references
  13. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.01
    0.008919551 = product of:
      0.035678204 = sum of:
        0.035678204 = weight(_text_:information in 382) [ClassicSimilarity], result of:
          0.035678204 = score(doc=382,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.40312737 = fieldWeight in 382, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
      0.25 = coord(1/4)
    
    Imprint
    Hinksey Hill : Learned Information
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al
  14. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.00853828 = product of:
      0.03415312 = sum of:
        0.03415312 = product of:
          0.06830624 = sum of:
            0.06830624 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06830624 = score(doc=2748,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 2.2016 18:25:22
  15. Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.01
    0.007432959 = product of:
      0.029731836 = sum of:
        0.029731836 = weight(_text_:information in 494) [ClassicSimilarity], result of:
          0.029731836 = score(doc=494,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3359395 = fieldWeight in 494, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
      0.25 = coord(1/4)
    
    Imprint
    Hinksey Hill : Learned Information
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al
  16. Miyamoto, S.: Information clustering based on fuzzy multisets (2003) 0.01
    0.0073582535 = product of:
      0.029433014 = sum of:
        0.029433014 = weight(_text_:information in 1071) [ClassicSimilarity], result of:
          0.029433014 = score(doc=1071,freq=12.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3325631 = fieldWeight in 1071, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1071)
      0.25 = coord(1/4)
    
    Abstract
    A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which means both term clustering and document clustering, is considered. Three methods are proposed: hard c-means, fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties concerning the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.
    Source
    Information processing and management. 39(2003) no.2, S.195-213
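    A plain fuzzy c-means sketch in NumPy, showing the alternating update of cluster centers and fuzzy memberships mentioned in the abstract above; it operates on ordinary feature vectors rather than on the paper's fuzzy multiset representation, and the fuzzifier m and stopping rule are generic defaults.

    import numpy as np

    def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, eps=1e-5, seed=0):
        # X: array of shape (n_samples, n_features); U holds the fuzzy membership
        # of each sample in each of the c clusters.
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)               # memberships of a sample sum to 1
        for _ in range(n_iter):
            W = U ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]             # membership-weighted centers
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U_new = 1.0 / d ** (2.0 / (m - 1.0))                     # closer centers -> larger membership
            U_new /= U_new.sum(axis=1, keepdims=True)
            if np.abs(U_new - U).max() < eps:
                return centers, U_new
            U = U_new
        return centers, U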
  17. Ko, Y.: A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    0.0072827823 = product of:
      0.02913113 = sum of:
        0.02913113 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
          0.02913113 = score(doc=2339,freq=16.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3291521 = fieldWeight in 2339, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.25 = coord(1/4)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain high TC performance. Although term weighting is one of the important modules for TC, and TC has peculiarities different from those of information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information in the form of positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information, as well as traditional schemes such as tf-idf.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
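    The abstract above does not give the log tf-TRR formula itself, so the sketch below only illustrates the underlying idea of weighting a term by its positive-class versus negative-class distribution (here a Laplace-smoothed log odds ratio multiplied by log tf); it should not be read as the paper's exact scheme.

    import math
    from collections import Counter

    def class_odds(pos_docs, neg_docs):
        # Log ratio of a term's (smoothed) probability in the positive class
        # to its probability in the negative class.
        pos = Counter(t for d in pos_docs for t in d.split())
        neg = Counter(t for d in neg_docs for t in d.split())
        vocab = set(pos) | set(neg)
        n_pos, n_neg = sum(pos.values()), sum(neg.values())
        return {t: math.log(((pos[t] + 1) / (n_pos + len(vocab))) /
                            ((neg[t] + 1) / (n_neg + len(vocab)))) for t in vocab}

    def term_weight(term, tf, odds):
        return math.log(1 + tf) * odds.get(term, 0.0)   # log-tf times class odds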
  18. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.01
    0.006866273 = product of:
      0.027465092 = sum of:
        0.027465092 = weight(_text_:information in 2412) [ClassicSimilarity], result of:
          0.027465092 = score(doc=2412,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.3103276 = fieldWeight in 2412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=2412)
      0.25 = coord(1/4)
    
  19. Khoo, C.S.G.; Ou, S.: Machine versus human clustering of concepts across documents (2008) 0.01
    0.0062585147 = product of:
      0.025034059 = sum of:
        0.025034059 = product of:
          0.050068118 = sum of:
            0.050068118 = weight(_text_:organization in 2286) [ClassicSimilarity], result of:
              0.050068118 = score(doc=2286,freq=4.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.27854347 = fieldWeight in 2286, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2286)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Series
    Advances in knowledge organization; vol.11
    Source
    Culture and identity in knowledge organization: Proceedings of the Tenth International ISKO Conference 5-8 August 2008, Montreal, Canada. Ed. by Clément Arsenault and Joseph T. Tennis
  20. Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.01
    0.006007989 = product of:
      0.024031956 = sum of:
        0.024031956 = weight(_text_:information in 4347) [ClassicSimilarity], result of:
          0.024031956 = score(doc=4347,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.27153665 = fieldWeight in 4347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=4347)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 11(1975), S.201-206

Types

  • a 133
  • el 13
  • s 2
  • m 1
  • r 1