Search (84 results, page 2 of 5)

  • theme_ss:"Automatisches Klassifizieren"
  • type_ss:"a"
  1. Golub, K.: Automated subject classification of textual web documents (2006) 0.01
    0.011296105 = product of:
      0.033888314 = sum of:
        0.033888314 = weight(_text_:science in 5600) [ClassicSimilarity], result of:
          0.033888314 = score(doc=5600,freq=6.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.25204095 = fieldWeight in 5600, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.33333334 = coord(1/3)
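    The explain tree above is Lucene's ClassicSimilarity (classic TF-IDF) score breakdown. As a check, its arithmetic can be reproduced in a few lines; the tf and idf formulas are Lucene's documented ones, while queryNorm and fieldNorm are simply taken as given from the output:

    ```python
    import math

    # Reproduce the ClassicSimilarity arithmetic from the explain tree above
    # (doc 5600, term "science", freq=6, docFreq=8627, maxDocs=44218).
    tf = math.sqrt(6.0)                     # tf(freq) = sqrt(freq) ~ 2.4494898
    idf = 1 + math.log(44218 / (8627 + 1))  # idf = 1 + ln(maxDocs/(docFreq+1)) ~ 2.6341193
    query_norm = 0.05104385                 # taken as given from the output
    field_norm = 0.0390625                  # index-time length norm, taken as given

    query_weight = idf * query_norm         # ~ 0.13445559
    field_weight = tf * idf * field_norm    # ~ 0.25204095
    score = query_weight * field_weight     # ~ 0.033888314
    final = score / 3                       # coord(1/3): one of three query clauses matched
    print(f"{final:.9f}")                   # ~ 0.011296105 (explain shows float32 values)
    ```

    The rounded value 0.01 shown next to each hit is this final score; the later entries repeat the same computation with their own freq, fieldNorm, and coord values.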
    
    Abstract
     Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such.
     Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.
     Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; major differences lie in the applied algorithms and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are recognized.
     Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources.
     Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities.
     Originality/value - To the author's knowledge, no previous review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
  2. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
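    The fusion step described above can be pictured as a weighted combination of per-source similarity matrices before clustering. The sketch below uses a fixed linear weight and made-up matrices; it is a generic stand-in, not the paper's information-based weighting scheme:

    ```python
    def combine(sim_text, sim_cite, w=0.5):
        """Linearly fuse two similarity views of the same journal set.

        w weights the text-mining view; (1 - w) weights the bibliometric view.
        (A simplified stand-in for an information-based weighting scheme.)
        """
        n = len(sim_text)
        return [[w * sim_text[i][j] + (1 - w) * sim_cite[i][j] for j in range(n)]
                for i in range(n)]

    # Two toy 2-journal similarity matrices: the text view says "weakly related",
    # the citation view says "strongly related"; the fused view sits in between.
    fused = combine([[1.0, 0.2], [0.2, 1.0]],
                    [[1.0, 0.6], [0.6, 1.0]])
    print(fused[0][1])   # 0.4
    ```

    Any clustering algorithm that accepts a similarity (or kernel) matrix can then run on the fused matrix.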
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
  3. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
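    As a toy illustration of the n-gram feature extraction mentioned above, the sketch below counts word bigrams and compares two texts by their shared bigrams; the sentences are invented, not drawn from the scitex corpus:

    ```python
    from collections import Counter

    def word_ngrams(text, n=2):
        """Extract word n-gram counts from a text (lowercased, whitespace-tokenized)."""
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    # Crude register comparison: overlap of the two texts' bigram profiles.
    a = word_ngrams("the parser builds a tree for the sentence")
    b = word_ngrams("the parser outputs a tree for each sentence")
    shared = sum((a & b).values())   # bigrams the two texts have in common
    print(shared)                    # 3
    ```

    In a real setup such counts would be aggregated per discipline and decade and fed, alongside part-of-speech and lexico-grammatical features, into a text classifier.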
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1668-1678
  4. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.01
    Source
    Journal of information science. 21(1995) no.4, S.289-299
  5. Wang, J.: An extensive study on automated Dewey Decimal Classification (2009) 0.01
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
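    The "balanced virtual tree" idea — regrouping a deep, skewed hierarchy so that every internal node has a bounded number of children — can be sketched minimally. The equal-split criterion below is a simplification for illustration, not the paper's actual balancing algorithm:

    ```python
    def balance(leaves, fanout=3):
        """Regroup a flat list of leaf categories into a balanced virtual tree
        in which every internal node has at most `fanout` children."""
        if len(leaves) <= fanout:
            return leaves
        step = -(-len(leaves) // fanout)   # ceil division: leaves per subtree
        return [balance(leaves[i:i + step], fanout)
                for i in range(0, len(leaves), step)]

    # Nine hypothetical DDC leaf classes become three subtrees of three leaves each,
    # so an interactive classifier needs at most two choices to reach any leaf.
    tree = balance([f"ddc-{i}" for i in range(9)], fanout=3)
    print(tree)
    ```

    On such a tree, each user interaction narrows the prediction by one level, which is the mechanism behind the "no more than three interactions" result.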
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2269-2286
  6. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01
    Abstract
     Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF might be combined and evaluated as complementary approaches.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
  7. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.3-16
  8. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  9. May, A.D.: Automatic classification of e-mail messages by message type (1997) 0.01
    Source
    Journal of the American Society for Information Science. 48(1997) no.1, S.32-39
  10. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.01
    Source
    Journal of the American Society for Information Science. 48(1997) no.10, S.932-943
  11. Orwig, R.E.; Chen, H.; Nunamaker, J.F.: A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.01
    Source
    Journal of the American Society for Information Science. 48(1997) no.2, S.157-170
  12. Sebastiani, F.: Classification of text, automatic (2006) 0.01
    Imprint
    Amsterdam : Elsevier Science Publishers
  13. Dang, E.K.F.; Luk, R.W.P.; Ho, K.S.; Chan, S.C.F.; Lee, D.L.: ¬A new measure of clustering effectiveness : algorithms and experimental studies (2008) 0.01
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, S.390-406
  14. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    Pages
    S.1-22
  15. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    Date
    1. 8.1996 22:08:06
  16. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    Date
    22. 9.2008 18:31:54
  17. Larson, R.R.: Experiments in automatic Library of Congress Classification (1992) 0.01
    Source
    Journal of the American Society for Information Science. 43(1992), S.130-148
  18. Mostafa, J.; Quiroga, L.M.; Palakal, M.: Filtering medical documents using automated and human classification methods (1998) 0.01
    Source
    Journal of the American Society for Information Science. 49(1998) no.14, S.1304-1318
  19. Drori, O.; Alon, N.: Using document classification for displaying search results (2003) 0.01
    Source
    Journal of information science. 29(2003) no.2, S.97-106
  20. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.01
    Source
    Journal of information science. 29(2003) no.2, S.117-126

Languages

  • e 80
  • d 3
  • chi 1