Search (178 results, page 1 of 9)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.26

0.25871408 = product of:
  0.46568534 = sum of:
    0.062223002 = product of:
      0.186669 = sum of:
        0.186669 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.186669 = score(doc=562,freq=2.0), product of:
            0.3321406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03917671 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.186669 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.186669 = score(doc=562,freq=2.0), product of:
        0.3321406 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.03917671 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.014200641 = weight(_text_:of in 562) [ClassicSimilarity], result of:
      0.014200641 = score(doc=562,freq=10.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.23179851 = fieldWeight in 562, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.186669 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.186669 = score(doc=562,freq=2.0), product of:
        0.3321406 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.03917671 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.015923709 = product of:
      0.031847417 = sum of:
        0.031847417 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.031847417 = score(doc=562,freq=2.0), product of:
            0.13719016 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03917671 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5555556 = coord(5/9)

Abstract: Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
Content: Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
Source: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK

Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.09

0.09283512 = product of:
  0.20887902 = sum of:
    0.06711562 = weight(_text_:applications in 1667) [ClassicSimilarity], result of:
      0.06711562 = score(doc=1667,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.38913056 = fieldWeight in 1667, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0625 = fieldNorm(doc=1667)
    0.014666359 = weight(_text_:of in 1667) [ClassicSimilarity], result of:
      0.014666359 = score(doc=1667,freq=6.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.23940048 = fieldWeight in 1667, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1667)
    0.03270373 = weight(_text_:systems in 1667) [ClassicSimilarity], result of:
      0.03270373 = score(doc=1667,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.2716328 = fieldWeight in 1667, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0625 = fieldNorm(doc=1667)
    0.09439332 = weight(_text_:software in 1667) [ClassicSimilarity], result of:
      0.09439332 = score(doc=1667,freq=6.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.6073436 = fieldWeight in 1667, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=1667)
  0.44444445 = coord(4/9)

Content: 1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.

Sebastiani, F.: Classification of text, automatic (2006) 0.08

0.078104936 = product of:
  0.1757361 = sum of:
    0.05872617 = weight(_text_:applications in 5003) [ClassicSimilarity], result of:
      0.05872617 = score(doc=5003,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 5003, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
    0.020956306 = weight(_text_:of in 5003) [ClassicSimilarity], result of:
      0.020956306 = score(doc=5003,freq=16.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.34207192 = fieldWeight in 5003, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
    0.028615767 = weight(_text_:systems in 5003) [ClassicSimilarity], result of:
      0.028615767 = score(doc=5003,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.23767869 = fieldWeight in 5003, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
    0.06743785 = weight(_text_:software in 5003) [ClassicSimilarity], result of:
      0.06743785 = score(doc=5003,freq=4.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.43390724 = fieldWeight in 5003, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5003)
  0.44444445 = coord(4/9)

Abstract: Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.
Source: Encyclopedia of language and linguistics. 2nd ed. Ed.: K. Brown. Vol. 14

Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.04

0.04161918 = product of:
  0.124857545 = sum of:
    0.05872617 = weight(_text_:applications in 1661) [ClassicSimilarity], result of:
      0.05872617 = score(doc=1661,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 1661, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1661)
    0.016567415 = weight(_text_:of in 1661) [ClassicSimilarity], result of:
      0.016567415 = score(doc=1661,freq=10.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.2704316 = fieldWeight in 1661, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1661)
    0.049563963 = weight(_text_:systems in 1661) [ClassicSimilarity], result of:
      0.049563963 = score(doc=1661,freq=6.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.41167158 = fieldWeight in 1661, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1661)
  0.33333334 = coord(3/9)

Abstract: Proposes the use of parallel computing systems to overcome the computationally intense clustering process. Examines 2 operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The time performance of the parallel systems was an improvement over sequential systems times, and produced the same clustering and classification scheme. Results show near linear speed up in higher threshold clustering applications
Source: Journal of the American Society for Information Science. 48(1997) no.10, S.932-943

Dubin, D.: Dimensions and discriminability (1998) 0.03

0.03275338 = product of:
  0.098260134 = sum of:
    0.05872617 = weight(_text_:applications in 2338) [ClassicSimilarity], result of:
      0.05872617 = score(doc=2338,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 2338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.020956306 = weight(_text_:of in 2338) [ClassicSimilarity], result of:
      0.020956306 = score(doc=2338,freq=16.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.34207192 = fieldWeight in 2338, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.018577661 = product of:
      0.037155323 = sum of:
        0.037155323 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.037155323 = score(doc=2338,freq=2.0), product of:
            0.13719016 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03917671 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.33333334 = coord(3/9)

Abstract: Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
Date: 22. 9.1997 19:16:05
Imprint: Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
Source: Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al

Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.03
```
0.030120917 = product of:
  0.13554412 = sum of:
    0.017552461 = weight(_text_:of in 1382) [ClassicSimilarity], result of:
      0.017552461 = score(doc=1382,freq=22.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.28651062 = fieldWeight in 1382, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.117991656 = weight(_text_:software in 1382) [ClassicSimilarity], result of:
      0.117991656 = score(doc=1382,freq=24.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.75917953 = fieldWeight in 1382, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
  0.22222222 = coord(2/9)
```
Abstract

The authors describe their research which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components, while retaining good accuracy, and therefore improves the affordability of software reuse. A program understanding of software components and natural language understanding of user queries was employed. Then the software component descriptions were compared by matching the resulting semantic representations of the user queries to the semantic representations of the software components to search for software components that best match the user queries. A proof of concept system was developed to test the authors' approach. The results of this proof of concept system were compared to human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated semantic-based approach for software reusable component classification and retrieval is successful when compared to the labor-intensive results from the experts, thus showing that this approach can significantly benefit software reuse classification and retrieval.

Source

Journal of the American Society for Information Science and Technology. 59(2008) no.4, S.613-627

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.03

0.030045634 = product of:
  0.0901369 = sum of:
    0.05872617 = weight(_text_:applications in 2560) [ClassicSimilarity], result of:
      0.05872617 = score(doc=2560,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 2560, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2560)
    0.0128330635 = weight(_text_:of in 2560) [ClassicSimilarity], result of:
      0.0128330635 = score(doc=2560,freq=6.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.20947541 = fieldWeight in 2560, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2560)
    0.018577661 = product of:
      0.037155323 = sum of:
        0.037155323 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.037155323 = score(doc=2560,freq=2.0), product of:
            0.13719016 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03917671 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.33333334 = coord(3/9)

Abstract: The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
Date: 22. 9.2008 18:31:54

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.03
```
0.028607449 = product of:
  0.085822344 = sum of:
    0.041947264 = weight(_text_:applications in 3300) [ClassicSimilarity], result of:
      0.041947264 = score(doc=3300,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.2432066 = fieldWeight in 3300, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
    0.014968789 = weight(_text_:of in 3300) [ClassicSimilarity], result of:
      0.014968789 = score(doc=3300,freq=16.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.24433708 = fieldWeight in 3300, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
    0.02890629 = weight(_text_:systems in 3300) [ClassicSimilarity], result of:
      0.02890629 = score(doc=3300,freq=4.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.24009174 = fieldWeight in 3300, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3300)
  0.33333334 = coord(3/9)
```
Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including, Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated showing they are complementary to one another.

Source

Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539

Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.03

0.02546304 = product of:
  0.07638912 = sum of:
    0.041947264 = weight(_text_:applications in 3172) [ClassicSimilarity], result of:
      0.041947264 = score(doc=3172,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.2432066 = fieldWeight in 3172, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3172)
    0.0140020205 = weight(_text_:of in 3172) [ClassicSimilarity], result of:
      0.0140020205 = score(doc=3172,freq=14.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.22855641 = fieldWeight in 3172, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3172)
    0.020439833 = weight(_text_:systems in 3172) [ClassicSimilarity], result of:
      0.020439833 = score(doc=3172,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.1697705 = fieldWeight in 3172, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3172)
  0.33333334 = coord(3/9)

Abstract: In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2269-2286

Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.02
```
0.024787461 = product of:
  0.07436238 = sum of:
    0.012701438 = weight(_text_:of in 2301) [ClassicSimilarity], result of:
      0.012701438 = score(doc=2301,freq=18.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.20732687 = fieldWeight in 2301, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03125 = fieldNorm(doc=2301)
    0.023125032 = weight(_text_:systems in 2301) [ClassicSimilarity], result of:
      0.023125032 = score(doc=2301,freq=4.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.19207339 = fieldWeight in 2301, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03125 = fieldNorm(doc=2301)
    0.03853591 = weight(_text_:software in 2301) [ClassicSimilarity], result of:
      0.03853591 = score(doc=2301,freq=4.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.24794699 = fieldWeight in 2301, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=2301)
  0.33333334 = coord(3/9)
```
Abstract

Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC) express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations would be stored into an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.

Source

Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro

AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.02

0.024767645 = product of:
  0.074302934 = sum of:
    0.019801848 = weight(_text_:of in 2836) [ClassicSimilarity], result of:
      0.019801848 = score(doc=2836,freq=28.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.32322758 = fieldWeight in 2836, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.020439833 = weight(_text_:systems in 2836) [ClassicSimilarity], result of:
      0.020439833 = score(doc=2836,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.1697705 = fieldWeight in 2836, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.034061253 = weight(_text_:software in 2836) [ClassicSimilarity], result of:
      0.034061253 = score(doc=2836,freq=2.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.21915624 = fieldWeight in 2836, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
  0.33333334 = coord(3/9)

Abstract: Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.02

0.023156626 = product of:
  0.06946988 = sum of:
    0.014968789 = weight(_text_:of in 3311) [ClassicSimilarity], result of:
      0.014968789 = score(doc=3311,freq=16.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.24433708 = fieldWeight in 3311, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.020439833 = weight(_text_:systems in 3311) [ClassicSimilarity], result of:
      0.020439833 = score(doc=3311,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.1697705 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
    0.034061253 = weight(_text_:software in 3311) [ClassicSimilarity], result of:
      0.034061253 = score(doc=3311,freq=2.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.21915624 = fieldWeight in 3311, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3311)
  0.33333334 = coord(3/9)

Abstract: Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
Source: Journal of the Association for Information Science and Technology. 67(2016) no.1, S.3-16

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02

0.021780707 = product of:
  0.06534212 = sum of:
    0.018148692 = weight(_text_:of in 1673) [ClassicSimilarity], result of:
      0.018148692 = score(doc=1673,freq=12.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.29624295 = fieldWeight in 1673, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.028615767 = weight(_text_:systems in 1673) [ClassicSimilarity], result of:
      0.028615767 = score(doc=1673,freq=2.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.23767869 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.018577661 = product of:
      0.037155323 = sum of:
        0.037155323 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.037155323 = score(doc=1673,freq=2.0), product of:
            0.13719016 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03917671 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.33333334 = coord(3/9)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK based information. The experimental version developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; vgl. auch: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
Source: Computer networks and ISDN systems. 30(1998) nos.1/7, S.646-648

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.02
```
0.021254174 = product of:
  0.06376252 = sum of:
    0.013388492 = weight(_text_:of in 2596) [ClassicSimilarity], result of:
      0.013388492 = score(doc=2596,freq=20.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.21854173 = fieldWeight in 2596, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.023125032 = weight(_text_:systems in 2596) [ClassicSimilarity], result of:
      0.023125032 = score(doc=2596,freq=4.0), product of:
        0.12039685 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03917671 = queryNorm
        0.19207339 = fieldWeight in 2596, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.027249003 = weight(_text_:software in 2596) [ClassicSimilarity], result of:
      0.027249003 = score(doc=2596,freq=2.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.17532499 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.33333334 = coord(3/9)
```
Abstract

This series of meetings originated in Albuquerque, New Mexico in 1995. This inaugural meeting (part of an ASIDIC series) was transplanted to Bath in England (1996 and 1997) and then to Boston, Massachusetts (1998 and 1999). The Search Engines Meetings bring together commercial search engine developers, academics and corporate professionals to learn from each other. Infonortics, sponsor of meetings post-1995 with Ev Brenner, plans to continue the same success in Boston in 2000.

Content

Ramana Rao (Inxight, Palo Alto, CA) 7 ± 2 Insights on achieving Effective Information Access Session One: Updates and a twelve month perspective Danny Sullivan (Search Engine Watch, US / England) Portalization and other search trends Carol Tenopir (University of Tennessee) Search realities faced by end users and professional searchers Session Two: Today's search engines and beyond Daniel Hoogterp (Retrieval Technologies, McLean, VA) Effective presentation and utilization of search techniques Rick Kenny (Fulcrum Technologies, Ontario, Canada) Beyond document clustering: The knowledge impact statement Gary Stock (Ingenius, Kalamazoo, MI) Automated change monitoring Gary Culliss (Direct Hit, Wellesley Hills, MA) User popularity ranked search engines Byron Dom (IBM, CA) Automatically finding the best pages on the World Wide Web (CLEVER) Peter Tomassi (LookSmart, San Francisco, CA) Adding human intellect to search technology Session Three: Panel discussion: Human v automated categorization and editing Ev Brenner (New York, NY)- Chairman James Callan (University of Massachusetts, MA) Marc Krellenstein (Northern Light Technology, Cambridge, MA) Dan Miller (Ask Jeeves, Berkeley, CA) Session Four: Updates and a twelve month perspective Steve Arnold (AIT, Harrods Creek, KY) Review: The leading edge in search and retrieval software Ellen Voorhees (NIST, Gaithersburg, MD) TREC update Session Five: Search engines now and beyond Intelligent Agents John Snyder (Muscat, Cambridge, England) Practical issues behind intelligent agents Text summarization Therese Firmin, (Dept of Defense, Ft George G. Meade, MD) The TIPSTER/SUMMAC evaluation of automatic text summarization systems Cross language searching Elizabeth Liddy (TextWise, Syracuse, NY) A conceptual interlingua approach to cross-language retrieval. Video search and retrieval Armon Amir (IBM, Almaden, CA) CueVideo: Modular system for automatic indexing and browsing of video/audio Speech recognition Michael Witbrock (Lycos, Waltham, MA) Retrieval of spoken documents Visualization James A. Wise (Integral Visuals, Richland, WA) Information visualization in the new millennium: Emerging science or passing fashion? Text mining David Evans (Claritech, Pittsburgh, PA) Text mining - towards decision support
Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.02
```
0.0206442 = product of:
  0.0928989 = sum of:
    0.016735615 = weight(_text_:of in 1665) [ClassicSimilarity], result of:
      0.016735615 = score(doc=1665,freq=20.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.27317715 = fieldWeight in 1665, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
    0.076163284 = weight(_text_:software in 1665) [ClassicSimilarity], result of:
      0.076163284 = score(doc=1665,freq=10.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.49004826 = fieldWeight in 1665, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1665)
  0.22222222 = coord(2/9)
```
Abstract

Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software , Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
```
0.01925707 = product of:
  0.08665682 = sum of:
    0.0726548 = weight(_text_:applications in 5997) [ClassicSimilarity], result of:
      0.0726548 = score(doc=5997,freq=6.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.42124623 = fieldWeight in 5997, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
    0.0140020205 = weight(_text_:of in 5997) [ClassicSimilarity], result of:
      0.0140020205 = score(doc=5997,freq=14.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.22855641 = fieldWeight in 5997, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
  0.22222222 = coord(2/9)
```
Abstract

Given the huge amount of information in the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.

Content

Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.

Series

Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V. ; 24)(Studies in classification, data analysis, and knowledge organization

Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.02

0.018887917 = product of:
  0.08499563 = sum of:
    0.014200641 = weight(_text_:of in 2100) [ClassicSimilarity], result of:
      0.014200641 = score(doc=2100,freq=10.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.23179851 = fieldWeight in 2100, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
    0.070794985 = weight(_text_:software in 2100) [ClassicSimilarity], result of:
      0.070794985 = score(doc=2100,freq=6.0), product of:
        0.15541996 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03917671 = queryNorm
        0.4555077 = fieldWeight in 2100, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
  0.22222222 = coord(2/9)

Abstract: This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; (c) and the way they are used in combination with each other. Further observations concern the way the participant assesses quality of web-based resources, and his information behavior as a software engineer.

Autonomy, Inc.: Automatic classification (o.J.) 0.02

0.017575702 = product of:
  0.079090655 = sum of:
    0.06711562 = weight(_text_:applications in 1666) [ClassicSimilarity], result of:
      0.06711562 = score(doc=1666,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.38913056 = fieldWeight in 1666, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0625 = fieldNorm(doc=1666)
    0.011975031 = weight(_text_:of in 1666) [ClassicSimilarity], result of:
      0.011975031 = score(doc=1666,freq=4.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.19546966 = fieldWeight in 1666, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1666)
  0.22222222 = coord(2/9)

Abstract: Autonomy's Classification solutions remove the necessity for organizations to rely on human intervention or manual processing of information, such as manual tagging, typically required to make most other e-business applications work. Autonomy's ability to consistently and accurately classify data automatically is a unique infrastructure solution that overcomes the predicaments surrounding the exponential growth of unstructured data.

Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.02

0.017406443 = product of:
  0.078329 = sum of:
    0.05872617 = weight(_text_:applications in 2113) [ClassicSimilarity], result of:
      0.05872617 = score(doc=2113,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 2113, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2113)
    0.01960283 = weight(_text_:of in 2113) [ClassicSimilarity], result of:
      0.01960283 = score(doc=2113,freq=14.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.31997898 = fieldWeight in 2113, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2113)
  0.22222222 = coord(2/9)

Abstract: Document clustering is an important tool for document collection organization and browsing. In real applications, some limited knowledge about cluster membership of a small number of documents is often available, such as some pairs of documents belonging to the same cluster. This kind of prior knowledge can be served as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum of square Euclidean distance function of K-means. Then, the combined criterion function is transformed into trace maximization, which is further optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method can achieve better performance, compared to three existing methods.

Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.02

0.016731909 = product of:
  0.075293586 = sum of:
    0.05872617 = weight(_text_:applications in 6962) [ClassicSimilarity], result of:
      0.05872617 = score(doc=6962,freq=2.0), product of:
        0.17247584 = queryWeight, product of:
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.03917671 = queryNorm
        0.34048924 = fieldWeight in 6962, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4025097 = idf(docFreq=1471, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6962)
    0.016567415 = weight(_text_:of in 6962) [ClassicSimilarity], result of:
      0.016567415 = score(doc=6962,freq=10.0), product of:
        0.061262865 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03917671 = queryNorm
        0.2704316 = fieldWeight in 6962, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6962)
  0.22222222 = coord(2/9)

Abstract: Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may provide helpful for the automatic classification of documents, as well as for information retrieval and filtering applications

Search (178 results, page 1 of 9)

Authors

Years

Languages

Types

Themes

Subjects