Search (158 results, page 1 of 8)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.29
    0.28663272 = product of:
      0.42994907 = sum of:
        0.05995991 = product of:
          0.17987972 = sum of:
            0.17987972 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.17987972 = score(doc=562,freq=2.0), product of:
                0.3200604 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.037751827 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.17987972 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.17987972 = score(doc=562,freq=2.0), product of:
            0.3200604 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.037751827 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.17987972 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.17987972 = score(doc=562,freq=2.0), product of:
            0.3200604 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.037751827 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.010229703 = product of:
          0.030689107 = sum of:
            0.030689107 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.030689107 = score(doc=562,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.6666667 = coord(4/6)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
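
The nested breakdown shown for each result is Lucene's "explain" output under ClassicSimilarity (TF-IDF): a matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and coord() then scales the clause sum by the fraction of query clauses that matched (here 0.42994907 × coord(4/6) ≈ 0.28663272). The sketch below recomputes one term contribution from the figures above; it is a minimal illustration of that formula, not the search engine's code, and the function name is ours.

```python
import math

# Minimal sketch of a single-term contribution under Lucene's ClassicSimilarity,
# using the figures shown in the explain output above (term "2f", doc 562).
def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                               # tf(freq) = sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))    # idf = 1 + ln(maxDocs / (docFreq + 1))
    query_weight = idf * query_norm                    # query-side weight
    field_weight = tf * idf * field_norm               # document-side weight
    return query_weight * field_weight

# freq=2, docFreq=24, maxDocs=44218, queryNorm=0.037751827, fieldNorm=0.046875
print(term_score(2.0, 24, 44218, 0.037751827, 0.046875))   # ~0.1799, cf. 0.17987972 above
```
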
  2. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.08
    0.08032842 = product of:
      0.16065684 = sum of:
        0.008997706 = weight(_text_:information in 141) [ClassicSimilarity], result of:
          0.008997706 = score(doc=141,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
        0.13972448 = weight(_text_:strukturierung in 141) [ClassicSimilarity], result of:
          0.13972448 = score(doc=141,freq=2.0), product of:
            0.26115823 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.037751827 = queryNorm
            0.5350185 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
        0.011934653 = product of:
          0.03580396 = sum of:
            0.03580396 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.03580396 = score(doc=141,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.33333334 = coord(1/3)
      0.5 = coord(3/6)
    
    Pages
    S.1-22
  3. Bock, H.-H.: Automatische Klassifikation : theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Cluster-Analyse) (1974) 0.05
    0.053228375 = product of:
      0.31937024 = sum of:
        0.31937024 = weight(_text_:strukturierung in 7693) [ClassicSimilarity], result of:
          0.31937024 = score(doc=7693,freq=2.0), product of:
            0.26115823 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.037751827 = queryNorm
            1.2228994 = fieldWeight in 7693, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.125 = fieldNorm(doc=7693)
      0.16666667 = coord(1/6)
    
  4. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    0.009967791 = product of:
      0.02990337 = sum of:
        0.012853865 = weight(_text_:information in 611) [ClassicSimilarity], result of:
          0.012853865 = score(doc=611,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.19395474 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
        0.017049506 = product of:
          0.051148515 = sum of:
            0.051148515 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.051148515 = score(doc=611,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    Presentation accompanying the talk at the 98th Deutscher Bibliothekartag in Erfurt ("Ein neuer Blick auf Bibliotheken"); TK10: Information erschließen und recherchieren - Inhalte erschließen mit neuen Tools
    Date
    22. 8.2009 12:54:24
  5. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.009173046 = product of:
      0.027519137 = sum of:
        0.015584484 = weight(_text_:information in 2338) [ClassicSimilarity], result of:
          0.015584484 = score(doc=2338,freq=6.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.23515764 = fieldWeight in 2338, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.011934653 = product of:
          0.03580396 = sum of:
            0.03580396 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.03580396 = score(doc=2338,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  6. Sommer, M.: Automatische Generierung von DDC-Notationen für Hochschulveröffentlichungen (2012) 0.01
    0.0085824 = product of:
      0.025747199 = sum of:
        0.015424638 = weight(_text_:information in 587) [ClassicSimilarity], result of:
          0.015424638 = score(doc=587,freq=8.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.23274569 = fieldWeight in 587, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=587)
        0.0103225615 = product of:
          0.030967683 = sum of:
            0.030967683 = weight(_text_:29 in 587) [ClassicSimilarity], result of:
              0.030967683 = score(doc=587,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23319192 = fieldWeight in 587, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=587)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    Cf.: http://opus.bsz-bw.de/fhhv/volltexte/2012/397/pdf/Bachelorarbeit_final_Korrektur01.pdf. Bachelor's thesis, Hochschule Hannover, Fakultät III - Medien, Information und Design, Abteilung Information und Kommunikation, Informationsmanagement degree programme
    Date
    29. 1.2013 15:44:43
    Imprint
    Hannover : Hochschule Hannover, Fakultät III - Medien, Information und Design, Abteilung Information und Kommunikation
  7. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.008509606 = product of:
      0.025528818 = sum of:
        0.017004065 = weight(_text_:information in 1107) [ClassicSimilarity], result of:
          0.017004065 = score(doc=1107,freq=14.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.256578 = fieldWeight in 1107, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.008524753 = product of:
          0.025574258 = sum of:
            0.025574258 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.025574258 = score(doc=1107,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2265-2277
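
The abstract above reports that PETC extracts passages which a support vector machine then classifies into disease aspects. PETC itself is not reproduced here; purely as an illustration of the kind of baseline it is said to enhance, the sketch below shows a conventional TF-IDF + linear SVM text classifier (scikit-learn). The toy texts and aspect labels are invented.

```python
# Hedged sketch: a plain TF-IDF + linear SVM text classifier of the kind PETC
# is reported to enhance. This is NOT the authors' PETC; data are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = [
    "The disease is caused by a viral infection of the liver.",        # etiology
    "Diagnosis is confirmed by a blood test and ultrasound imaging.",  # diagnosis
    "Treatment includes antiviral drugs and rest.",                    # treatment
    "Vaccination and hand hygiene prevent transmission.",              # prevention
]
labels = ["etiology", "diagnosis", "treatment", "prevention"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["Patients are treated with antiviral medication."]))  # e.g. ['treatment']
```
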
  8. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.008219777 = product of:
      0.02465933 = sum of:
        0.012724677 = weight(_text_:information in 1673) [ClassicSimilarity], result of:
          0.012724677 = score(doc=1673,freq=4.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.1920054 = fieldWeight in 1673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.011934653 = product of:
          0.03580396 = sum of:
            0.03580396 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.03580396 = score(doc=1673,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
    Date
    1. 8.1996 22:08:06
  9. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.01
    0.007893564 = product of:
      0.02368069 = sum of:
        0.013358129 = weight(_text_:information in 4797) [ClassicSimilarity], result of:
          0.013358129 = score(doc=4797,freq=6.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.20156369 = fieldWeight in 4797, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4797)
        0.0103225615 = product of:
          0.030967683 = sum of:
            0.030967683 = weight(_text_:29 in 4797) [ClassicSimilarity], result of:
              0.030967683 = score(doc=4797,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23319192 = fieldWeight in 4797, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4797)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Automated text categorization has witnessed a booming interest with the exponential growth of information and the ever-increasing needs for organizations. The underlying hierarchical structure identifies the relationships of dependence between different categories and provides valuable sources of information for categorization. Although considerable research has been conducted in the field of hierarchical document categorization, little has been done on automatic generation of topic hierarchies. In this paper, we propose the method of using linear discriminant projection to generate more meaningful intermediate levels of hierarchies in large flat sets of classes. The linear discriminant projection approach first transforms all documents onto a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect of using generated hierarchical structure for text classification. Our experiments show that generated hierarchies improve classification performance in most cases.
    Source
    Journal of intelligent information systems. 29(2007) no.2, S.211-230
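
The abstract above proposes projecting documents with linear discriminant analysis and then clustering the categories in the projected space to obtain intermediate levels of a topic hierarchy. The sketch below illustrates that general idea with off-the-shelf components and random stand-in data; it is not the authors' implementation, and all parameters and data are arbitrary.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from scipy.cluster.hierarchy import linkage

# Toy stand-ins: 200 "documents" with 50 features, assigned to 8 flat categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 8, size=200)

# 1) Project documents into a low-dimensional discriminant space.
X_low = LinearDiscriminantAnalysis(n_components=5).fit_transform(X, y)

# 2) Cluster the category centroids in that space to obtain hierarchy levels.
centroids = np.vstack([X_low[y == c].mean(axis=0) for c in range(8)])
tree = linkage(centroids, method="ward")
print(tree)   # each row merges two clusters, bottom-up
```
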
  10. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    0.007862611 = product of:
      0.023587832 = sum of:
        0.013358129 = weight(_text_:information in 2760) [ClassicSimilarity], result of:
          0.013358129 = score(doc=2760,freq=6.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.20156369 = fieldWeight in 2760, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
        0.010229703 = product of:
          0.030689107 = sum of:
            0.030689107 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.030689107 = score(doc=2760,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.803-813
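
As a rough illustration of the idea in the abstract above (scoring a category only in the context of its ancestors), the sketch below routes a document top-down through a toy category tree, carrying the ancestors' keywords along as context. The keyword sets, tree, and overlap scoring are invented for the example and are not the CRHTC method itself.

```python
# Toy two-level category tree and hand-picked keywords (illustrative only).
HIERARCHY = {"science": {"physics": {}, "biology": {}},
             "arts":    {"music": {}, "painting": {}}}
KEYWORDS = {"science": {"experiment", "theory"}, "physics": {"quantum", "energy"},
            "biology": {"cell", "gene"},         "arts": {"culture", "style"},
            "music": {"melody", "rhythm"},       "painting": {"canvas", "colour"}}

def classify(tokens, tree, context=frozenset(), path=()):
    """Descend the tree; a child is scored by its keywords plus its ancestors' keywords."""
    best_path, best_score = path, 0
    for category, children in tree.items():
        cod = KEYWORDS[category] | context          # "context of discussion" of the category
        score = len(tokens & cod)
        if score > best_score:
            best_path, best_score = classify(tokens, children, cod, path + (category,)), score
    return best_path

doc = {"the", "quantum", "energy", "experiment"}
print(classify(doc, HIERARCHY))                     # ('science', 'physics')
```
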
  11. Chung, Y.M.; Lee, J.Y.: ¬A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.01
    0.007152 = product of:
      0.021456 = sum of:
        0.012853865 = weight(_text_:information in 5769) [ClassicSimilarity], result of:
          0.012853865 = score(doc=5769,freq=8.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.19395474 = fieldWeight in 5769, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.008602135 = product of:
          0.025806403 = sum of:
            0.025806403 = weight(_text_:29 in 5769) [ClassicSimilarity], result of:
              0.025806403 = score(doc=5769,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.19432661 = fieldWeight in 5769, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5769)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as X**2 statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X**2 statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms
    Date
    29. 9.2001 14:01:18
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.283-296
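
The measures compared in the abstract above can all be computed from a 2×2 document co-occurrence table for a term pair. The sketch below shows common textbook forms of several of them; the exact variants and normalisations used in the study may differ, and the counts are made up.

```python
import math

n = 10_000      # total documents (invented)
f_x, f_y = 120, 80
f_xy = 30       # documents containing both terms

# 2x2 contingency table: a = both, b = only x, c = only y, d = neither.
a, b, c, d = f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy

pmi     = math.log2(f_xy * n / (f_x * f_y))              # (pointwise) mutual information
cosine  = f_xy / math.sqrt(f_x * f_y)
jaccard = f_xy / (f_x + f_y - f_xy)
yules_y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
chi2    = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"PMI={pmi:.3f} cosine={cosine:.3f} Jaccard={jaccard:.3f} "
      f"YulesY={yules_y:.3f} chi2={chi2:.1f}")
```
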
  12. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01
    0.007126206 = product of:
      0.021378618 = sum of:
        0.012853865 = weight(_text_:information in 2765) [ClassicSimilarity], result of:
          0.012853865 = score(doc=2765,freq=8.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.19395474 = fieldWeight in 2765, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.008524753 = product of:
          0.025574258 = sum of:
            0.025574258 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
              0.025574258 = score(doc=2765,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.19345059 = fieldWeight in 2765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2765)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
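
The passage-detection setting in the abstract above presupposes splitting a document into passages that are then classified individually. The sketch below shows the simplest such splitter, fixed-length overlapping word windows, i.e. one of the conventional document-splitting approaches the paper compares KDP against, not KDP itself; the window sizes are arbitrary.

```python
def passages(text, size=50, overlap=25):
    """Split a text into fixed-length, overlapping word windows ("passages")."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = "word " * 200                     # stand-in for a real document
for i, p in enumerate(passages(doc)):
    print(i, len(p.split()))            # each window would be fed to a text classifier
```
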
  13. Drori, O.; Alon, N.: Using document classification for displaying search results (2003) 0.01
    0.007076476 = product of:
      0.021229427 = sum of:
        0.010906866 = weight(_text_:information in 1565) [ClassicSimilarity], result of:
          0.010906866 = score(doc=1565,freq=4.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.16457605 = fieldWeight in 1565, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1565)
        0.0103225615 = product of:
          0.030967683 = sum of:
            0.030967683 = weight(_text_:29 in 1565) [ClassicSimilarity], result of:
              0.030967683 = score(doc=1565,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23319192 = fieldWeight in 1565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1565)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    In this paper, four self-developed user interfaces that display document search results using different methods were compared. In order to create the four interfaces, two information elements were used: document categories and lines from the document. A user study compared the four interfaces. It was found that the category addition to the interface was beneficial in both measurable and subjective measures. It was also found that displaying the relevant lines from the document increased the effectiveness and shortened the search time in all cases and tasks. It was found that the participants preferred the interface containing categories and relevant lines to all other interfaces checked. It was also the fastest in the objective time measurement. An additional sub-study showed that the most important parameter for the users was the confidence level that the answer was accurate, and the least important parameter was the feeling of comfort while conducting a search
    Source
    Journal of information science. 29(2003) no.2, S.97-106
  14. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
    0.007076476 = product of:
      0.021229427 = sum of:
        0.010906866 = weight(_text_:information in 3464) [ClassicSimilarity], result of:
          0.010906866 = score(doc=3464,freq=4.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.16457605 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.0103225615 = product of:
          0.030967683 = sum of:
            0.030967683 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
              0.030967683 = score(doc=3464,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.23319192 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Date
    1. 6.2010 9:29:57
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
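
The abstract above weights and combines text-based and citation-based evidence before clustering journals. The sketch below illustrates the weighting step in its simplest form, a weighted sum of two similarity matrices followed by hierarchical clustering; the weights, matrices, and cluster count are all invented, and the paper's information-based weighting scheme is more elaborate than a fixed weighted sum.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)

def random_similarity(n):                       # symmetric toy similarity matrix
    s = rng.random((n, n))
    s = (s + s.T) / 2
    np.fill_diagonal(s, 1.0)
    return s

text_sim, cite_sim = random_similarity(10), random_similarity(10)

w_text, w_cite = 0.6, 0.4                       # illustrative data-source weights
hybrid = w_text * text_sim + w_cite * cite_sim  # weighted combination of the two sources

dist = 1.0 - hybrid                             # similarity -> distance
np.fill_diagonal(dist, 0.0)
labels = fcluster(linkage(squareform(dist), method="average"), t=3, criterion="maxclust")
print(labels)                                   # cluster assignment of the 10 toy "journals"
```
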
  15. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.01
    0.0070135645 = product of:
      0.021040693 = sum of:
        0.008997706 = weight(_text_:information in 2219) [ClassicSimilarity], result of:
          0.008997706 = score(doc=2219,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
        0.012042987 = product of:
          0.03612896 = sum of:
            0.03612896 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
              0.03612896 = score(doc=2219,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.27205724 = fieldWeight in 2219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2219)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing this increasing amount of information in incoming and outgoing mail, in particular its classification, is time-consuming and expensive. More and more organizations are seeking a solution to this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  16. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.01
    0.0070135645 = product of:
      0.021040693 = sum of:
        0.008997706 = weight(_text_:information in 1661) [ClassicSimilarity], result of:
          0.008997706 = score(doc=1661,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 1661, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
        0.012042987 = product of:
          0.03612896 = sum of:
            0.03612896 = weight(_text_:29 in 1661) [ClassicSimilarity], result of:
              0.03612896 = score(doc=1661,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.27205724 = fieldWeight in 1661, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1661)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    29. 7.1998 17:45:02
    Source
    Journal of the American Society for Information Science. 48(1997) no.10, S.932-943
  17. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.0070135645 = product of:
      0.021040693 = sum of:
        0.008997706 = weight(_text_:information in 1595) [ClassicSimilarity], result of:
          0.008997706 = score(doc=1595,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 1595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
        0.012042987 = product of:
          0.03612896 = sum of:
            0.03612896 = weight(_text_:29 in 1595) [ClassicSimilarity], result of:
              0.03612896 = score(doc=1595,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.27205724 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    11. 5.2003 18:29:44
    Imprint
    Medford, NJ : Information Today
  18. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.006977453 = product of:
      0.020932358 = sum of:
        0.008997706 = weight(_text_:information in 5273) [ClassicSimilarity], result of:
          0.008997706 = score(doc=5273,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
        0.011934653 = product of:
          0.03580396 = sum of:
            0.03580396 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.03580396 = score(doc=5273,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    22. 7.2006 16:24:52
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.3, S.431-442
  19. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.006977453 = product of:
      0.020932358 = sum of:
        0.008997706 = weight(_text_:information in 2560) [ClassicSimilarity], result of:
          0.008997706 = score(doc=2560,freq=2.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.13576832 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.011934653 = product of:
          0.03580396 = sum of:
            0.03580396 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.03580396 = score(doc=2560,freq=2.0), product of:
                0.13220046 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037751827 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
    Date
    22. 9.2008 18:31:54
  20. Automatische Klassifikation und Extraktion in Documentum (2005) 0.01
    0.00657797 = product of:
      0.01973391 = sum of:
        0.011131775 = weight(_text_:information in 3974) [ClassicSimilarity], result of:
          0.011131775 = score(doc=3974,freq=6.0), product of:
            0.0662725 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.037751827 = queryNorm
            0.16796975 = fieldWeight in 3974, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3974)
        0.008602135 = product of:
          0.025806403 = sum of:
            0.025806403 = weight(_text_:29 in 3974) [ClassicSimilarity], result of:
              0.025806403 = score(doc=3974,freq=2.0), product of:
                0.13279912 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.037751827 = queryNorm
                0.19432661 = fieldWeight in 3974, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3974)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    "LCI Comprend ist ab sofort als integriertes Modul für EMCs Content Management System Documentum verfügbar. LCI (Learning Computers International GmbH) hat mit Unterstützung von neeb & partner diese Technologie zur Dokumentenautomation transparent in Documentum integriert. Dies ist die erste bekannte Lösung für automatische, lernende Klassifikation und Extraktion, die direkt auf dem Documentum Datenbestand arbeitet und ohne zusätzliche externe Steuerung auskommt. Die LCI Information Capture Services (ICS) dienen dazu, jegliche Art von Dokument zu klassifizieren und Information daraus zu extrahieren. Das Dokument kann strukturiert, halbstrukturiert oder unstrukturiert sein. Somit können beispielsweise gescannte Formulare genauso verarbeitet werden wie Rechnungen oder E-Mails. Die Extraktions- und Klassifikationsvorschriften und die zu lernenden Beispieldokumente werden einfach interaktiv zusammengestellt und als XML-Struktur gespeichert. Zur Laufzeit wird das Projekt angewendet, um unbekannte Dokumente aufgrund von Regeln und gelernten Beispielen automatisch zu indexieren. Dokumente können damit entweder innerhalb von Documentum oder während des Imports verarbeitet werden. Der neue Server erlaubt das Einlesen von Dateien aus dem Dateisystem oder direkt von POPS-Konten, die Analyse der Dokumente und die automatische Erzeugung von Indexwerten bei der Speicherung in einer Documentum Ablageumgebung. Diese Indexwerte, die durch inhaltsbasierte, auch mehrthematische Klassifikation oder durch Extraktion gewonnen wurden, werden als vordefinierte Attribute mit dem Documentum-Objekt abgelegt. Handelt es sich um ein gescanntes Dokument oder ein Fax, wird automatisch die integrierte Volltext-Texterkennung durchgeführt."
    Footnote
    Contact: LCI GmbH, Freiburger Str. 16, 79199 Kirchzarten, Tel.: (0 76 61) 9 89 96 10, Fax: (01212) 5 37 48 29 36, info@lci-software.com, www.lci-software.com
    Source
    Information - Wissenschaft und Praxis. 56(2005) H.5/6, S.276

Languages

  • e 141
  • d 15
  • a 1
  • chi 1

Types

  • a 141
  • el 14
  • m 4
  • x 3
  • s 2
  • d 1
  • r 1