Search (89 results, page 1 of 5)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    Content
    See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
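    The paper above boosts many weak one-term classifiers into one strong text classifier and enriches plain term features with ontology-derived concept features. A minimal sketch of the boosting part, assuming scikit-learn; plain terms stand in for the paper's concept features, and AdaBoost over decision stumps for its exact booster:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.tree import DecisionTreeClassifier

    docs = ["stocks fell sharply today", "the striker scored twice",
            "bonds rallied on earnings", "midfielder injured in match"]
    labels = ["finance", "sports", "finance", "sports"]

    vec = CountVectorizer(binary=True)   # binary term features; the paper adds
    X = vec.fit_transform(docs)          # ontology-derived concept features too

    model = AdaBoostClassifier(          # sklearn >= 1.2; older: base_estimator=
        estimator=DecisionTreeClassifier(max_depth=1),  # one-term decision stump
        n_estimators=50, random_state=0)
    model.fit(X, labels)
    print(model.predict(vec.transform(["goal scored in extra time"])))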
  2. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.09
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; see also: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
    Theme
    Klassifikationssysteme im Online-Retrieval
  3. Xu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.07
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
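    Concept factorization, per the title above, approximates the term-document matrix X by X·W·H: each concept is a nonnegative combination of documents, and each document a combination of concepts. A sketch of the multiplicative updates, written from memory of the Xu/Gong formulation and not guaranteed to match the paper's exact rules:

    import numpy as np

    def concept_factorization(X, k, iters=200, eps=1e-9):
        # Approximate X (terms x docs) by X @ W @ H with W (docs x k), H (k x docs).
        # Multiplicative updates on K = X.T @ X; a sketch, not the paper's code.
        n = X.shape[1]
        rng = np.random.default_rng(0)
        W = rng.random((n, k))
        H = rng.random((k, n))
        K = X.T @ X                    # doc-doc inner products
        for _ in range(iters):
            W *= (K @ H.T) / (K @ W @ (H @ H.T) + eps)
            H *= (W.T @ K) / (W.T @ K @ W @ H + eps)
        return W, H

    X = np.abs(np.random.default_rng(1).random((50, 8)))  # toy term-doc matrix
    W, H = concept_factorization(X, k=2)
    clusters = H.argmax(axis=0)   # top concept per document (the paper normalizes first)
    print(clusters)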
  4. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.06
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
  5. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.06
    Date
    22. 8.2009 12:54:24
    Theme
    Klassifikationssysteme im Online-Retrieval
  6. Cui, H.; Heidorn, P.B.; Zhang, H.: An approach to automatic classification of text for information retrieval (2002) 0.05
    Abstract
    In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make those inherent structures explicit by XML markup. This markup has great potential to improve task performance in specimen identification and the usability of online flora and fauna.
    Source
    Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries : JCDL 2002 ; July 14 - 18, 2002, Portland, Oregon, USA. Ed. by Gary Marchionini
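    The approach in the abstract above, making implicit structure explicit by classifying text segments and wrapping them in XML, can be sketched as follows. A minimal sketch assuming scikit-learn; the segment labels and the Naive Bayes learner are illustrative stand-ins, not the authors' actual feature set:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from xml.sax.saxutils import escape

    train_segments = ["leaves ovate, 3-5 cm long", "found in moist pine forests",
                      "flowers white, five-petaled", "common across central Texas"]
    train_labels   = ["morphology", "habitat", "morphology", "distribution"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(train_segments, train_labels)

    def mark_up(segments):
        # Wrap each segment in an element named after its predicted class.
        return "\n".join(f"<{lab}>{escape(s)}</{lab}>"
                         for s, lab in zip(segments, clf.predict(segments)))

    print(mark_up(["petals yellow, fused at the base"]))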
  7. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.05
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  8. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.05
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
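    Operationally, the distinction the abstract draws comes down to this: split each document into passages, then classify every passage into categories instead of ranking passages against a query. A minimal sketch with fixed word windows and toy data; the paper's KDP chooses passage boundaries dynamically around keywords, which this deliberately simplifies:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def passages(text, size=50, stride=25):
        # Fixed-length, overlapping word windows over the document.
        words = text.split()
        for i in range(0, len(words), stride):
            yield " ".join(words[i:i + size])
            if i + size >= len(words):
                break

    # Train a passage classifier on labeled snippets (toy stand-ins).
    train = ["quarterly earnings rose", "enriched uranium shipment", "rainfall totals"]
    labels = ["benign", "restricted", "benign"]
    clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(train, labels)

    doc = "crop yields were strong this season " * 10 \
          + "enriched uranium shipment crossed the border"
    flagged = [p for p in passages(doc, size=10, stride=5)
               if clf.predict([p])[0] == "restricted"]
    print(len(flagged), "suspicious passage(s)")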
  9. Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.05
    Abstract
    Among the great challenges of meaningful searching on the WWW are the sheer volume of available material and the language barriers. Methods that structure Web resources by content with a view to more efficient retrieval are therefore needed just as urgently as programs that can cope with the multiplicity of languages. In the following paper we discuss several approaches currently being pursued to tackle these two problems.
    Footnote
    Paper presented at: 65th IFLA Council and General Conference, Bangkok, Thailand, 20.-28.8.1999
  10. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.04
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
    Source
    Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994
  11. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.03
    Source
    Information storage and retrieval. 6(1971), S.417-435
  12. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.03
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n**2/p) time on p processors rather than the worst-case O(n**3/p) time. Furthermore, the O(n**2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
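    The serial kernel the paper distributes is group-average (UPGMA) agglomerative clustering of document vectors; the message passing across p nodes and the buckshot wrapper are what the paper adds on top. A sketch of that serial core, assuming SciPy and scikit-learn:

    # Serial core of what the paper parallelizes: group-average (UPGMA)
    # hierarchical agglomerative clustering over tf-idf document vectors.
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["parallel clustering of documents", "message passing on clusters",
            "recipes for sourdough bread", "bread baking at home"]
    X = TfidfVectorizer().fit_transform(docs).toarray()

    Z = linkage(X, method="average", metric="cosine")  # group-average HAC
    labels = fcluster(Z, t=2, criterion="maxclust")    # cut tree into 2 clusters
    print(labels)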
  13. Panyr, J.: Automatische Klassifikation und Information Retrieval : Anwendung und Entwicklung komplexer Verfahren in Information-Retrieval-Systemen und ihre Evaluierung (1986) 0.03
  14. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.03
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
  15. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.03
  16. Borko, H.: Research in computer based classification systems (1985) 0.03
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major area of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
    Footnote
    Original in: Classification research: Proceedings of the Second International Study Conference held at Hotel Prins Hamlet, Elsinore, Denmark, 14th-18th Sept. 1964. Ed.: Pauline Atherton. Copenhagen: Munksgaard 1965. S.220-238.
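    The two steps described in the abstract above, term-term correlation from co-occurrence followed by factor analysis to derive categories, might look like this today. Toy data; scikit-learn's FactorAnalysis stands in for Borko's original factor-analytic procedure:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Toy document-term matrix: rows = documents, columns = index terms.
    dt = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1]], dtype=float)

    corr = np.corrcoef(dt, rowvar=False)         # step 1: term-term correlations
    print(np.round(corr, 2))

    fa = FactorAnalysis(n_components=2).fit(dt)  # step 2: derive 2 broad factors
    for f, row in enumerate(fa.components_):     # terms loading on each factor
        top = np.argsort(-np.abs(row))[:2]
        print(f"factor {f}: terms {top.tolist()}")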
  17. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.02
    Theme
    Klassifikationssysteme im Online-Retrieval
  18. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.02
  19. Panyr, J.: Vektorraum-Modell und Clusteranalyse in Information-Retrieval-Systemen (1987) 0.02
    Abstract
    Starting from theoretical approaches to indexing, the classical vector space model for automatic indexing (together with the term discrimination model) is explained. Clustering in information retrieval systems is understood as a natural logical consequence of this model and is treated in all its variants (i.e., as document, term, or combined document and term classification). Search strategies over pre-classified document collections (cluster searching) are then described in detail. Finally, the sensible application of cluster analysis in information retrieval systems is briefly discussed.
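    The cluster search the abstract describes matches a query against cluster representatives first and only then ranks documents inside the best cluster. A minimal sketch, with k-means centroids standing in for whatever cluster representatives Panyr treats:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["vector space retrieval model", "term weighting for retrieval",
            "clustering algorithms survey", "hierarchical cluster analysis"]
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    q = vec.transform(["cluster analysis"])
    best = cosine_similarity(q, km.cluster_centers_).argmax()  # best cluster first
    members = np.flatnonzero(km.labels_ == best)
    ranked = members[cosine_similarity(q, X[members]).ravel().argsort()[::-1]]
    print([docs[i] for i in ranked])                           # rank inside it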
  20. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.02
    Abstract
    At the latest since the advent of the World Wide Web, the number of publications to be classified has been growing faster than they can be intellectually subject-indexed. Methods are therefore sought to automate the classification of text objects, or at least to support intellectual classification. Methods for automatic document classification (information retrieval, IR) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As more and more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has included work on the automatic DDC and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous production use. The VZG project Colibri/DDC has, among other things, also been concerned with automatic DDC classification since 2006. The related studies and developments serve to answer the research question: "Is it possible to achieve a substantively coherent automatic DDC classification of all GVK-PLUS title records?"
    Date
    22. 1.2010 14:41:24
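    Reduced to a toy illustration, the research question above is a multi-class text classification problem over short title strings. A minimal sketch; the DDC labels, the character n-gram features, and Naive Bayes are illustrative choices, not the Colibri/DDC project's method:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    titles = ["Einführung in die Organische Chemie",
              "Grundzüge der Volkswirtschaftslehre",
              "Lehrbuch der anorganischen Chemie",
              "Makroökonomik: Theorie und Politik"]
    ddc = ["540", "330", "540", "330"]    # chemistry vs. economics

    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),  # robust on titles
        MultinomialNB(),
    ).fit(titles, ddc)

    print(clf.predict(["Übungsbuch Chemie für Mediziner"]))  # expected: ['540']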

Languages

  • e 68
  • d 19
  • a 1
  • chi 1

Types

  • a 71
  • el 18
  • m 3
  • r 2
  • s 2
  • x 2
  • d 1