Search (97 results, page 1 of 5)

  • Filter: theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.04
    Date
    22. 8.2009 12:54:24
    Theme
    Klassifikationssysteme im Online-Retrieval
  3. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.04
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into a document's passages. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages, i.e., the labeling of passages with one or more categories from a predetermined set. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly outperforms (99% confidence) the other document-splitting approaches by 12% to 18% on the passage-detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
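
The abstract above contrasts passage retrieval with passage detection but gives no implementation detail for KDP itself. A minimal sketch of the kind of fixed-window "document splitting" baseline that such detection systems are compared against: split a document into overlapping word windows and classify each window independently. The scikit-learn pipeline, window parameters, and function names are illustrative assumptions, not the paper's method.

```python
# Passage detection as classification: split a document into overlapping word
# windows and classify each window. This is a generic fixed-window baseline,
# not KDP; all names and parameters are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def split_into_passages(text, size=50, stride=25):
    """Return overlapping windows of `size` words, advancing by `stride`."""
    words = text.split()
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - size + 1, 1), stride)]

# Training data: passages labeled with categories (placeholder examples;
# 1 = disallowed/hidden topic, 0 = benign).
train_passages = ["...labeled passage text...", "...another passage..."]
train_labels = [1, 0]

vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words="english")
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(train_passages), train_labels)

def detect_hidden_passages(document):
    """A document is 'infected' if any of its passages is classified as disallowed."""
    passages = split_into_passages(document)
    predictions = clf.predict(vectorizer.transform(passages))
    return [(p, y) for p, y in zip(passages, predictions) if y == 1]
```

Note the contrast with retrieval: no query is involved; every passage is scored against the trained categories instead of against user query terms.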
  4. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.03
    Date
    1. 8.1996 22:08:06
    Theme
    Klassifikationssysteme im Online-Retrieval
  5. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.03
    Source
    Information storage and retrieval. 6(1971), pp.417-435
  6. Ribeiro-Neto, B.; Laender, A.H.F.; Lima, L.R.S. de: ¬An experimental study in automatically categorizing medical documents (2001) 0.03
    Abstract
    In this article, we evaluate the retrieval performance of an algorithm that automatically categorizes medical documents. The categorization, which consists of assigning an International Classification of Diseases (ICD) code to the medical document under examination, is based on well-known information retrieval techniques. The algorithm operates in a fully automatic mode and requires no supervision or training data. Using a database of 20,569 documents, we verify that the algorithm attains levels of average precision in the 70-80% range for category coding and in the 60-70% range for subcategory coding. We also carefully analyze the documents whose categorization is not in accordance with the one provided by the human specialists. The vast majority of them represent cases that can only be fully categorized with the assistance of a human subject (because, for instance, they require specific knowledge of a given pathology). For a slim fraction of all documents (0.77% for category coding and 1.4% for subcategory coding), the algorithm makes assignments that are clearly incorrect. However, this fraction corresponds to only one-fourth of the mistakes made by the human specialists.
    Date
    29. 9.2001 13:59:42
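
The abstract describes categorization built from "well-known information retrieval techniques" with no training phase. One hedged reading of such an approach: treat each ICD category description as a retrievable document, rank categories by similarity to the medical text, and assign the top-ranked code. The tf-idf weighting and helper names below are assumptions standing in for the paper's unspecified details.

```python
# Retrieval-style categorization sketch: rank ICD category descriptions by
# cosine similarity to the input document; no supervision or training data.
# tf-idf here stands in for the paper's (unspecified) exact weighting.
import math
from collections import Counter

def tf_idf_vectors(texts):
    """Build tf-idf dicts for a list of token lists (idf over this small pool)."""
    n = len(texts)
    df = Counter(t for toks in texts for t in set(toks))
    return [{t: (1 + math.log(c)) * math.log(n / df[t])
             for t, c in Counter(toks).items()} for toks in texts]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_icd(document_tokens, category_tokens_by_code):
    """Return the ICD code whose description best matches the document."""
    codes = list(category_tokens_by_code)
    vecs = tf_idf_vectors([document_tokens] +
                          [category_tokens_by_code[c] for c in codes])
    doc_vec, cat_vecs = vecs[0], vecs[1:]
    return max(zip(codes, cat_vecs), key=lambda cv: cosine(doc_vec, cv[1]))[0]
```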
  7. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.03
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources, and enhance the consistency of bibliographic data. The paper puts forward a comprehensive methodological framework for evaluating automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC), recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built from input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and to chosen domains of knowledge within the DDC itself.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
  8. Panyr, J.: Automatische Klassifikation und Information Retrieval : Anwendung und Entwicklung komplexer Verfahren in Information-Retrieval-Systemen und ihre Evaluierung (1986) 0.03
  9. Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.02
  10. Chung, Y.M.; Lee, J.Y.: ¬A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.02
    Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas the cosine and Jaccard coefficients, as well as the X² statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X² statistic is the least affected by term frequency. Third, although the cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
    Date
    29. 9.2001 14:01:18
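
For readers comparing the six measures named in the abstract, here are their standard definitions computed from a 2x2 co-occurrence table. These are the textbook formulas (with pointwise mutual information standing in for "mutual information"); the paper's exact variants may differ in minor details.

```python
# Six term-association measures from a 2x2 co-occurrence table over n documents:
#   a = docs containing both terms, b = term x only, c = term y only, d = neither.
import math

def association_measures(a, b, c, d):
    n = a + b + c + d
    e = lambda row, col: row * col / n            # expected count under independence
    g2 = 2 * sum(o * math.log(o / e(r, s))        # Dunning's log-likelihood ratio
                 for o, r, s in [(a, a + b, a + c), (b, a + b, b + d),
                                 (c, c + d, a + c), (d, c + d, b + d)] if o > 0)
    return {
        "cosine":  a / math.sqrt((a + b) * (a + c)),
        "jaccard": a / (a + b + c),
        "mutual_information": math.log(a * n / ((a + b) * (a + c))),  # pointwise MI
        "yules_y": (math.sqrt(a * d) - math.sqrt(b * c)) /
                   (math.sqrt(a * d) + math.sqrt(b * c)),
        "chi_square": n * (a * d - b * c) ** 2 /
                      ((a + b) * (c + d) * (a + c) * (b + d)),
        "log_likelihood": g2,
    }

# Example: two terms co-occur in 30 of 1,000 documents.
print(association_measures(a=30, b=70, c=50, d=850))
```

The study's findings map directly onto these formulas: cosine and Jaccard depend only on the "positive" cells a, b, c (hence their bias toward frequent terms), while pointwise MI grows as the joint count a becomes small relative to the marginals (hence the overestimation of rare terms).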
  11. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.02
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
  12. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.02
    Theme
    Klassifikationssysteme im Online-Retrieval
  13. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.02
  14. Panyr, J.: Vektorraum-Modell und Clusteranalyse in Information-Retrieval-Systemen (1987) 0.02
    Abstract
    Starting from theoretical approaches to indexing, the classical vector space model for automatic indexing (together with the term discrimination model) is explained. Clustering in information retrieval systems is treated as a natural logical consequence of this model and is covered in all its variants (i.e., as document, term, or combined document and term classification). Search strategies in pre-classified document collections (cluster search) are then described in detail. Finally, the sensible application of cluster analysis in information retrieval systems is briefly discussed.
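
A minimal sketch of the cluster search strategy the abstract describes: documents are pre-clustered in the vector space model, a query is matched against cluster centroids first, and only the winning cluster's members are ranked exhaustively. k-means and the NumPy details below are illustrative assumptions; the abstract does not fix a particular clustering method.

```python
# Cluster search in the vector space model: pre-cluster unit-normalized
# document vectors, then restrict query matching to the nearest cluster.
import numpy as np

def kmeans(docs, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(docs @ centroids.T, axis=1)      # cosine on unit vectors
        centroids = np.stack([docs[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return centroids, labels

def cluster_search(query, docs, centroids, labels, top=5):
    best = int(np.argmax(centroids @ query))                # nearest centroid
    members = np.flatnonzero(labels == best)
    scores = docs[members] @ query                          # rank only that cluster
    return members[np.argsort(scores)[::-1][:top]]

# docs: row-normalized tf-idf matrix (n_docs x n_terms); query: unit vector.
```

The design trade-off is the classic one: centroid pruning avoids scoring the whole collection, at the risk of missing relevant documents assigned to other clusters.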
  15. GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.02
    Theme
    Klassifikationssysteme im Online-Retrieval
  16. Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.02
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Eds.: K. Järvelin et al.
  17. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.02
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval and has been applied in many research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain high TC performance. Although term weighting is one of the important modules for TC, and TC has peculiarities that distinguish it from information retrieval, many term-weighting schemes designed for information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC unchanged. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information through positive and negative class distributions. The resulting scheme, log tf-TRR, consistently performs better than other schemes that use class information, as well as traditional schemes such as tf-idf.
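
The abstract names the scheme (log tf-TRR) but not its formula. The sketch below assumes TRR is a term relevance ratio built from a term's smoothed conditional probabilities in the positive and negative classes, combined with a log-scaled term frequency; treat this as one plausible reading of the idea, not the paper's exact definition.

```python
# Hedged sketch of an odds-based term weight in the spirit of log tf-TRR.
# ASSUMPTION: TRR = P(t|positive) / P(t|negative); the paper's exact
# formula is not given in the abstract.
import math

def trr(term, pos_counts, neg_counts, pos_total, neg_total):
    """Assumed term relevance ratio, add-one smoothed."""
    p_pos = (pos_counts.get(term, 0) + 1) / (pos_total + 2)
    p_neg = (neg_counts.get(term, 0) + 1) / (neg_total + 2)
    return p_pos / p_neg

def log_tf_trr(tf, term, pos_counts, neg_counts, pos_total, neg_total):
    """Weight = log-scaled tf x log-scaled TRR (sketch, not the paper's definition)."""
    return (1 + math.log(tf)) * math.log(1 + trr(term, pos_counts, neg_counts,
                                                 pos_total, neg_total))

# pos_counts/neg_counts: term -> number of positive/negative training documents
# containing the term; pos_total/neg_total: documents per class.
```

The key departure from tf-idf is visible in the second factor: idf measures rarity across the whole collection, while the TRR factor measures how strongly a term discriminates between the classes.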
  18. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.02
    Abstract
    At least since the advent of the World Wide Web, the number of publications to be classified has been growing faster than they can be subject-indexed intellectually. Methods are therefore sought to automate the classification of text objects, or at least to support intellectual classification. Methods for automatic document classification (information retrieval, IR) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As ever more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has included work on the automatic DDC and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous production use. The VZG project Colibri/DDC has also been concerned with automatic DDC classification since 2006, among other things. The related investigations and developments serve to answer the research question: "Is it possible to automatically achieve a substantively sound DDC classification of all GVK-PLUS title records?"
    Date
    22. 1.2010 14:41:24
  19. Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.02
    Abstract
    Examines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future developments of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar.
  20. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.02
    Abstract
    The classification of documents from a bibliographic database is a task linked to information retrieval processes based on partial matching. A method is described for vectorizing reference documents from LISA that permits their topological organization using Kohonen's algorithm. As an example, a map of 202 LISA documents is generated, and the possibilities of this type of neural network for the development of information retrieval systems based on graphical browsing are analyzed.
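
For orientation, here is a compact Kohonen self-organizing map over document vectors, the kind of topological organization the abstract refers to. The grid size, learning rate, and neighborhood schedule are illustrative choices, not those of the study.

```python
# Kohonen self-organizing map: arrange document vectors on a 2-D grid so that
# similar documents land on nearby units, enabling graphical browsing.
import numpy as np

def train_som(docs, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows * cols, docs.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    steps, t = epochs * len(docs), 0
    for _ in range(epochs):
        for x in docs[rng.permutation(len(docs))]:
            lr = lr0 * (1 - t / steps)                        # decaying learning rate
            sigma = sigma0 * (1 - t / steps) + 0.5            # shrinking neighborhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1)) # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))             # Gaussian neighborhood
            weights += lr * h[:, None] * (x - weights)        # pull units toward x
            t += 1
    return weights.reshape(rows, cols, -1)

# After training, each document maps to the grid cell of its best-matching unit,
# giving a browsable 2-D layout of the collection.
```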

Languages

  • e 74
  • d 22
  • chi 1

Types

  • a 82
  • el 15
  • x 3
  • m 2
  • r 2
  • d 1