Search (68 results, page 1 of 4)

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.10

0.1015835 = product of:
  0.15237525 = sum of:
    0.031532075 = weight(_text_:im in 1673) [ClassicSimilarity], result of:
      0.031532075 = score(doc=1673,freq=2.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.2186231 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.12084317 = sum of:
      0.03634593 = weight(_text_:online in 1673) [ClassicSimilarity], result of:
        0.03634593 = score(doc=1673,freq=2.0), product of:
          0.1548489 = queryWeight, product of:
            3.0349014 = idf(docFreq=5778, maxDocs=44218)
            0.051022716 = queryNorm
          0.23471867 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.0349014 = idf(docFreq=5778, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
      0.03610713 = weight(_text_:retrieval in 1673) [ClassicSimilarity], result of:
        0.03610713 = score(doc=1673,freq=2.0), product of:
          0.15433937 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.051022716 = queryNorm
          0.23394634 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
      0.048390117 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
        0.048390117 = score(doc=1673,freq=2.0), product of:
          0.17867287 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.051022716 = queryNorm
          0.2708308 = fieldWeight in 1673, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=1673)
  0.6666667 = coord(2/3)

Date: 1. 8.1996 22:08:06
Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)
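
The score breakdowns in this listing are Lucene `ClassicSimilarity` (TF-IDF) explanations. The Python sketch below recomputes the first one from its leaf values. It assumes the standard ClassicSimilarity definitions, which agree with the numbers shown: tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf × queryNorm (query boost assumed to be 1), fieldWeight = tf × idf × fieldNorm, and coord(m/n) = m/n when m of n Boolean clauses match. The helper names are illustrative, not Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1)), matching the "idf(...)" lines above
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm   # "queryWeight, product of"
    tf = math.sqrt(freq)                      # "tf(freq=...)"
    field_weight = tf * idf * field_norm      # "fieldWeight in <doc>, product of"
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    # Fraction of Boolean clauses that matched, e.g. coord(2/3)
    return matched / total


# Leaf values copied from the explanation for doc 1673 (Jenkins 1998).
MAX_DOCS, QUERY_NORM, FIELD_NORM = 44218, 0.051022716, 0.0546875
im = term_score(2.0, 7115, MAX_DOCS, QUERY_NORM, FIELD_NORM)
online = term_score(2.0, 5778, MAX_DOCS, QUERY_NORM, FIELD_NORM)
retrieval = term_score(2.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Outer sum (im + inner sum), scaled by coord(2/3), as in the tree
total = (im + (online + retrieval + term_22)) * coord(2, 3)
print(f"im={im:.7f} total={total:.7f}")
```

Lucene computes in single precision, so the recomputed values match the printed explanation (0.031532075 for `im`, 0.1015835 overall) only up to rounding.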

Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.09

0.09292588 = product of:
  0.13938881 = sum of:
    0.045045823 = weight(_text_:im in 494) [ClassicSimilarity], result of:
      0.045045823 = score(doc=494,freq=2.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.3123187 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
    0.09434299 = product of:
      0.14151448 = sum of:
        0.08993286 = weight(_text_:online in 494) [ClassicSimilarity], result of:
          0.08993286 = score(doc=494,freq=6.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.5807781 = fieldWeight in 494, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
        0.051581617 = weight(_text_:retrieval in 494) [ClassicSimilarity], result of:
          0.051581617 = score(doc=494,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33420905 = fieldWeight in 494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)

Source: Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al.
Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)

Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.08

0.0760325 = product of:
  0.11404874 = sum of:
    0.045045823 = weight(_text_:im in 4180) [ClassicSimilarity], result of:
      0.045045823 = score(doc=4180,freq=2.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.3123187 = fieldWeight in 4180, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.078125 = fieldNorm(doc=4180)
    0.06900292 = product of:
      0.103504375 = sum of:
        0.051922753 = weight(_text_:online in 4180) [ClassicSimilarity], result of:
          0.051922753 = score(doc=4180,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.33531237 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
        0.051581617 = weight(_text_:retrieval in 4180) [ClassicSimilarity], result of:
          0.051581617 = score(doc=4180,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33420905 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)

Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.06

0.06324223 = product of:
  0.09486334 = sum of:
    0.081037596 = product of:
      0.24311279 = sum of:
        0.24311279 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.24311279 = score(doc=562,freq=2.0), product of:
            0.43257114 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.051022716 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.013825747 = product of:
      0.04147724 = sum of:
        0.04147724 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04147724 = score(doc=562,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Content: Cf. http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Koch, T.; Vizine-Goetz, D.: DDC and knowledge organization in the digital library : Research and development. Demonstration pages (1999) 0.05

0.053082936 = product of:
  0.0796244 = sum of:
    0.03822265 = weight(_text_:im in 942) [ClassicSimilarity], result of:
      0.03822265 = score(doc=942,freq=4.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.26501122 = fieldWeight in 942, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.046875 = fieldNorm(doc=942)
    0.04140175 = product of:
      0.062102623 = sum of:
        0.031153653 = weight(_text_:online in 942) [ClassicSimilarity], result of:
          0.031153653 = score(doc=942,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.20118743 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
        0.03094897 = weight(_text_:retrieval in 942) [ClassicSimilarity], result of:
          0.03094897 = score(doc=942,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.20052543 = fieldWeight in 942, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)

Abstract: The workshop offers insight into current research and development on knowledge organization in digital libraries. Diane Vizine-Goetz of the OCLC Office of Research in Dublin, Ohio, presents OCLC's research projects on adapting and further developing the Dewey Decimal Classification as a knowledge organization tool for large digital document collections. Traugott Koch, NetLab, Lund University in Sweden, demonstrates the approaches and solutions of the EU project DESIRE for applying manual (intellectual) and, above all, automatic classification in subject information services on the Internet.
Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)

Chan, L.M.; Lin, X.; Zeng, M.: Structural and multilingual approaches to subject access on the Web (1999) 0.05

0.05078162 = product of:
  0.07617243 = sum of:
    0.06241733 = weight(_text_:im in 162) [ClassicSimilarity], result of:
      0.06241733 = score(doc=162,freq=6.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.43276152 = fieldWeight in 162, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.0625 = fieldNorm(doc=162)
    0.013755098 = product of:
      0.041265294 = sum of:
        0.041265294 = weight(_text_:retrieval in 162) [ClassicSimilarity], result of:
          0.041265294 = score(doc=162,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.26736724 = fieldWeight in 162, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=162)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Abstract: Among the major challenges of meaningful searching on the WWW are the sheer volume of available material and language barriers. Methods that structure Web resources by content to make retrieval more efficient are therefore needed just as urgently as programs that can cope with linguistic diversity. In this talk, we discuss several approaches currently being pursued to address these two problems.

Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.04

0.03801625 = product of:
  0.05702437 = sum of:
    0.022522911 = weight(_text_:im in 3614) [ClassicSimilarity], result of:
      0.022522911 = score(doc=3614,freq=2.0), product of:
        0.1442303 = queryWeight, product of:
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.051022716 = queryNorm
        0.15615936 = fieldWeight in 3614, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.8267863 = idf(docFreq=7115, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3614)
    0.03450146 = product of:
      0.051752187 = sum of:
        0.025961377 = weight(_text_:online in 3614) [ClassicSimilarity], result of:
          0.025961377 = score(doc=3614,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.16765618 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
        0.025790809 = weight(_text_:retrieval in 3614) [ClassicSimilarity], result of:
          0.025790809 = score(doc=3614,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)

Theme: Klassifikationssysteme im Online-Retrieval (classification systems in online retrieval)

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
```
0.020496529 = product of:
  0.061489582 = sum of:
    0.061489582 = product of:
      0.09223437 = sum of:
        0.05767 = weight(_text_:retrieval in 2765) [ClassicSimilarity], result of:
          0.05767 = score(doc=2765,freq=10.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.37365708 = fieldWeight in 2765, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.03456437 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.03456437 = score(doc=2765,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract: Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of documents. Rather than matching query terms against passages to determine their relevance, passage detection classifies the passages using text-mining techniques. Documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly outperforms (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.

Date: 22. 3.2009 19:14:43
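
The abstract contrasts passage retrieval (ranking passages against a user query) with passage detection (labeling passages with predefined categories). The Python sketch below illustrates only that distinction and is not the authors' keyword-based dynamic passage (KDP) method: it assumes hypothetical fixed-size word windows as passages, ranks them by query-term overlap for retrieval, and labels them with a made-up keyword lexicon for detection.

```python
from typing import Dict, List, Set


def split_passages(text: str, size: int = 8) -> List[List[str]]:
    # Hypothetical splitter: fixed-size, non-overlapping word windows.
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def retrieve(passages: List[List[str]], query: str) -> List[int]:
    # Passage retrieval: rank passage indices by how many query terms each contains.
    terms = set(query.lower().split())
    scores = [len(terms & set(p)) for p in passages]
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0]


def detect(passages: List[List[str]], lexicon: Dict[str, Set[str]]) -> List[Set[str]]:
    # Passage detection: label each passage with every category whose keywords it contains.
    return [{cat for cat, kws in lexicon.items() if kws & set(p)} for p in passages]


doc = ("quarterly sales grew in all regions and the outlook remains stable "
       "the merger codename falcon closes on friday do not share")
passages = split_passages(doc)
lexicon = {"confidential": {"codename", "merger"}, "finance": {"sales", "quarterly"}}

print(retrieve(passages, "sales outlook"))   # query-driven: which passages are relevant?
labels = detect(passages, lexicon)           # category-driven: what is each passage about?
print(labels)
# Assumption: a document counts as "infected" if any passage gets the restricted label.
print(any("confidential" in lab for lab in labels))
```

Retrieval needs a query and returns a ranking; detection needs no query and returns labels, which is the distinction the abstract draws.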

Cui, H.; Heidorn, P.B.; Zhang, H.: An approach to automatic classification of text for information retrieval (2002) 0.02

0.01942425 = product of:
  0.05827275 = sum of:
    0.05827275 = product of:
      0.087409124 = sum of:
        0.03634593 = weight(_text_:online in 174) [ClassicSimilarity], result of:
          0.03634593 = score(doc=174,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.23471867 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
        0.05106319 = weight(_text_:retrieval in 174) [ClassicSimilarity], result of:
          0.05106319 = score(doc=174,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.33085006 = fieldWeight in 174, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Abstract: In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make the inherent structures of these documents explicit with XML markup. This markup has great potential for improving task performance in specimen identification and the usability of online flora and fauna.

AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.01
```
0.013874464 = product of:
  0.04162339 = sum of:
    0.04162339 = product of:
      0.062435087 = sum of:
        0.025961377 = weight(_text_:online in 2836) [ClassicSimilarity], result of:
          0.025961377 = score(doc=2836,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.16765618 = fieldWeight in 2836, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
        0.03647371 = weight(_text_:retrieval in 2836) [ClassicSimilarity], result of:
          0.03647371 = score(doc=2836,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.23632148 = fieldWeight in 2836, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract: Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
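
The study compares documents in a 50-dimensional SVD space using both cosine and Euclidean distance. The stdlib-only Python sketch below shows why the two measures can disagree. The three-dimensional vectors are hypothetical stand-ins for SVD-reduced document vectors, and the SVD step itself is omitted.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cos(angle between a and b); depends only on direction, not length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance; sensitive to vector length as well as direction.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical reduced vectors: doc_b points the same way as doc_a but is
# three times as long; doc_c points in an unrelated direction.
doc_a = [1.0, 2.0, 0.0]
doc_b = [3.0, 6.0, 0.0]
doc_c = [0.0, 0.0, 1.0]

print(cosine_distance(doc_a, doc_b), euclidean_distance(doc_a, doc_b))
print(cosine_distance(doc_a, doc_c), euclidean_distance(doc_a, doc_c))
```

By cosine distance `doc_b` is essentially identical to `doc_a` while `doc_c` is maximally distant; by Euclidean distance `doc_c` is the closer of the two. Reporting both measures guards against conclusions driven by vector length alone.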

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.01

0.013800584 = product of:
  0.04140175 = sum of:
    0.04140175 = product of:
      0.062102623 = sum of:
        0.031153653 = weight(_text_:online in 1168) [ClassicSimilarity], result of:
          0.031153653 = score(doc=1168,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.20118743 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
        0.03094897 = weight(_text_:retrieval in 1168) [ClassicSimilarity], result of:
          0.03094897 = score(doc=1168,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.20052543 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Abstract: The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01
```
0.013412262 = product of:
  0.040236786 = sum of:
    0.040236786 = product of:
      0.06035518 = sum of:
        0.025790809 = weight(_text_:retrieval in 1107) [ClassicSimilarity], result of:
          0.025790809 = score(doc=1107,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.03456437 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.03456437 = score(doc=1107,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract: Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.

Date: 28.10.2013 19:22:57

Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.01
```
0.011500487 = product of:
  0.03450146 = sum of:
    0.03450146 = product of:
      0.051752187 = sum of:
        0.025961377 = weight(_text_:online in 5041) [ClassicSimilarity], result of:
          0.025961377 = score(doc=5041,freq=2.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.16765618 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
        0.025790809 = weight(_text_:retrieval in 5041) [ClassicSimilarity], result of:
          0.025790809 = score(doc=5041,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract: Students use general web search engines as their primary source of research when trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend is social community question answering websites, which are the second choice for students who try to get answers from their peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. To find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, for varying query lengths, and for factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
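
The re-ranking idea in this abstract, that agreement between a query's category and a result's category signals relevance, can be sketched as a score bonus. The Python below is one hypothetical rule, not one of the five methods the authors propose; the `bonus` weight, the category labels, the URLs, and the input scores are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    url: str
    engine_score: float  # score from the general web search engine (assumed available)
    category: str        # label from a result classifier (assumed available)


def rerank(results: List[Result], query_category: str, bonus: float = 0.5) -> List[Result]:
    # Add a fixed bonus when a result's category matches the query's category,
    # then sort by the adjusted score; Python's sort is stable, so ties keep engine order.
    def adjusted(r: Result) -> float:
        return r.engine_score + (bonus if r.category == query_category else 0.0)

    return sorted(results, key=adjusted, reverse=True)


results = [
    Result("https://example.com/shop", 1.0, "commercial"),
    Result("https://example.com/lesson", 0.8, "education"),
    Result("https://example.com/forum", 0.6, "education"),
]
for r in rerank(results, "education"):
    print(r.url)
```

With `bonus=0.0` the engine's original order is returned unchanged, so the bonus weight directly controls how strongly the category relation overrides the engine ranking.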

Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.01

0.011347376 = product of:
  0.034042127 = sum of:
    0.034042127 = product of:
      0.10212638 = sum of:
        0.10212638 = weight(_text_:retrieval in 4846) [ClassicSimilarity], result of:
          0.10212638 = score(doc=4846,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.6617001 = fieldWeight in 4846, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=4846)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: Information storage and retrieval. 6(1971), pp. 417-435

Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.01

0.00979065 = product of:
  0.029371947 = sum of:
    0.029371947 = product of:
      0.08811584 = sum of:
        0.08811584 = weight(_text_:online in 382) [ClassicSimilarity], result of:
          0.08811584 = score(doc=382,freq=4.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.569044 = fieldWeight in 382, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.09375 = fieldNorm(doc=382)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01

0.009758486 = product of:
  0.029275458 = sum of:
    0.029275458 = product of:
      0.043913186 = sum of:
        0.02202896 = weight(_text_:online in 1253) [ClassicSimilarity], result of:
          0.02202896 = score(doc=1253,freq=4.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.142261 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.021884227 = weight(_text_:retrieval in 1253) [ClassicSimilarity], result of:
          0.021884227 = score(doc=1253,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.1417929 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques that work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods that are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of ever smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries that utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in a way that greatly enhances scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the collection metadata that must be distributed.
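The abstract describes geographic classifications built as layers of ever smaller longitude/latitude boxes. A minimal sketch of one such scheme, assuming each level splits its parent box into four quadrants (a quadtree; the abstract does not say how Pharos subdivides boxes), gives every point a path of nested box labels. `geo_box_path`, `box_contains`, and the quadrant numbering are hypothetical.

```python
def geo_box_path(lon: float, lat: float, depth: int) -> list[str]:
    """Return the labels of the nested boxes containing (lon, lat), coarsest first.

    Hypothetical scheme: each level splits its parent box into 2 x 2 quadrants,
    numbered 0 = SW, 1 = SE, 2 = NW, 3 = NE. A child label extends its parent's
    label by one digit. Points on a dividing line go to the east/north quadrant.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude or latitude out of range")
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    label = ""
    path: list[str] = []
    for _ in range(depth):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        is_east = lon >= mid_lon
        is_north = lat >= mid_lat
        # Narrow the current box to the chosen quadrant.
        if is_east:
            west = mid_lon
        else:
            east = mid_lon
        if is_north:
            south = mid_lat
        else:
            north = mid_lat
        label += str(int(is_east) + 2 * int(is_north))
        path.append(label)
    return path


def box_contains(box_label: str, point_label: str) -> bool:
    """True if the box labelled box_label encloses the box labelled point_label."""
    return point_label.startswith(box_label)
```

For a point near Santa Barbara (longitude -119.7, latitude 34.4), three levels give the path `["2", "20", "203"]`. Because each label extends its parent's, a collection summarized at a coarse box matches any finer box inside it by a simple prefix test, which is one way hierarchical metadata could be partitioned.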
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image features. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices.
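The matching pipeline described here (query terms to LCC categories, then selected categories to newsgroups) can be sketched as follows. This is not the prototype's method: it replaces LSI with plain term-frequency scoring over each category's training terminology, and the category terms and newsgroup associations are invented for illustration; only the top-level LCC class letters (R for Medicine, QK for Botany, G for Geography) are real. `rank_categories` returns at most `k` categories, mirroring the list of 15 the demo shows.

```python
from collections import Counter

# Invented training terminology per category. Only the LCC class letters are real:
# R = Medicine, QK = Botany, G = Geography.
EXAMPLE_CATEGORY_TERMS: dict[str, Counter] = {
    "R": Counter({"prostate": 5, "cancer": 8, "tumor": 3}),
    "QK": Counter({"plant": 9, "botany": 4}),
    "G": Counter({"remote": 4, "sensing": 6, "satellite": 5}),
}

# Invented newsgroup-to-category associations.
EXAMPLE_NEWSGROUPS: dict[str, set[str]] = {
    "sci.med": {"R"},
    "sci.bio.botany": {"QK"},
    "sci.geo.satellite-nav": {"G"},
}


def rank_categories(query: str, category_terms: dict[str, Counter],
                    k: int = 15) -> list[str]:
    """Rank categories by the share of their training terms that match the query.
    A plain term-frequency stand-in for the prototype's LSI matching."""
    words = query.lower().split()
    scores: dict[str, float] = {}
    for category, terms in category_terms.items():
        total = sum(terms.values())
        if total == 0:
            continue
        score = sum(terms[w] for w in words) / total  # Counter returns 0 for unseen words
        if score > 0:
            scores[category] = score
    # Highest score first; ties broken by category id for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def select_newsgroups(selected: list[str],
                      newsgroup_categories: dict[str, set[str]]) -> list[str]:
    """Return the newsgroups associated with any of the categories the user picked."""
    wanted = set(selected)
    return sorted(g for g, cats in newsgroup_categories.items() if cats & wanted)
```

With this toy data, `rank_categories("prostate cancer", EXAMPLE_CATEGORY_TERMS)` returns `["R"]`, and passing that list to `select_newsgroups` yields `["sci.med"]`. A query with no known terms returns an empty list, the analogue of the demo not recognizing any terms. An LSI-based version would instead project terms and categories into a reduced latent space, so related terms could match a category without occurring in its training terminology.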
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. Having shown this demonstration to many people, we suggest that you first try easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.009217165 = product of:
  0.027651494 = sum of:
    0.027651494 = product of:
      0.08295448 = sum of:
        0.08295448 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.08295448 = score(doc=1046,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 2003-05-05 14:17:22

Rijsbergen, C.J. van: Automatic classification in information retrieval (1978) 0.01

0.009170066 = product of:
  0.027510196 = sum of:
    0.027510196 = product of:
      0.08253059 = sum of:
        0.08253059 = weight(_text_:retrieval in 2412) [ClassicSimilarity], result of:
          0.08253059 = score(doc=2412,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.5347345 = fieldWeight in 2412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.125 = fieldNorm(doc=2412)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01

0.008023808 = product of:
  0.024071421 = sum of:
    0.024071421 = product of:
      0.07221426 = sum of:
        0.07221426 = weight(_text_:retrieval in 2666) [ClassicSimilarity], result of:
          0.07221426 = score(doc=2666,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.46789268 = fieldWeight in 2666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=2666)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

0.007680971 = product of:
  0.023042914 = sum of:
    0.023042914 = product of:
      0.06912874 = sum of:
        0.06912874 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.06912874 = score(doc=2748,freq=2.0), product of:
            0.17867287 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051022716 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 2016-02-01 18:25:22
