Search (216 results, page 1 of 11)

  • Filter: theme_ss:"Automatisches Klassifizieren"
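  The relevance value shown after each entry is a Lucene ClassicSimilarity score, i.e. classic TF-IDF ranking. For reference, the score of a document d against a query q is

    \[
      \mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t\in q}\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{norm}(t,d),
      \qquad \mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},\quad \mathrm{idf}(t)=1+\ln\frac{N}{\mathrm{docFreq}(t)+1},
    \]

  where N is the number of documents in the index, coord(q,d) rewards documents that match more of the query terms, and norm(t,d) folds field length and boosts into the weight.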
  1. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.05
    Pages
    34 S.
    Type
    r
  2. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.02
    Pages
    S.1-22
    Source
    Klassifikation und Ordnung. Tagungsband 12. Jahrestagung der Gesellschaft für Klassifikation, Darmstadt 17.-19.3.1988. Hrsg.: R. Wille
    Type
    a
  3. Sparck Jones, K.: Automatic classification (1976) 0.02
    Pages
    S.209-225
    Source
    Classification in the 1970s: a second look. Rev. ed. Ed.: A. Maltby
    Type
    a
  4. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.02
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes inefficient and ineffective clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23. 3.2013 13:22:36
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, S.844-860
    Type
    a
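    As a rough illustration of the pipeline this abstract describes (latent dimensions from an SVD of a term-document matrix, then clustering of document signatures), here is a minimal sketch; the toy corpus, the number of components, and the use of scikit-learn are assumptions, not the authors' LSISSM implementation:

      # Sketch only: document "signatures" from a truncated SVD, then k-means.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.cluster import KMeans

      docs = ["latent semantic indexing of sparse text",
              "singular value decomposition of term document matrices",
              "self organizing maps for clustering",
              "k means clustering of documents"]
      X = TfidfVectorizer().fit_transform(docs)          # sparse term-document matrix
      svd = TruncatedSVD(n_components=2, random_state=0)
      signatures = svd.fit_transform(X)                  # coordinates in the latent space
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(signatures)
      print(labels)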
  5. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.02
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Pages
    S.331-334
    Type
    a
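    A hedged sketch of the classification step described above: boosting weak learners (decision stumps, scikit-learn's default base estimator) over bag-of-words features. The concept features extracted from background knowledge, which are the paper's contribution, are not reproduced here; the data and parameters are invented for illustration:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.ensemble import AdaBoostClassifier

      texts = ["stocks fell sharply today", "the team won the final match",
               "markets rallied on earnings", "players trained before the game"]
      y = ["finance", "sport", "finance", "sport"]
      vec = CountVectorizer().fit(texts)
      X = vec.transform(texts)                           # Bag-Of-Words features
      clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
      print(clf.predict(vec.transform(["the match was won today"])))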
  6. Reiner, U.: VZG-Projekt Colibri : Bewertung von automatisch DDC-klassifizierten Titeldatensätzen der Deutschen Nationalbibliothek (DNB) (2009) 0.02
    Abstract
    Since 2003, the VZG project Colibri/DDC has worked on automatic methods for the Dewey Decimal Classification (DDC). The project aims at uniform DDC-based subject indexing of bibliographic title records and at supporting both DDC experts and DDC laypersons, e.g. in the analysis and synthesis of DDC notations, their quality control, and DDC-based searching. The present report concentrates on the first larger automatic DDC classification run and the first automatic and intellectual evaluation with the classification component vc_dcl1. It is based on the 25,653 title records (12 weekly/monthly deliveries) of the Deutsche Nationalbibliografie, series A, B and H, made available by the German National Library (DNB) in November 2007. After an explanation of the automatic DDC classification and the automatic evaluation in chapter 2, chapter 3 discusses the DNB report "Colibri_Auswertung_DDC_Endbericht_Sommer_2008"; facts are clarified and questions are posed whose answers will set the course for the further classification tests. Chapter 4 presents considerations, going beyond chapter 3, on continuing the automatic DDC classification. The report serves a deeper understanding of the automatic methods.
    Pages
    111 S.
    Type
    r
  7. Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.02
    Abstract
    Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in-depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio-ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.6, S.669-684
    Type
    a
  8. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is the research field concerned with developing tools, methods, and models for automating text classification. This article describes the currently popular approach to text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
    Date
    22. 9.2008 18:31:54
    Source
    International cataloguing and bibliographic control. 36(2007) no.4, S.78-82
    Type
    a
  9. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.02
    Pages
    S.121-129
    Source
    New pespectives on subject indexing and classification: essays in honour of Magda Heiner-Freiling. Red.: K. Knull-Schlomann, u.a
    Type
    a
  10. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.02
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.8, S.1207-1221
    Type
    a
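    Not the paper's distributed-memory MPI implementation, but a serial sketch of the underlying technique, group-average hierarchical agglomerative clustering of document vectors, using SciPy; the toy corpus and the cut level are illustrative:

      from scipy.cluster.hierarchy import linkage, fcluster
      from sklearn.feature_extraction.text import TfidfVectorizer

      docs = ["parallel clustering of documents",
              "message passing on many processors",
              "hierarchical agglomerative clustering",
              "load balancing for parallel jobs"]
      X = TfidfVectorizer().fit_transform(docs).toarray()
      Z = linkage(X, method="average", metric="cosine")  # group-average (UPGMA) linkage
      print(fcluster(Z, t=2, criterion="maxclust"))      # cut the dendrogram into 2 clusters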
  11. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.02
    Abstract
    In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language-dependent and language-independent) of Unsupervised Document Similarity implemented by MU (using the gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind the decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, and deliver the functionality needed for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
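    A minimal gensim similarity sketch in the spirit of the unsupervised document similarity the abstract mentions; the toy corpus and the choice of a plain TF-IDF model are assumptions, not EuDML's configuration:

      from gensim import corpora, models, similarities

      raw_docs = ["entity and semantic associations",
                  "citation matching and resolution",
                  "document similarity with topic models"]
      texts = [d.lower().split() for d in raw_docs]
      dictionary = corpora.Dictionary(texts)
      corpus = [dictionary.doc2bow(t) for t in texts]
      tfidf = models.TfidfModel(corpus)                     # weight terms by TF-IDF
      index = similarities.MatrixSimilarity(tfidf[corpus])  # cosine-similarity index
      query = dictionary.doc2bow("semantic similarity of documents".split())
      print(index[tfidf[query]])                            # similarity to each document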
  12. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.02
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
    Location
    S
    Pages
    S.163-175
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
    Type
    a
  13. Borko, H.: Research in computer based classification systems (1985) 0.02
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
    Footnote
    Original in: Classification research: Proceedings of the Second International Study Conference held at Hotel Prins Hamlet, Elsinore, Denmark, 14th-18th Sept. 1964. Ed.: Pauline Atherton. Copenhagen: Munksgaard 1965. S.220-238.
    Pages
    S.287-305
    Source
    Theory of subject analysis: a sourcebook. Ed.: L.M. Chan, et al
    Type
    a
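    The first of the two steps described above, correlating index terms by their co-occurrence across documents, can be illustrated with a toy incidence matrix; the data and the use of Pearson correlation via NumPy are illustrative stand-ins, not Borko's actual computation:

      import numpy as np

      # rows = documents, columns = index terms (1 = term assigned to document)
      D = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 0, 1, 1],
                    [0, 1, 1, 1]])
      R = np.corrcoef(D, rowvar=False)   # term-term correlation matrix
      print(np.round(R, 2))              # factor analysis of R would then yield categories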
  14. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.02
    Source
    Information processing and management. 37(2001) no.3, S.459-484
    Type
    a
  15. Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.01
    Pages
    S.202-209
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin, u.a
    Type
    a
  16. Fangmeyer, H.; Gloden, R.: Bewertung und Vergleich von Klassifikationsergebnissen bei automatischen Verfahren (1978) 0.01
    Pages
    S.147-155
    Type
    a
  17. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.803-813
    Type
    a
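    The COD idea above lends itself to a compact illustration. The following Python sketch gates each category assignment on the presence of ancestor-category vocabulary in the document; the toy hierarchy, term sets, and overlap threshold are illustrative assumptions, not the authors' actual CRHTC formulation.

    from dataclasses import dataclass

    @dataclass
    class Category:
        name: str
        terms: set                 # terms indicative of the category itself
        parent: "Category" = None

        def cod(self):
            """Context of discussion: the terms of all ancestor categories."""
            terms, node = set(), self.parent
            while node is not None:
                terms |= node.terms
                node = node.parent
            return terms

    def classify(doc, categories, min_overlap=1):
        """Assign the document to every category whose own terms match and
        whose COD (if it has one) is also present in the document."""
        tokens = set(doc.lower().split())
        labels = []
        for cat in categories:
            matches_cat = len(tokens & cat.terms) >= min_overlap
            cod = cat.cod()
            matches_cod = not cod or len(tokens & cod) >= min_overlap
            if matches_cat and matches_cod:
                labels.append(cat.name)
        return labels

    # Toy hierarchy: science -> physics, economics -> finance.
    science   = Category("science", {"research", "study"})
    physics   = Category("physics", {"quantum", "particle"}, parent=science)
    economics = Category("economics", {"market", "money"})
    finance   = Category("finance", {"quantum", "fund"}, parent=economics)  # ambiguous "quantum"

    doc = "a research study of quantum particle interactions"
    print(classify(doc, [science, physics, economics, finance]))
    # ['science', 'physics'] -- "finance" is rejected for lack of economic context.

    The point of the gate is visible in the last line: the ambiguous term "quantum" alone would match "finance", but the missing ancestor context vetoes that assignment.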
  18. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.01
    0.013911991 = product of:
      0.038257975 = sum of:
        0.004532476 = product of:
          0.009064952 = sum of:
            0.009064952 = weight(_text_:h in 2532) [ClassicSimilarity], result of:
              0.009064952 = score(doc=2532,freq=2.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.13724773 = fieldWeight in 2532, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2532)
          0.5 = coord(1/2)
        0.0055226083 = weight(_text_:a in 2532) [ClassicSimilarity], result of:
          0.0055226083 = score(doc=2532,freq=16.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.18016359 = fieldWeight in 2532, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
        0.0017360178 = weight(_text_:s in 2532) [ClassicSimilarity], result of:
          0.0017360178 = score(doc=2532,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.060061958 = fieldWeight in 2532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
        0.026466874 = weight(_text_:k in 2532) [ClassicSimilarity], result of:
          0.026466874 = score(doc=2532,freq=4.0), product of:
            0.09490114 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.026584605 = queryNorm
            0.2788889 = fieldWeight in 2532, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
      0.36363637 = coord(4/11)
    
    Abstract
In current library practice, trained human experts usually carry out document cataloguing and indexing manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials by hand alone. To improve the effectiveness and efficiency of document categorization in a library setting, more in-depth studies of automatic document classification methods for library items are required. Machine learning research has advanced rapidly in recent years, yet applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system that alleviates the manual categorization problem encountered in the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system that enhances current library practice. Moreover, we make some concrete recommendations on how to apply the KNN algorithm in practice to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries. (A small kNN sketch follows this entry.)
    Source
    Journal of information science. 34(2008) no.2, S.213-230
    Type
    a
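    As a concrete illustration of the kind of kNN classifier the study tests, the following Python sketch uses scikit-learn with TF-IDF features over a toy catalogue labeled with LCC-style top classes; the corpus, the labels, and k=3 are illustrative assumptions, not the paper's actual feature set or tuning.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    # Tiny labeled catalogue: text -> LCC-style top class (hypothetical data).
    train_texts = [
        "introduction to calculus and linear algebra",
        "number theory and abstract algebra primer",
        "european history from 1789 to 1914",
        "a social history of the industrial revolution",
        "principles of organic chemistry reactions",
        "physical chemistry and thermodynamics",
    ]
    train_labels = ["QA", "QA", "D", "D", "QD", "QD"]  # math, history, chemistry

    # TF-IDF features + cosine-distance kNN (k=3, distance-weighted votes).
    clf = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        KNeighborsClassifier(n_neighbors=3, metric="cosine", weights="distance"),
    )
    clf.fit(train_texts, train_labels)

    print(clf.predict(["lecture notes on group theory and algebra"]))  # ['QA']

    Distance-weighted voting matters here: with only a handful of training items per class, the two genuinely similar mathematics records outvote a third, unrelated neighbour that is picked up only because k=3.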
  19. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.01
    0.013727665 = product of:
      0.03775108 = sum of:
        0.0054389704 = product of:
          0.010877941 = sum of:
            0.010877941 = weight(_text_:h in 1566) [ClassicSimilarity], result of:
              0.010877941 = score(doc=1566,freq=2.0), product of:
                0.0660481 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.026584605 = queryNorm
                0.16469726 = fieldWeight in 1566, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1566)
          0.5 = coord(1/2)
        0.007770999 = weight(_text_:a in 1566) [ClassicSimilarity], result of:
          0.007770999 = score(doc=1566,freq=22.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.25351265 = fieldWeight in 1566, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
        0.0020832212 = weight(_text_:s in 1566) [ClassicSimilarity], result of:
          0.0020832212 = score(doc=1566,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.072074346 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
        0.022457888 = weight(_text_:k in 1566) [ClassicSimilarity], result of:
          0.022457888 = score(doc=1566,freq=2.0), product of:
            0.09490114 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.026584605 = queryNorm
            0.23664509 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.36363637 = coord(4/11)
    
    Abstract
This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative-term dictionary. In collecting and classifying the Web documents, various strategies were tested to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments, which used the representative-term dictionary, achieved relatively high precision ratios of 77% and 60%, respectively. The third experiment, employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting, achieved a precision ratio of 96%. This implies that classification performance can be enhanced by a hybrid method combining a dictionary-based technique with a kNN classifier. (A sketch of such a two-stage hybrid follows this entry.)
    Source
    Journal of information science. 29(2003) no.2, S.117-126
    Type
    a
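    The hybrid idea, a representative-term dictionary narrowing the candidate categories and a kNN classifier deciding among them, can be sketched in Python with scikit-learn as follows; the DDC-style term dictionary, the toy corpus, and the combination rule are illustrative assumptions rather than the study's actual configuration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Stage 1: representative terms per category (hypothetical DDC-style table).
    term_dict = {
        "330 economics": {"market", "trade", "inflation", "labor"},
        "332 finance":   {"bank", "credit", "investment", "stock"},
    }

    def dictionary_candidates(text, min_hits=1):
        tokens = set(text.lower().split())
        return [cat for cat, terms in term_dict.items()
                if len(tokens & terms) >= min_hits]

    # Stage 2: a kNN classifier trained on a small labeled sample.
    train_texts = [
        "market trade and labor supply analysis",
        "inflation and market equilibrium models",
        "bank credit risk and investment portfolios",
        "stock market investment strategies",
    ]
    train_labels = ["330 economics", "330 economics", "332 finance", "332 finance"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(train_texts)
    knn = KNeighborsClassifier(n_neighbors=3, metric="cosine").fit(X, train_labels)

    def classify(text):
        """Dictionary filter first; kNN decides among the surviving candidates."""
        candidates = dictionary_candidates(text)
        if not candidates:
            return None                   # no category passes the filter
        pred = knn.predict(vec.transform([text]))[0]
        return pred if pred in candidates else candidates[0]

    print(classify("credit and investment products of a commercial bank"))
    # '332 finance'

    The dictionary stage keeps precision high by refusing documents that match no category vocabulary at all, while the kNN stage resolves documents that the dictionary alone cannot place unambiguously.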
  20. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.01
    0.013652633 = product of:
      0.050059654 = sum of:
        0.0064758323 = weight(_text_:a in 1070) [ClassicSimilarity], result of:
          0.0064758323 = score(doc=1070,freq=22.0), product of:
            0.030653298 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.026584605 = queryNorm
            0.21126054 = fieldWeight in 1070, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
        0.0017360178 = weight(_text_:s in 1070) [ClassicSimilarity], result of:
          0.0017360178 = score(doc=1070,freq=2.0), product of:
            0.028903782 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.026584605 = queryNorm
            0.060061958 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
        0.041847803 = weight(_text_:k in 1070) [ClassicSimilarity], result of:
          0.041847803 = score(doc=1070,freq=10.0), product of:
            0.09490114 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.026584605 = queryNorm
            0.44096208 = fieldWeight in 1070, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
      0.27272728 = coord(3/11)
    
    Abstract
Automatic categorization is a viable method to deal with the scaling problem of the World Wide Web. For Web site classification, this paper proposes using the Web pages linked with the home page, in contrast to the sole use of home pages in previous research. To implement the proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection phase chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also refine the document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaged breakeven point by 30.02% compared with an ordinary classification using the home page only. (A sketch of the classify-then-vote scheme follows this entry.)
    Source
    Information processing and management. 39(2003) no.1, S.25-44
    Type
    a
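    At its core, the three-phase scheme classifies a handful of selected pages and then extends those page labels to the whole site. A minimal Python sketch with scikit-learn follows; the page-selection phase is assumed to have already run, and the toy data and simple majority vote stand in for the paper's connectivity analysis and tag-weighted features.

    from collections import Counter
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Labeled training pages (hypothetical toy data).
    train_pages = [
        "latest football scores and match reports",
        "basketball league standings and player stats",
        "stock prices and quarterly earnings news",
        "central bank interest rate decision analysis",
    ]
    train_labels = ["sports", "sports", "finance", "finance"]

    vec = TfidfVectorizer()
    knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
    knn.fit(vec.fit_transform(train_pages), train_labels)

    def classify_site(pages):
        """Phases 2 and 3: classify each selected page, then majority-vote."""
        page_labels = knn.predict(vec.transform(pages))
        return Counter(page_labels).most_common(1)[0][0]

    # Phase 1 (connectivity analysis) is assumed to have selected these pages:
    site_pages = [
        "football match preview and team news",
        "player transfer rumours and league table",
        "ticket prices for the stadium",
    ]
    print(classify_site(site_pages))  # 'sports'

    Voting over several linked pages is what makes the site label robust: a single atypical page (here, the ticket-price page) cannot flip the overall classification.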

Languages

  • e 170
  • d 43
  • a 1
  • chi 1

Types

  • a 178
  • el 29
  • x 9
  • m 5
  • r 4
  • s 2
  • d 1