Search (5 results, page 1 of 1)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[2010 TO 2020}
  1. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.07
    0.073640086 = product of:
      0.14728017 = sum of:
        0.14728017 = sum of:
          0.10608215 = weight(_text_:maps in 690) [ClassicSimilarity], result of:
            0.10608215 = score(doc=690,freq=2.0), product of:
              0.28477904 = queryWeight, product of:
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.050679237 = queryNorm
              0.37250686 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
          0.041198023 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
            0.041198023 = score(doc=690,freq=2.0), product of:
              0.17747006 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050679237 = queryNorm
              0.23214069 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
      0.5 = coord(1/2)
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23. 3.2013 13:22:36
  2. Mu, T.; Goulermas, J.Y.; Korkontzelos, I.; Ananiadou, S.: Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities (2016) 0.02
    0.022100445 = product of:
      0.04420089 = sum of:
        0.04420089 = product of:
          0.08840178 = sum of:
            0.08840178 = weight(_text_:maps in 2496) [ClassicSimilarity], result of:
              0.08840178 = score(doc=2496,freq=2.0), product of:
                0.28477904 = queryWeight, product of:
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.050679237 = queryNorm
                0.31042236 = fieldWeight in 2496, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.619245 = idf(docFreq=435, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2496)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels to summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of 2 types of heterogeneous objects, which correspond to documents and candidate phrases, using multilevel similarity information. CEDL is composed of 5 main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model where documents along with their cluster memberships are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases using a ranking scheme using multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL is demonstrated qualitatively and quantitatively with experiments using document databases from different application fields.
  3. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.017165843 = product of:
      0.034331687 = sum of:
        0.034331687 = product of:
          0.06866337 = sum of:
            0.06866337 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06866337 = score(doc=2748,freq=2.0), product of:
                0.17747006 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050679237 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  4. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.010299506 = product of:
      0.020599011 = sum of:
        0.020599011 = product of:
          0.041198023 = sum of:
            0.041198023 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.041198023 = score(doc=2158,freq=2.0), product of:
                0.17747006 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050679237 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    4. 8.2015 19:22:04
  5. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.008582922 = product of:
      0.017165843 = sum of:
        0.017165843 = product of:
          0.034331687 = sum of:
            0.034331687 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.034331687 = score(doc=1107,freq=2.0), product of:
                0.17747006 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050679237 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28.10.2013 19:22:57