Search (6 results, page 1 of 1)

Cortez, E.; Herrera, M.R.; Silva, A.S. da; Moura, E.S. de; Neubert, M.: Lightweight methods for large-scale product categorization (2011) 0.02
```
0.020866206 = product of:
  0.08346482 = sum of:
    0.08346482 = weight(_text_:sites in 4758) [ClassicSimilarity], result of:
      0.08346482 = score(doc=4758,freq=2.0), product of:
        0.2408473 = queryWeight, product of:
          5.227637 = idf(docFreq=644, maxDocs=44218)
          0.046071928 = queryNorm
        0.34654665 = fieldWeight in 4758, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.227637 = idf(docFreq=644, maxDocs=44218)
          0.046875 = fieldNorm(doc=4758)
  0.25 = coord(1/4)
```
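The numbers in the explanation tree above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that match the printed values (`tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`); the function names are hypothetical and are not part of any search engine API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, as inferred from the values in the tree."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord=1.0):
    """Score of one matching term: queryWeight * fieldWeight * coord."""
    tf = math.sqrt(freq)                    # 1.4142135 for freq=2.0
    idf = classic_idf(doc_freq, max_docs)   # about 5.227637 for docFreq=644
    query_weight = idf * query_norm         # about 0.2408473
    field_weight = tf * idf * field_norm    # about 0.34654665
    return query_weight * field_weight * coord


# Values copied from the explanation for doc 4758 (term "sites").
score = classic_term_score(
    freq=2.0,
    doc_freq=644,
    max_docs=44218,
    query_norm=0.046071928,
    field_norm=0.046875,
    coord=0.25,  # coord(1/4): one of four query clauses matched
)
print(round(score, 6))  # approximately 0.020866
```

The result agrees with the top-level 0.020866206 up to floating-point rounding, since the search engine works in single precision and this sketch uses double precision.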
Abstract

In this article, we present a study about classification methods for large-scale categorization of product offers on e-shopping web sites. We present a study about the performance of previously proposed approaches and deployed a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the usage of price and store of product offers as features adopted in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site. In these experiments, our method achieved an improvement of up to 9% in the quality of the categorization in comparison with the best baseline we have found.

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

```
0.007802638 = product of:
  0.031210553 = sum of:
    0.031210553 = product of:
      0.062421106 = sum of:
        0.062421106 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.062421106 = score(doc=2748,freq=2.0), product of:
            0.16133605 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046071928 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 1. 2.2016 18:25:22

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.00

```
0.0046815826 = product of:
  0.01872633 = sum of:
    0.01872633 = product of:
      0.03745266 = sum of:
        0.03745266 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.03745266 = score(doc=690,freq=2.0), product of:
            0.16133605 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046071928 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 23. 3.2013 13:22:36

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00

```
0.0046815826 = product of:
  0.01872633 = sum of:
    0.01872633 = product of:
      0.03745266 = sum of:
        0.03745266 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.03745266 = score(doc=2158,freq=2.0), product of:
            0.16133605 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046071928 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 4. 8.2015 19:22:04

Salles, T.; Rocha, L.; Gonçalves, M.A.; Almeida, J.M.; Mourão, F.; Meira Jr., W.; Viegas, F.: A quantitative analysis of the temporal effects on automatic text classification (2016) 0.00
```
0.0044974573 = product of:
  0.01798983 = sum of:
    0.01798983 = product of:
      0.03597966 = sum of:
        0.03597966 = weight(_text_:design in 3014) [ClassicSimilarity], result of:
          0.03597966 = score(doc=3014,freq=2.0), product of:
            0.17322445 = queryWeight, product of:
              3.7598698 = idf(docFreq=2798, maxDocs=44218)
              0.046071928 = queryNorm
            0.20770542 = fieldWeight in 3014, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7598698 = idf(docFreq=2798, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3014)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well-known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.00

```
0.003901319 = product of:
  0.015605276 = sum of:
    0.015605276 = product of:
      0.031210553 = sum of:
        0.031210553 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.031210553 = score(doc=1107,freq=2.0), product of:
            0.16133605 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046071928 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 28.10.2013 19:22:57
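
The four hits on the term `22` share the same term frequency, idf, and query norm, so their relative order appears to depend only on `fieldNorm`. The sketch below illustrates this under the same assumed formulas as the explanation trees (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`), multiplying in both nested coordination factors; `score_for` is a hypothetical helper, and the constants are copied from the trees.

```python
import math

QUERY_NORM = 0.046071928  # queryNorm shown in every tree
MAX_DOCS = 44218
DOC_FREQ_22 = 3622        # docFreq of the term "22"


def score_for(field_norm, freq=2.0, coords=(0.5, 0.25)):
    """Term score times the nested coord factors coord(1/2) and coord(1/4)."""
    idf = 1.0 + math.log(MAX_DOCS / (DOC_FREQ_22 + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight * math.prod(coords)


# fieldNorm per document id, as listed in the explanations.
field_norms = {2748: 0.078125, 690: 0.046875, 2158: 0.046875, 1107: 0.0390625}

scores = {doc: score_for(norm) for doc, norm in field_norms.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for doc in ranked:
    print(doc, f"{scores[doc]:.7f}")
```

Because every other factor is constant, the score is proportional to `fieldNorm`: doc 2748 ranks first, docs 690 and 2158 tie, and doc 1107 ranks last, which matches the order of the result list.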
