Search (25 results, page 2 of 2)

Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: ¬An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.01
```
0.007890998 = product of:
  0.015781997 = sum of:
    0.015781997 = product of:
      0.031563994 = sum of:
        0.031563994 = weight(_text_:b in 967) [ClassicSimilarity], result of:
          0.031563994 = score(doc=967,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.19572285 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
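The score tree above can be retraced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity (TF-IDF) definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not part of Lucene or of this catalog.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm,
                  coords=(0.5, 0.5)):
    """Recompute one leaf of the explain tree, then apply the coord factors."""
    tf = math.sqrt(freq)                    # 1.4142135 for freq=2.0
    idf = classic_idf(doc_freq, max_docs)   # about 3.542962 for docFreq=3476
    query_weight = idf * query_norm         # queryWeight in the tree
    field_weight = tf * idf * field_norm    # fieldWeight in the tree
    score = query_weight * field_weight     # weight(_text_:b in 967)
    for coord in coords:                    # two coord(1/2) factors
        score *= coord
    return score


# Values copied from the explain output for doc 967.
score_967 = classic_score(
    freq=2.0,
    doc_freq=3476,
    max_docs=44218,
    query_norm=0.045518078,
    field_norm=0.0390625,
)
print(score_967)  # close to the reported 0.007890998, up to float rounding
```

The same function applies to the other entries on this page: only `doc_freq` and `field_norm` change between them (for example `field_norm=0.03125` for doc 4095), which is why records matching the same term with the same frequency can still receive different scores.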
Abstract

Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in-depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio-ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01

```
0.0077088396 = product of:
  0.015417679 = sum of:
    0.015417679 = product of:
      0.030835358 = sum of:
        0.030835358 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.030835358 = score(doc=2765,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 19:14:43

Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.01

```
0.0063127987 = product of:
  0.012625597 = sum of:
    0.012625597 = product of:
      0.025251195 = sum of:
        0.025251195 = weight(_text_:b in 4095) [ClassicSimilarity], result of:
          0.025251195 = score(doc=4095,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.15657827 = fieldWeight in 4095, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.03125 = fieldNorm(doc=4095)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.01

```
0.0063127987 = product of:
  0.012625597 = sum of:
    0.012625597 = product of:
      0.025251195 = sum of:
        0.025251195 = weight(_text_:b in 5051) [ClassicSimilarity], result of:
          0.025251195 = score(doc=5051,freq=2.0), product of:
            0.16126883 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.045518078 = queryNorm
            0.15657827 = fieldWeight in 5051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.03125 = fieldNorm(doc=5051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.01

```
0.0061670714 = product of:
  0.012334143 = sum of:
    0.012334143 = product of:
      0.024668286 = sum of:
        0.024668286 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.024668286 = score(doc=2741,freq=2.0), product of:
            0.15939656 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045518078 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 12. 9.2004 9:56:22
