Search (1 result)

  • author_ss:"Mostafa, J."
  • author_ss:"Seki, K."
  • year_i:[2000 TO 2010}
  1. Seki, K.; Mostafa, J.: Gene ontology annotation as text categorization : an empirical study (2008) 0.00
    0.0022137975 = product of:
      0.01771038 = sum of:
        0.01771038 = product of:
          0.05313114 = sum of:
            0.05313114 = weight(_text_:problem in 2123) [ClassicSimilarity], result of:
              0.05313114 = score(doc=2123,freq=6.0), product of:
                0.13082431 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.030822188 = queryNorm
                0.4061259 = fieldWeight in 2123, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2123)
          0.33333334 = coord(1/3)
      0.125 = coord(1/8)
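The explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The sketch below recomputes the top-level score from the leaf values. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the printed numbers. Other Lucene versions may compute idf slightly differently. The helper names (`classic_tf`, `classic_idf`, `explain_score`) are hypothetical. Lucene works in single-precision floats, so the last digits may differ slightly.

```python
import math

def classic_tf(freq: float) -> float:
    # ClassicSimilarity term-frequency factor: sqrt(termFreq)
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Classic idf, in the form that reproduces the printed value:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute a single-term ClassicSimilarity explain tree (hypothetical helper)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight                  # weight(_text_:problem in 2123)
    for factor in coords:                                # coord(1/3), then coord(1/8)
        score *= factor
    return score

score = explain_score(freq=6.0, doc_freq=1723, max_docs=44218,
                      query_norm=0.030822188, field_norm=0.0390625,
                      coords=(1 / 3, 1 / 8))
print(f"{score:.8g}")
```

The coord factors explain the large gap between the per-term weight (about 0.053) and the 0.0022 at the top of the tree. Only 1 of 3 clauses matched in the inner query, and only 1 of 8 matched at the top level.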
    
    Abstract
    Gene ontology (GO) consists of three structured controlled vocabularies, i.e., GO domains, developed for describing attributes of gene products, and its annotation is crucial for providing a common gateway to access different model organism databases. This paper explores an effective application of text categorization methods to this highly practical problem in biology. As a first step, we attempt to tackle the automatic GO annotation task posed in the Text Retrieval Conference (TREC) 2004 Genomics Track. Given a pair of genes and an article reference where the genes appear, the task simulates the assignment of GO domain codes. We approach the problem with careful consideration of the specialized terminology and pay special attention to various forms of gene synonyms, so as to exhaustively locate the occurrences of the target gene. We extract the words around the spotted gene occurrences and use them to represent the gene for GO domain code annotation. We regard the task as a text categorization problem and adopt a variant of kNN with supervised term weighting schemes, which placed our method among the top-performing systems in the official TREC evaluation. Furthermore, we investigate different feature selection policies in conjunction with the treatment of terms associated with negative instances. Our experiments reveal that round-robin feature space allocation, combined with the elimination of negative terms, substantially improves performance as GO terms become more specific.
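The abstract describes a two-step approach. Each gene is represented by the words around its occurrences, and a kNN variant then assigns GO domain codes (BP, MF, CC: biological process, molecular function, cellular component). Below is a minimal Python sketch of that general idea, not the authors' method. It assumes a fixed token window, raw term counts, plain cosine similarity, and a similarity-weighted multi-label vote. It omits the paper's supervised term weighting and round-robin feature allocation. The names `context_vector`, `knn_labels`, `window`, and `threshold`, and the toy gene names and sentences, are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphanumeric tokens; a real system would need gene-aware tokenization
    return re.findall(r"[a-z0-9]+", text.lower())

def context_vector(text, synonyms, window=5):
    """Bag of words within `window` tokens of any synonym occurrence (hypothetical helper)."""
    tokens = tokenize(text)
    patterns = [p for p in (tokenize(s) for s in synonyms) if p]
    bag = Counter()
    for i in range(len(tokens)):
        for pat in patterns:
            if tokens[i:i + len(pat)] == pat:
                lo = max(0, i - window)
                hi = min(len(tokens), i + len(pat) + window)
                # Keep the surrounding words, not the gene mention itself
                bag.update(tokens[lo:i] + tokens[i + len(pat):hi])
    return bag

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_labels(query, training, k=3, threshold=0.5):
    """Multi-label kNN: keep labels whose similarity-weighted vote share >= threshold."""
    scored = sorted(((cosine(query, vec), labels) for vec, labels in training),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return set()
    votes = Counter()
    for sim, labels in scored:
        for label in labels:
            votes[label] += sim
    return {label for label, v in votes.items() if v / total >= threshold}

# Toy, made-up training instances: (context vector, GO domain codes)
training = [
    (context_vector("the gene abc1 is located in the nucleus membrane", ["abc1"]), {"CC"}),
    (context_vector("xyz2 binds atp and hydrolyzes it", ["xyz2"]), {"MF"}),
]
query = context_vector("we found that def3 is located in the nucleus", ["def3"])
print(knn_labels(query, training))  # {'CC'}
```

Listing several synonyms per gene (including multi-word forms) is how this sketch reflects the abstract's emphasis on exhaustively locating gene mentions. Each extra synonym simply adds more context windows to the gene's vector.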