Search (4 results, page 1 of 1)

  • author_ss:"Cui, H."
  • year_i:[2010 TO 2020}
  1. Cui, H.: Competency evaluation of plant character ontologies against domain literature (2010) 0.02
    0.01597124 = product of:
      0.03194248 = sum of:
        0.014865918 = weight(_text_:information in 3466) [ClassicSimilarity], result of:
          0.014865918 = score(doc=3466,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 3466, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3466)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 3466) [ClassicSimilarity], result of:
              0.03415312 = score(doc=3466,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 3466, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3466)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Specimen identification keys are still the most commonly created tools used by systematic biologists to access biodiversity information. Creating identification keys requires analyzing and synthesizing large amounts of information from specimens and their descriptions and is a very labor-intensive and time-consuming activity. Automating the generation of identification keys from text descriptions becomes a highly attractive text mining application in the biodiversity domain. Fine-grained semantic annotation of morphological descriptions of organisms is a necessary first step in generating keys from text. Machine-readable ontologies are needed in this process because most biological characters are only implied (i.e., not stated) in descriptions. The immediate question to ask is: How well do existing ontologies support semantic annotation and automated key generation? With the intention to either select an existing ontology or develop a unified ontology based on existing ones, this paper evaluates the coverage, semantic consistency, and inter-ontology agreement of a biodiversity character ontology and three plant glossaries that may be turned into ontologies. The coverage and semantic consistency of the ontology/glossaries are checked against the authoritative domain literature, namely, Flora of North America and Flora of China. The evaluation results suggest that more work is needed to improve the coverage and interoperability of the ontology/glossaries. More concepts need to be added to the ontology/glossaries and careful work is needed to improve the semantic consistency. The method used in this paper to evaluate the ontology/glossaries can be used to propose new candidate concepts from the domain literature and suggest appropriate definitions.
    Date
    1. 6.2010 9:55:22
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1144-1165
  2. Cui, H.: CharaParser for fine-grained semantic annotation of organism morphological descriptions (2012) 0.01
    0.014919861 = product of:
      0.029839722 = sum of:
        0.01213797 = weight(_text_:information in 45) [ClassicSimilarity], result of:
          0.01213797 = score(doc=45,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 45, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=45)
        0.017701752 = product of:
          0.035403505 = sum of:
            0.035403505 = weight(_text_:organization in 45) [ClassicSimilarity], result of:
              0.035403505 = score(doc=45,freq=2.0), product of:
                0.17974974 = queryWeight, product of:
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19695997 = fieldWeight in 45, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5653565 = idf(docFreq=3399, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=45)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Biodiversity information organization is looking beyond the traditional document-level metadata approach and has started to look into factual content in textual documents to support more intelligent and semantic-based access. This article reports the development and evaluation of CharaParser, a software application for semantic annotation of morphological descriptions. CharaParser annotates semistructured morphological descriptions in such a detailed manner that all stated morphological characters of an organ are marked up in Extensible Markup Language format. Using an unsupervised machine learning algorithm and a general purpose syntactic parser as its key annotation tools, CharaParser requires minimal additional knowledge engineering work and seems to perform well across different description collections and/or taxon groups. The system has been formally evaluated on over 1,000 sentences randomly selected from Volume 19 of Flora of North America and Part H of Treatise on Invertebrate Paleontology. CharaParser reaches and exceeds 90% in sentence-wise recall and precision, exceeding other similar systems reported in the literature. It also significantly outperforms a heuristic rule-based system we developed earlier. Early evidence that enriching the lexicon of a syntactic parser with domain terms alone may be sufficient to adapt the parser for the biodiversity domain is also observed and may have significant implications.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.4, S.738-754
  3. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4462) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4462,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4462, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4462)
      0.25 = coord(1/4)
    
    Abstract
    Habitat information is important to biodiversity conservation and research. Extracting bacterial biotope entities from scientific publications is important to large-scale study of the relationships between bacteria and their living environments. To facilitate the further development of robust habitat text mining systems for biodiversity, following the BioNLP task framework, three sequence labeling techniques, CRFs (Conditional Random Fields), MEMM (Maximum Entropy Markov Model) and SVMhmm (Support Vector Machine) and one classifier, SVMmulticlass, are compared on their performance in identifying three types of bacterial biotope entities: bacteria, habitats and geographical locations. The effectiveness of a variety of basic word formation features, syntactic features, and semantic features is exploited and compared for the three sequence labeling methods. Experiments on two publicly available BioNLP collections show that, in addition to a WordNet feature, word embedding featured clusters (although not trained with the task-specific corpus) consistently improve the performance for all methods on all entity types in both collections. Other features produce various results. Our results also show that when trained on limited corpora, Brown clusters resulted in better performance than word embedding clusters did. Further analysis suggests that the entity recognition performance can be greatly boosted through improving the accuracy of entity boundary identification.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.9, S.1134-1147
  4. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.00
    0.0025748524 = product of:
      0.01029941 = sum of:
        0.01029941 = weight(_text_:information in 3422) [ClassicSimilarity], result of:
          0.01029941 = score(doc=3422,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 3422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.522-542
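
The score breakdowns in this listing follow the usual ClassicSimilarity (TF-IDF) pattern: tf is the square root of the term frequency, idf appears to be 1 + ln(maxDocs / (docFreq + 1)), and each term contributes queryWeight × fieldWeight before the coord factors are applied. The following is a minimal sketch that recomputes the score of result 1 (doc 3466) from the numbers shown above. The function names are hypothetical, the idf formula is inferred from the listed values rather than taken from the index configuration, and small differences in the last digits are expected because the search engine works in single precision.

```python
import math

# Values copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.050415643


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it reproduces the listed 1.7554779 and 3.5018296.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    # One "weight(...)" node: queryWeight * fieldWeight.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Result 1 (doc 3466): two of four query clauses matched.
information = term_score(freq=6.0, doc_freq=20772, field_norm=0.0390625)
# The "22" term sits in a nested clause with coord(1/2).
term_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.0390625) * 0.5
# The outer clause applies coord(2/4).
total = (information + term_22) * (2 / 4)

print(f"information: {information:.9f}")
print(f"22 (after inner coord): {term_22:.9f}")
print(f"total: {total:.9f}")  # the listing reports 0.01597124
```

The same function applies to results 2 to 4 by substituting their freq, docFreq, fieldNorm, and coord values; results 3 and 4 match only one of four clauses, which is why their scores are scaled by coord(1/4).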