Document (#35423)

Author
Cui, H.
Boufford, D.
Selden, P.
Title
Semantic annotation of biosystematics literature without training examples
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.3, pp. 522-542
Year
2010
Abstract
This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm annotates plain-text descriptions with high accuracy at the clause level by exploiting the corpus itself; it does not need lexicons, syntactic parsers, training examples, or annotation templates. An evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) it reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) it improves annotation coverage, because it annotates what appears in the documents and is not limited by predefined and often incomplete templates; (c) it learns clean and reusable concepts, namely organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) it is insensitive to collection size; and (e) it runs in linear time with respect to the number of clauses to be annotated.
Theme
Automatic indexing
Field
Biology

Similar documents (content)

  1. Cui, H.: CharaParser for fine-grained semantic annotation of organism morphological descriptions (2012) 0.16
    0.16201241 = sum of:
      0.16201241 = product of:
        0.67505175 = sum of:
          0.03046603 = weight(abstract_txt:semantic in 45) [ClassicSimilarity], result of:
            0.03046603 = score(doc=45,freq=2.0), product of:
              0.07703599 = queryWeight, product of:
                1.0604178 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.01623639 = queryNorm
              0.39547786 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
          0.08733732 = weight(abstract_txt:organ in 45) [ClassicSimilarity], result of:
            0.08733732 = score(doc=45,freq=1.0), product of:
              0.15545917 = queryWeight, product of:
                1.0651808 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01623639 = queryNorm
              0.5618023 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
          0.11686097 = weight(abstract_txt:annotates in 45) [ClassicSimilarity], result of:
            0.11686097 = score(doc=45,freq=1.0), product of:
              0.18876845 = queryWeight, product of:
                1.1737615 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01623639 = queryNorm
              0.6190705 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
          0.07065019 = weight(abstract_txt:descriptions in 45) [ClassicSimilarity], result of:
            0.07065019 = score(doc=45,freq=2.0), product of:
              0.13496628 = queryWeight, product of:
                1.4035982 = boost
                5.9223356 = idf(docFreq=321, maxDocs=44218)
                0.01623639 = queryNorm
              0.52346545 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9223356 = idf(docFreq=321, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
          0.23573747 = weight(abstract_txt:annotation in 45) [ClassicSimilarity], result of:
            0.23573747 = score(doc=45,freq=2.0), product of:
              0.3797043 = queryWeight, product of:
                3.3294148 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.01623639 = queryNorm
              0.6208449 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
          0.13399976 = weight(abstract_txt:algorithm in 45) [ClassicSimilarity], result of:
            0.13399976 = score(doc=45,freq=1.0), product of:
              0.3757822 = queryWeight, product of:
                4.056569 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01623639 = queryNorm
              0.35658893 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=45)
        0.24 = coord(6/25)
    
  2. Malo, P.; Sinha, A.; Wallenius, J.; Korhonen, P.: Concept-based document classification using Wikipedia and value function (2011) 0.09
    0.092110075 = sum of:
      0.092110075 = product of:
        0.46055037 = sum of:
          0.026928421 = weight(abstract_txt:semantic in 4948) [ClassicSimilarity], result of:
            0.026928421 = score(doc=4948,freq=1.0), product of:
              0.07703599 = queryWeight, product of:
                1.0604178 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.01623639 = queryNorm
              0.34955636 = fieldWeight in 4948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=4948)
          0.04016301 = weight(abstract_txt:training in 4948) [ClassicSimilarity], result of:
            0.04016301 = score(doc=4948,freq=1.0), product of:
              0.10056277 = queryWeight, product of:
                1.2115707 = boost
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.01623639 = queryNorm
              0.39938247 = fieldWeight in 4948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.078125 = fieldNorm(doc=4948)
          0.04529595 = weight(abstract_txt:collection in 4948) [ClassicSimilarity], result of:
            0.04529595 = score(doc=4948,freq=1.0), product of:
              0.12472583 = queryWeight, product of:
                1.6525477 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.01623639 = queryNorm
              0.36316413 = fieldWeight in 4948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=4948)
          0.18066327 = weight(abstract_txt:learns in 4948) [ClassicSimilarity], result of:
            0.18066327 = score(doc=4948,freq=1.0), product of:
              0.27403098 = queryWeight, product of:
                2.0 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.01623639 = queryNorm
              0.6592805 = fieldWeight in 4948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.078125 = fieldNorm(doc=4948)
          0.16749972 = weight(abstract_txt:algorithm in 4948) [ClassicSimilarity], result of:
            0.16749972 = score(doc=4948,freq=1.0), product of:
              0.3757822 = queryWeight, product of:
                4.056569 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01623639 = queryNorm
              0.44573617 = fieldWeight in 4948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=4948)
        0.2 = coord(5/25)
    
  3. Robert, C.A.; Davis, A.: Annotation and its application to information research in economic intelligence (2006) 0.09
    0.08784058 = sum of:
      0.08784058 = product of:
        0.7320048 = sum of:
          0.0973652 = weight(abstract_txt:annotate in 2288) [ClassicSimilarity], result of:
            0.0973652 = score(doc=2288,freq=1.0), product of:
              0.1440386 = queryWeight, product of:
                1.0253086 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.01623639 = queryNorm
              0.675966 = fieldWeight in 2288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.078125 = fieldNorm(doc=2288)
          0.04529595 = weight(abstract_txt:collection in 2288) [ClassicSimilarity], result of:
            0.04529595 = score(doc=2288,freq=1.0), product of:
              0.12472583 = queryWeight, product of:
                1.6525477 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.01623639 = queryNorm
              0.36316413 = fieldWeight in 2288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=2288)
          0.58934367 = weight(abstract_txt:annotation in 2288) [ClassicSimilarity], result of:
            0.58934367 = score(doc=2288,freq=8.0), product of:
              0.3797043 = queryWeight, product of:
                3.3294148 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.01623639 = queryNorm
              1.5521122 = fieldWeight in 2288, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2288)
        0.12 = coord(3/25)
    
  4. Vallet, D.; Fernández, M.; Castells, P.: An ontology-based information retrieval model (2005) 0.08
    0.08325987 = sum of:
      0.08325987 = product of:
        0.6938323 = sum of:
          0.05596967 = weight(abstract_txt:semantic in 4708) [ClassicSimilarity], result of:
            0.05596967 = score(doc=4708,freq=3.0), product of:
              0.07703599 = queryWeight, product of:
                1.0604178 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.01623639 = queryNorm
              0.7265392 = fieldWeight in 4708, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.09375 = fieldNorm(doc=4708)
          0.35360622 = weight(abstract_txt:annotation in 4708) [ClassicSimilarity], result of:
            0.35360622 = score(doc=4708,freq=2.0), product of:
              0.3797043 = queryWeight, product of:
                3.3294148 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.01623639 = queryNorm
              0.9312674 = fieldWeight in 4708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4708)
          0.28425643 = weight(abstract_txt:algorithm in 4708) [ClassicSimilarity], result of:
            0.28425643 = score(doc=4708,freq=2.0), product of:
              0.3757822 = queryWeight, product of:
                4.056569 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01623639 = queryNorm
              0.7564393 = fieldWeight in 4708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=4708)
        0.12 = coord(3/25)
    
  5. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.08
    0.07547212 = sum of:
      0.07547212 = product of:
        0.47170073 = sum of:
          0.09033164 = weight(abstract_txt:plain in 3266) [ClassicSimilarity], result of:
            0.09033164 = score(doc=3266,freq=1.0), product of:
              0.13701549 = queryWeight, product of:
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.01623639 = queryNorm
              0.6592805 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.026928421 = weight(abstract_txt:semantic in 3266) [ClassicSimilarity], result of:
            0.026928421 = score(doc=3266,freq=1.0), product of:
              0.07703599 = queryWeight, product of:
                1.0604178 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.01623639 = queryNorm
              0.34955636 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.14607622 = weight(abstract_txt:annotates in 3266) [ClassicSimilarity], result of:
            0.14607622 = score(doc=3266,freq=1.0), product of:
              0.18876845 = queryWeight, product of:
                1.1737615 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01623639 = queryNorm
              0.7738381 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.20836447 = weight(abstract_txt:annotation in 3266) [ClassicSimilarity], result of:
            0.20836447 = score(doc=3266,freq=1.0), product of:
              0.3797043 = queryWeight, product of:
                3.3294148 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.01623639 = queryNorm
              0.5487546 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
        0.16 = coord(4/25)
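
The score explanations above all follow the same arithmetic: each matching term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms over the 25 query terms). The following is a minimal sketch that recomputes the score of the first entry (doc 45) from the boosts, document frequencies, and term frequencies listed in its explanation. The function and constant names are illustrative assumptions, not part of any library, and the idf formula used, 1 + ln(maxDocs / (docFreq + 1)), is the one that reproduces the idf values printed in the listing.

```python
import math

# Constants copied from the explanation for doc 45.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.01623639   # queryNorm
FIELD_NORM = 0.0625       # fieldNorm(doc=45)
QUERY_TERMS = 25          # denominator of coord(6/25)

# (term, boost, docFreq, termFreq) for each matching term in doc 45.
MATCHES = [
    ("semantic", 1.0604178, 1369, 2.0),
    ("organ", 1.0651808, 14, 1.0),
    ("annotates", 1.1737615, 5, 1.0),
    ("descriptions", 1.4035982, 321, 2.0),
    ("annotation", 3.3294148, 106, 2.0),
    ("algorithm", 4.056569, 399, 1.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(boost, doc_freq, term_freq, field_norm=FIELD_NORM):
    """queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    # tf is the square root of the raw term frequency.
    field_weight = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, query_terms=QUERY_TERMS):
    """coord * sum of per-term scores."""
    coord = len(matches) / query_terms
    return coord * sum(term_score(b, df, tf) for _, b, df, tf in matches)


score = document_score(MATCHES)
print(f"{score:.4f}")
```

The result should agree with the listed 0.16201241 up to rounding; small differences in the last digits are expected because the listing was produced with single-precision floats and the printed inputs are themselves rounded. A term listed without a boost line, such as "plain" in entry 5, corresponds to a boost of 1.0.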