Search (1 results, page 1 of 1)

Ziemba, L.: Information retrieval with concept discovery in digital collections for agriculture and natural resources (2011) 0.01
```
0.009693679 = product of:
  0.048468396 = sum of:
    0.048468396 = weight(_text_:thesaurus in 4728) [ClassicSimilarity], result of:
      0.048468396 = score(doc=4728,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.20422474 = fieldWeight in 4728, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.03125 = fieldNorm(doc=4728)
  0.2 = coord(1/5)
```
Abstract

The amount and complexity of information available in a digital form is already huge and new information is being produced every day. Retrieving information relevant to address a particular need becomes a significant issue. This work utilizes knowledge organization systems (KOS), such as thesauri and ontologies and applies information extraction (IE) and computational linguistics (CL) techniques to organize, manage and retrieve information stored in digital collections in the agricultural domain. Two real world applications of the approach have been developed and are available and actively used by the public. An ontology is used to manage the Water Conservation Digital Library holding a dynamic collection of various types of digital resources in the domain of urban water conservation in Florida, USA. The ontology based back-end powers a fully operational web interface, available at http://library.conservefloridawater.org. The system has demonstrated numerous benefits of the ontology application, including accurate retrieval of resources, information sharing and reuse, and has proved to effectively facilitate information management. The major difficulty encountered with the approach is that large and dynamic number of concepts makes it difficult to keep the ontology consistent and to accurately catalog resources manually. To address the aforementioned issues, a combination of IE and CL techniques, such as Vector Space Model and probabilistic parsing, with the use of Agricultural Thesaurus were adapted to automatically extract concepts important for each of the texts in the Best Management Practices (BMP) Publication Library--a collection of documents in the domain of agricultural BMPs in Florida available at http://lyra.ifas.ufl.edu/LIB. A new approach of domain-specific concept discovery with the use of Internet search engine was developed. Initial evaluation of the results indicates significant improvement in precision of information extraction. The approach presented in this work focuses on problems unique to agriculture and natural resources domain, such as domain specific concepts and vocabularies, but should be applicable to any collection of texts in digital format. It may be of potential interest for anyone who needs to effectively manage a collection of digital resources.