Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (Stand: 04. Juni 2021)
1Iorio, A.D. ; Peroni, S. ; Poggi, F. ; Vitali, F.: Dealing with structural patterns of XML documents.
In: Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1884-1900.
Abstract: Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights into the expected characteristics of a markup language, as well as any regularity that may span vocabularies and languages, and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies, by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of 8 plus 3 structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The results allowed the creation of visualization and content extraction tools that are completely independent of the schema and without any previous knowledge of the semantics and organization of the XML vocabulary of the documents.