Content Model¶
PyXB’s content model is used to complete the link between the Component Model and the Binding Model. These classes are the ones that:
- determine what Python class attribute is used to store which XML element or attribute;
- distinguish those elements that can occur at most once from those that require an aggregation; and
- ensure that the ordering and occurrence constraints imposed by the XML model group are satisfied, when XML is converted to Python instances and vice-versa.
Associating XML and Python Objects¶
Most of the classes involved in the content model are in the pyxb.binding.content module. The relations among these classes are displayed in the following diagram.
In the standard code generation template, both element and attribute values are stored in Python class fields. As noted in Deconflicting Names, an attribute and an element that share a name within their containing complex type must be given distinct names in the Python class corresponding to that type. Use information for each of these is maintained in the type class. This use information comprises:
- the original name of the element/attribute in the XML
- its deconflicted name in Python
- the private name by which the value is stored in the Python instance dictionary
Other information is specific to the type of use. The pyxb.binding.basis.complexTypeDefinition retains maps from the component’s name to the attribute use or element use instance corresponding to the component’s use.
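The three names kept for each use can be pictured with a small record type. The following is only a sketch: UseInfo, deconflict, the trailing-underscore rule, and the private-name format are all hypothetical and are not taken from PyXB, whose actual deconfliction rules are described in Deconflicting Names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseInfo:
    """Hypothetical record of the three names tracked for each use."""
    xml_name: str      # original name in the XML
    python_name: str   # deconflicted name exposed on the Python class
    private_name: str  # key under which the value sits in the instance dictionary


def deconflict(components):
    """Assign unique Python names to (kind, xml_name) pairs.

    Collisions are resolved by appending '_' (an illustrative rule only).
    """
    used = set()
    uses = []
    for kind, xml_name in components:
        python_name = xml_name
        while python_name in used:
            python_name += "_"
        used.add(python_name)
        uses.append(UseInfo(xml_name, python_name, f"__{kind}_{python_name}"))
    return uses


# An attribute and an element that share the XML name "id".
uses = deconflict([("attribute", "id"), ("element", "id"), ("element", "title")])
for use in uses:
    print(use)
```

Both components keep "id" as their XML name, while the Python-facing and private names stay distinct.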
Attribute Uses¶
The information associated with an attribute use is recorded in a pyxb.binding.content.AttributeUse instance. This class provides:
- The name of the attribute
- The default value of the attribute
- Whether the attribute value is fixed
- Whether the attribute use is required or prohibited
- The type of the attribute, as a subclass of pyxb.binding.basis.simpleTypeDefinition
- Methods to read, set, and reset the value of the attribute in a given binding instance.
A map is used to map from expanded names to AttributeUse instances. This map is defined within the class definition itself.
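As an illustration of how these pieces fit together, here is a minimal stand-in written from the description above. SketchAttributeUse, ATTRIBUTE_USES, and every method name are hypothetical; this is not the interface of pyxb.binding.content.AttributeUse. The map is also keyed by plain strings rather than expanded names to keep the sketch short.

```python
class SketchAttributeUse:
    """Hypothetical stand-in for an attribute use; not the PyXB class."""

    def __init__(self, name, data_type, default=None, fixed=False,
                 required=False, prohibited=False):
        self.name = name
        self.data_type = data_type      # callable that converts raw text
        self.default = default
        self.fixed = fixed
        self.required = required
        self.prohibited = prohibited
        self._key = "_attr_" + name     # private key in the instance dictionary

    def value(self, instance):
        """Read the stored value, falling back to the default."""
        return vars(instance).get(self._key, self.default)

    def set(self, instance, raw):
        """Convert and store a value, honouring prohibited and fixed."""
        if self.prohibited:
            raise ValueError(f"attribute {self.name!r} is prohibited")
        value = self.data_type(raw)
        if self.fixed and value != self.default:
            raise ValueError(f"attribute {self.name!r} is fixed to {self.default!r}")
        vars(instance)[self._key] = value
        return value

    def reset(self, instance):
        """Forget any stored value so that the default applies again."""
        vars(instance).pop(self._key, None)

    def is_satisfied(self, instance):
        """A required attribute must have been set explicitly."""
        return not self.required or self._key in vars(instance)


class Item:
    # Map from attribute name to its use, defined in the class body itself.
    ATTRIBUTE_USES = {
        "count": SketchAttributeUse("count", int, default=1),
        "version": SketchAttributeUse("version", str, default="1.0", fixed=True),
        "id": SketchAttributeUse("id", str, required=True),
    }


item = Item()
Item.ATTRIBUTE_USES["count"].set(item, "3")
print(Item.ATTRIBUTE_USES["count"].value(item))
```

Keeping the map on the class rather than on each instance means the per-instance state is limited to the values themselves.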
Element Uses¶
The element analog to an attribute use is an element declaration, and the corresponding information is stored in a pyxb.binding.content.ElementDeclaration instance. This class provides:
- The element binding that defines the properties of the referenced element, including its type
- Whether the use allows multiple occurrences
- The default value of the element. Currently this is either None or an empty list, depending on pyxb.binding.content.ElementDeclaration.isPlural
- Methods to read, set, append to (only for plural elements), and reset the value of the element in a given binding instance
- The setOrAppend method, which is most commonly used to provide new content to a value
A map is used to map from expanded names to ElementDeclaration instances. This map is defined within the class definition itself. As mentioned before, when the same element name appears at multiple places within the element content the uses are collapsed into a single attribute on the complex type; thus the map is to the ElementDeclaration, not the ElementUse.
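The difference between singular and plural elements, and the convenience of a combined set-or-append operation, can be sketched as follows. SketchElementDeclaration, ELEMENT_DECLARATIONS, and the method names are hypothetical and written from the description above; they are not the PyXB implementation, and the map is keyed by plain strings rather than expanded names.

```python
class SketchElementDeclaration:
    """Hypothetical stand-in for an element declaration; not the PyXB class."""

    def __init__(self, name, plural):
        self.name = name
        self.plural = plural
        self._key = "_elt_" + name  # private key in the instance dictionary

    def default_value(self):
        # Plural elements aggregate into a list; singular ones start unset.
        return [] if self.plural else None

    def value(self, instance):
        return vars(instance).setdefault(self._key, self.default_value())

    def set(self, instance, value):
        vars(instance)[self._key] = value

    def append(self, instance, value):
        if not self.plural:
            raise TypeError(f"element {self.name!r} occurs at most once")
        self.value(instance).append(value)

    def reset(self, instance):
        vars(instance)[self._key] = self.default_value()

    def set_or_append(self, instance, value):
        """Add new content without the caller caring about plurality."""
        if self.plural:
            self.append(instance, value)
        else:
            self.set(instance, value)


class Section:
    # One declaration per element name, however many times the name
    # appears in the content model.
    ELEMENT_DECLARATIONS = {
        "title": SketchElementDeclaration("title", plural=False),
        "para": SketchElementDeclaration("para", plural=True),
    }


section = Section()
for name, text in [("title", "Intro"), ("para", "one"), ("para", "two")]:
    Section.ELEMENT_DECLARATIONS[name].set_or_append(section, text)
print(Section.ELEMENT_DECLARATIONS["para"].value(section))
```

In this sketch, setting a singular element a second time simply overwrites the earlier value; whether a real binding should reject that instead is a separate policy question.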
Validating the Content Model¶
As of PyXB 1.2.0, content validation is performed using the Finite Automata with Counters (FAC) data structure, as described in Regular Expressions with Numerical Constraints and Automata with Counters, Dag Hovland, Lecture Notes in Computer Science, 2009, Volume 5684, Theoretical Aspects of Computing - ICTAC 2009, Pages 231-245.
This structure allows accurate validation of occurrence and order constraints
without the complexity of the original back-tracking validation solution from
PyXB 1.1.1 and earlier. It also avoids the
incorrect rejection of valid documents that (rarely) occurred
with the greedy algorithm introduced in PyXB 1.1.2.
Conversion to this data structure also enabled the distinction between element declaration and element use nodes, allowing diagnostics to trace back to the element references in context.
The data structures for the automaton and the configuration structure that represents a processing automaton are:
The implementation in PyXB generally follows the description in the ICTAC 2009 paper. Calculation of first/follow sets has been enhanced to support term trees with more than two children per node. In addition, support for unordered catenation as required for the “all” model group is implemented by a state that maintains a distinct sub-automaton for each alternative, requiring a layered approach where execution of an automaton is suspended until the subordinate automaton has accepted and a transition out of it is encountered.
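To make the role of the counters concrete, the following sketch validates a flat sequence of particles, each with its own occurrence bounds. It is a deliberately reduced illustration: it is neither the interface of the FAC module nor the general construction from the paper, it has no support for choice, nesting, or “all” groups, and the names Particle, step, and validate are hypothetical. A configuration here is a pair of the current particle index and the value of that particle's counter; a set of configurations is kept so that an ambiguous model is not rejected by committing too early.

```python
from typing import NamedTuple, Optional


class Particle(NamedTuple):
    symbol: str
    min_occurs: int
    max_occurs: Optional[int]  # None means unbounded


def step(particles, configs, symbol):
    """Return every configuration reachable by consuming one symbol."""
    next_configs = set()
    for index, count in configs:
        if index >= 0:
            current = particles[index]
            # Increment transition: repeat the current particle while the
            # counter is below its maximum.
            if current.symbol == symbol and (
                    current.max_occurs is None or count < current.max_occurs):
                next_configs.add((index, count + 1))
            # Guard: the particle may only be left once its minimum is met.
            if count < current.min_occurs:
                continue
        # Reset transition: enter a later particle with its counter at 1,
        # skipping over particles that are allowed to be absent.
        for later in range(index + 1, len(particles)):
            candidate = particles[later]
            if candidate.symbol == symbol and candidate.max_occurs != 0:
                next_configs.add((later, 1))
            if candidate.min_occurs > 0:
                break
    return next_configs


def accepting(particles, config):
    index, count = config
    if index >= 0 and count < particles[index].min_occurs:
        return False
    return all(p.min_occurs == 0 for p in particles[index + 1:])


def validate(particles, symbols):
    configs = {(-1, 0)}  # virtual start position before the first particle
    for symbol in symbols:
        configs = step(particles, configs, symbol)
        if not configs:
            return False
    return any(accepting(particles, c) for c in configs)


model = [Particle("title", 1, 1), Particle("para", 2, 3), Particle("note", 0, None)]
print(validate(model, ["title", "para", "para", "note"]))
print(validate(model, ["title", "para"]))
```

The counter replaces what would otherwise be one automaton state per permitted repetition, so a large bound does not enlarge the automaton.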
For more information on the implementation, please see the FAC module. This module has been written to be independent of PyXB infrastructure, and may be re-used in other code in accordance with the PyXB license.
FAC and the PyXB Content Model¶
As depicted in the Content Model class diagram, each complex type binding class has a _Automaton which encodes the content model of the type as a Finite Automaton with Counters. This representation models the occurrence constraints and sub-element orders, referencing the specific element and wildcard uses as they appear in the schema. Each instance of a complex binding supports an AutomatonConfiguration that is used to validate the binding content against the model.
An ElementUse instance is provided as the metadata for automaton states that correspond to an element declaration in the schema. Similarly, a WildcardUse instance is used as the metadata for automaton states that correspond to an instance of the xs:any wildcard schema component. Validation in the automaton delegates through the SymbolMatch_mixin interface to see whether content in the form of a complex type binding instance is conformant to the restrictions on symbols associated with a particular state.
When parsing, a transition taken results in the storage of the consumed symbol into the appropriate element attribute or wildcard list in the binding instance. In many cases, the transition from one state to a next is uniquely determined by the content; as long as this condition holds, the AutomatonConfiguration instance retains a single underlying FAC Configuration representing the current state.
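The coupling between taking a transition and storing the consumed symbol can be sketched for the simple case where each transition is uniquely determined. Everything here is hypothetical (the transition table, the Binding class, and the consume and parse functions); counters and wildcards are left out, and this is not how AutomatonConfiguration is implemented.

```python
# Hypothetical transition table:
#   state -> {element name: (next state, field on the binding, plural?)}
TRANSITIONS = {
    "start": {"title": ("body", "title", False)},
    "body": {"para": ("body", "para", True), "note": ("end", "note", False)},
    "end": {},
}
ACCEPTING = {"body", "end"}


class Binding:
    """Stand-in for a complex type binding instance."""

    def __init__(self):
        self.title = None
        self.para = []
        self.note = None


def consume(instance, state, name, value):
    """Take the transition for `name` and store `value` as a side effect."""
    try:
        next_state, field, plural = TRANSITIONS[state][name]
    except KeyError:
        raise ValueError(
            f"element {name!r} is not allowed in state {state!r}") from None
    if plural:
        getattr(instance, field).append(value)
    else:
        setattr(instance, field, value)
    return next_state


def parse(events):
    """Build a binding from (element name, value) pairs in document order."""
    instance, state = Binding(), "start"
    for name, value in events:
        state = consume(instance, state, name, value)
    if state not in ACCEPTING:
        raise ValueError("content ended before the model was satisfied")
    return instance


doc = parse([("title", "Intro"), ("para", "one"), ("para", "two")])
print(doc.title, doc.para)
```

Because only one state is ever live in this sketch, a single variable is enough to represent the configuration; a model with ambiguous transitions would need to carry several candidates forward instead.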
To generate the XML corresponding to a binding instance, the element and wildcard content of the instance are loaded into a Python dictionary, keyed by the ElementDeclaration. These subordinate elements are appended to a list of child nodes as transitions that recognize them are encountered. As of PyXB 1.2.0 the first legal transition in the order imposed by the schema is taken, and there is no provision for influencing the order in the generated document when multiple orderings are valid.
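The consequence of always taking the first legal transition can be seen in a small sketch. The generate function and the transition table are hypothetical, the pending content is keyed by element name rather than by ElementDeclaration, and the model is a repeated choice between two elements, for which any interleaving is valid.

```python
from collections import deque


def generate(pending, transitions, start, accepting):
    """Order pending content by walking the model.

    `pending` maps an element name to its values; `transitions` maps a state
    to (element name, next state) pairs listed in schema order.
    """
    remaining = {name: deque(values) for name, values in pending.items() if values}
    children, state = [], start
    while any(remaining.values()):
        for name, next_state in transitions[state]:
            if remaining.get(name):
                # First legal transition in schema order wins.
                children.append((name, remaining[name].popleft()))
                state = next_state
                break
        else:
            raise ValueError(
                f"no transition from {state!r} accepts the remaining content")
    if state not in accepting:
        raise ValueError("content model not satisfied")
    return children


# Hypothetical model for a repeated choice: (author | editor)*
TRANSITIONS = {"start": [("author", "start"), ("editor", "start")]}

children = generate(
    {"editor": ["E1"], "author": ["A1", "A2"]}, TRANSITIONS, "start", {"start"})
print(children)
```

Even though the dictionary lists the editor first, every author is emitted before it, because the author transition comes first in schema order. A walk this simple can also fail on models where an early choice has to be revisited; it is meant only to show the ordering behaviour.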