Using Binding Classes
Python instances corresponding to XML structures can be created in two primary ways: from XML documents, and directly within Python code. Generating XML documents from bindings can also be controlled.
Creating Instances from XML Documents
XML documents are converted into Python bindings by invoking the CreateFromDocument function in a binding module. For example:
from __future__ import print_function
import po3
order = po3.CreateFromDocument(open('po3.xml').read())
print('%s is sending %s %d thing(s):' % (order.billTo.name, order.shipTo.name, len(order.items.item)))
for item in order.items.item:
    print(' Quantity %d of %s at $%s' % (item.quantity, item.productName, item.USPrice))
The CreateFromDocument function in a given binding module is configured so that documents with no default namespace are assumed to be in the namespace from which the binding was generated.
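The namespace defaulting described above can be approximated with the standard library's xml.etree.ElementTree. This is an illustration only, not how PyXB implements CreateFromDocument. The helper parse_with_default_namespace and the sample document are hypothetical. The helper qualifies every unqualified element tag with a chosen namespace after parsing. Unlike a schema-aware binding, it ignores elementFormDefault.

```python
import xml.etree.ElementTree as ET


def parse_with_default_namespace(xml_text, namespace):
    """Parse xml_text and place every unqualified element into namespace.

    Hypothetical helper: it mimics treating a document with no default
    namespace as if it belonged to the binding's namespace.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree spells qualified names as '{uri}local'.
        if not element.tag.startswith('{'):
            element.tag = '{%s}%s' % (namespace, element.tag)
    return root


# Hypothetical sample document with no default namespace.
SAMPLE = """<purchaseOrder orderDate="1999-10-20">
  <shipTo><name>Alice Smith</name></shipTo>
  <billTo><name>Robert Smith</name></billTo>
</purchaseOrder>"""

root = parse_with_default_namespace(SAMPLE, 'URN:purchase-order')
ns = {'po': 'URN:purchase-order'}
print(root.tag)
print(root.find('po:billTo/po:name', ns).text)
```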
Locating Invalid Content
If a document does not validate, PyXB will generally throw a pyxb.UnrecognizedContentError exception. You can find where the problem lies, and what was not recognized, by examining the attributes present on the exception, as shown in this example:
from __future__ import print_function
import pyxb
import po1
xml = open('badcontent.xml').read()
try:
    order = po1.CreateFromDocument(xml, location_base='badcontent.xml')
except pyxb.ValidationError as e:
    print(e.details())
which produces:
The containing element shipTo is defined at po1.xsd[6:6].
The containing element type USAddress is defined at po1.xsd[13:2]
The unrecognized content streeet begins at badcontent.xml[5:4]
The USAddress automaton is not in an accepting state.
The following element and wildcard content would be accepted:
An element street per po1.xsd[16:6]
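The location information in that report (file, line, column) is the kind of data an XML parser can supply as it reads. The sketch below records such positions with the standard library's expat parser. It is not how PyXB computes its locations, and PyXB's line/column convention may differ from expat's, which reports lines 1-based and columns 0-based. The function find_unexpected_elements, the allowed-name set, and the sample are hypothetical. The check compares element names only, with no content-model automaton.

```python
import xml.parsers.expat


def find_unexpected_elements(xml_text, allowed_names):
    """Return (name, line, column) for each element not in allowed_names.

    Lines are 1-based and columns 0-based, as reported by expat.
    """
    problems = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attributes):
        if name not in allowed_names:
            # Inside a callback, expat reports where the triggering markup began.
            problems.append((name, parser.CurrentLineNumber, parser.CurrentColumnNumber))

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return problems


# Hypothetical names and sample, echoing the misspelled "streeet" above.
ALLOWED = {'shipTo', 'name', 'street', 'city', 'state', 'zip'}
SAMPLE = """<shipTo>
  <name>Alice Smith</name>
  <streeet>123 Maple Street</streeet>
</shipTo>"""

print(find_unexpected_elements(SAMPLE, ALLOWED))
```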
Coping With Wrong xsi:type Attributes
Some web services and binding tools misuse xsi:type. They provide attribute values that either are not types or do not name a type derived from an abstract type. The pyxb.namespace.builtin.XMLSchema_instance.ProcessTypeAttribute method can be used to relax how PyXB processes those attributes.
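A quick first step is to find which elements in an incoming document carry xsi:type and whether the values are ones you recognize. The sketch below does this with the standard library only. It does not call or reproduce ProcessTypeAttribute. The helper audit_xsi_types, the known-type set, and the sample document are hypothetical. It also compares raw attribute strings, even though xsi:type values are really QNames that may carry a prefix.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xsi:type attribute.
XSI_TYPE = '{http://www.w3.org/2001/XMLSchema-instance}type'


def audit_xsi_types(xml_text, known_types):
    """List (element tag, xsi:type value, recognized?) for every xsi:type use."""
    root = ET.fromstring(xml_text)
    findings = []
    for element in root.iter():
        value = element.get(XSI_TYPE)
        if value is not None:
            findings.append((element.tag, value, value in known_types))
    return findings


# Hypothetical document from a service that misuses xsi:type.
SAMPLE = """<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <shipTo xsi:type="USAddress"/>
  <billTo xsi:type="NotARealType"/>
</order>"""

report = audit_xsi_types(SAMPLE, {'USAddress'})
print(report)
```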
Creating Instances in Python Code
Creating bindings from XML documents is straightforward, because the documents contain enough information to identify each element and attribute, locate the corresponding use in the binding class, and store a value that is converted to the appropriate type. Creating values in Python is inherently more complex, because native Python objects like strings and integers do not contain this information.
As described in Binding Model, binding classes corresponding to simple types extend the underlying Python type (such as str or int). They add XML-specific information such as the canonical representation of the value in Unicode, which is the natural representation as XML text. These classes also maintain a set of facets that constrain the values that can be stored as instances when validation is active. Binding classes for complex types have constructors that parse positional and keyword parameters to determine the element or attribute to which each value belongs. Attributes are assigned using keyword parameters. Content is assigned using positional parameters. The order of the positional parameters must be consistent with the order expected by the content model.
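The idea of a simple-type binding that extends a Python type and enforces facets can be sketched in plain Python. StateCode and Quantity are hypothetical classes, not PyXB's generated bindings. The permitted state codes are a made-up subset, and the quantity bounds mirror the positiveInteger/maxExclusive restriction in the purchase-order schema. Each class validates in __new__, so an invalid value never becomes an instance. A valid instance still behaves like the str or int it extends.

```python
class StateCode(str):
    """Hypothetical simple-type binding: a str with an enumeration facet."""

    # Hypothetical subset of permitted values; a real schema lists its own.
    ENUMERATION = frozenset({'AK', 'AL', 'AZ'})

    def __new__(cls, value):
        if value not in cls.ENUMERATION:
            raise ValueError('%r violates the enumeration facet' % (value,))
        return super().__new__(cls, value)


class Quantity(int):
    """Hypothetical binding for positiveInteger restricted by maxExclusive=100."""

    MAX_EXCLUSIVE = 100

    def __new__(cls, value):
        number = super().__new__(cls, value)
        # positiveInteger means > 0; maxExclusive means strictly below the limit.
        if not 0 < number < cls.MAX_EXCLUSIVE:
            raise ValueError('%d is outside (0, %d)' % (number, cls.MAX_EXCLUSIVE))
        return number


state = StateCode('AK')
print(state.lower())    # still usable as a str
print(Quantity(4) * 2)  # still usable as an int
```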
Using the namespace-aware address schema, we can begin to construct the example document in Python:
from __future__ import print_function
import address
addr = address.USAddress()
addr.name = 'Robert Smith'
addr.street = '8 Oak Avenue'
addr.city = 'Anytown'
addr.state = 'AK'
addr.zip = 12341
print(addr.toxml("utf-8", element_name='USAddress').decode('utf-8'))
Note
It is necessary to provide an element_name parameter to toxml because in this case USAddress is the name of a complex type, not a top-level schema element. PyXB cannot generate XML for an instance unless it knows the name to use for the root element. In most situations PyXB can work out which element the instance belongs to. This happens when the instance is created through an element binding instead of a type binding, or when it is assigned into another instance. Both cases appear in demo4c.
This produces:
<?xml version="1.0" encoding="utf-8"?><USAddress><name>Robert Smith</name><street>8 Oak Avenue</street><city>Anytown</city><state>AK</state><zip>12341</zip></USAddress>
Assigning to individual fields like this bypasses the complex type content model, although each field itself is validated. For example, the address schema does not include New York as a state, so the following assignment:
addr.state = 'NY'
will cause a pyxb.exceptions_.BadTypeValueError exception to be raised.
However, the order of the field assignments does not matter, as long as all required fields are present by the time the XML document is generated.
from __future__ import print_function
import address
addr = address.USAddress()
addr.street = '8 Oak Avenue'
addr.state = 'AK'
addr.city = 'Anytown'
addr.zip = 12341
addr.name = 'Robert Smith'
print(addr.toxml("utf-8", element_name='USAddress').decode('utf-8'))
Alternatively, you can provide the content as positional parameters in the object creation call:
# examples/manual/demo4b.py
from __future__ import print_function
import address
addr = address.USAddress('Robert Smith', '8 Oak Avenue', 'Anytown', 'AK', 12341)
print(addr.toxml("utf-8", element_name='USAddress').decode('utf-8'))
This has the same effect, and is much more compact, but it does require that the order match the content model.
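The behavior described so far can be mirrored in a plain-Python sketch:

- positional content matched in content-model order
- keyword parameters as attributes
- field assignments in any order
- output always in content-model order, generated only when all required fields are present

USAddress here is a hypothetical stand-in, not the generated binding. Its CONTENT_MODEL tuple is an assumption standing in for the schema's sequence. Its to_xml method requires an element name, echoing the note above about complex types.

```python
from xml.sax.saxutils import escape, quoteattr


class USAddress:
    """Hypothetical plain-Python mirror of a complex-type binding."""

    # Child elements of the (hypothetical) sequence content model, in order.
    CONTENT_MODEL = ('name', 'street', 'city', 'state', 'zip')

    def __init__(self, *content, **attributes):
        if len(content) > len(self.CONTENT_MODEL):
            raise TypeError('more positional values than the content model allows')
        # Positional parameters are matched to child elements in content-model order.
        for field, value in zip(self.CONTENT_MODEL, content):
            setattr(self, field, value)
        # Keyword parameters become attributes.
        self.attributes = dict(attributes)

    def to_xml(self, element_name):
        """Serialize in content-model order, whatever order fields were assigned in."""
        missing = [f for f in self.CONTENT_MODEL if getattr(self, f, None) is None]
        if missing:
            raise ValueError('missing required content: ' + ', '.join(missing))
        attrs = ''.join(' %s=%s' % (k, quoteattr(str(v)))
                        for k, v in sorted(self.attributes.items()))
        children = ''.join('<%s>%s</%s>' % (f, escape(str(getattr(self, f))), f)
                           for f in self.CONTENT_MODEL)
        return '<%s%s>%s</%s>' % (element_name, attrs, children, element_name)


# Field-by-field assignment, deliberately out of order.
assigned = USAddress()
assigned.street = '8 Oak Avenue'
assigned.state = 'AK'
assigned.city = 'Anytown'
assigned.zip = 12341
assigned.name = 'Robert Smith'

# Positional construction, in content-model order.
positional = USAddress('Robert Smith', '8 Oak Avenue', 'Anytown', 'AK', 12341)

print(assigned.to_xml('USAddress') == positional.to_xml('USAddress'))
print(positional.to_xml('USAddress'))
```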
Attributes are set using keyword parameters:
# examples/manual/demo4c.py
from __future__ import print_function
import pyxb
import po4
import address
import pyxb.binding.datatypes as xs
po = po4.purchaseOrder(orderDate=xs.date(1999, 10, 20))
po.shipTo = address.USAddress('Alice Smith', '123 Maple Street', 'Anytown', 'AK', 12341)
po.billTo = address.USAddress('Robert Smith', '8 Oak Avenue', 'Anytown', 'AK', 12341)
# Disable validation since content is incomplete.
pyxb.RequireValidWhenGenerating(False)
print(po.toxml("utf-8").decode('utf-8'))
This example produces (after reformatting):
<?xml version="1.0" encoding="utf-8"?>
<ns1:purchaseOrder xmlns:ns1="URN:purchase-order" orderDate="1999-10-20">
  <ns1:billTo>
    <city>Anytown</city>
    <state>AK</state>
    <street>8 Oak Avenue</street>
    <name>Robert Smith</name>
    <zip>12341</zip>
  </ns1:billTo>
  <ns1:shipTo>
    <city>Anytown</city>
    <state>AK</state>
    <street>123 Maple Street</street>
    <name>Alice Smith</name>
    <zip>12341</zip>
  </ns1:shipTo>
</ns1:purchaseOrder>
Note that, because we’re in the middle of the example and have not provided the items element that the content model requires, the code explicitly disables the requirement for validation when generating XML from a binding instance. As a consequence, the generated XML is not valid. Validation must also be disabled for parsing if the resulting document is to be converted back into a binding with CreateFromDocument.
Creating Instances of Anonymous Types
The style of XML schema used for purchase orders uses anonymous types for the deeper elements of the purchase order:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="URN:purchase-order"
            xmlns:tns="URN:purchase-order"
            xmlns:address="URN:address"
            elementFormDefault="qualified">
  <xsd:import namespace="URN:address" schemaLocation="nsaddress.xsd"/>
  <xsd:element name="purchaseOrder" type="tns:PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="address:USAddress"/>
      <xsd:element name="billTo" type="address:USAddress"/>
      <xsd:element ref="tns:comment" minOccurs="0"/>
      <xsd:element name="items" type="tns:Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="tns:comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="tns:SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
In particular, there is no global item element that can be used to create the individual items. For situations like this, we use pyxb.BIND:
from __future__ import print_function
import pyxb
import po4
import address
import pyxb.binding.datatypes as xs
po = po4.purchaseOrder(orderDate=xs.date(1999, 10, 20))
po.shipTo = address.USAddress('Alice Smith', '123 Maple Street', 'Anytown', 'AK', 12341)
po.billTo = address.USAddress('Robert Smith', '8 Oak Avenue', 'Anytown', 'AK', 12341)
po.items = pyxb.BIND(pyxb.BIND('Lapis necklace', 1, 99.95, partNum='833-AA'),
                     pyxb.BIND('Plastic necklace', 4, 3.95, partNum='833-AB'))
print(po.toxml("utf-8").decode('utf-8'))
The pyxb.BIND reference wraps the content of the inner elements. It is a cue to PyXB to attempt to build an instance of whatever type of object would satisfy the content model at that point. The resulting document (after reformatting) is:
<?xml version="1.0" encoding="utf-8"?>
<ns1:purchaseOrder xmlns:ns1="URN:purchase-order" orderDate="1999-10-20">
  <ns1:shipTo>
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Anytown</city>
    <state>AK</state>
    <zip>12341</zip>
  </ns1:shipTo>
  <ns1:billTo>
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Anytown</city>
    <state>AK</state>
    <zip>12341</zip>
  </ns1:billTo>
  <ns1:items>
    <ns1:item partNum="833-AA">
      <ns1:productName>Lapis necklace</ns1:productName>
      <ns1:quantity>1</ns1:quantity>
      <ns1:USPrice>99.95</ns1:USPrice>
    </ns1:item>
    <ns1:item partNum="833-AB">
      <ns1:productName>Plastic necklace</ns1:productName>
      <ns1:quantity>4</ns1:quantity>
      <ns1:USPrice>3.95</ns1:USPrice>
    </ns1:item>
  </ns1:items>
</ns1:purchaseOrder>
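The idea behind pyxb.BIND is to defer construction until the surrounding content model reveals which type is expected. That idea can be sketched in plain Python. Everything below (BIND, resolve, Item, Items, PurchaseOrder) is hypothetical and far simpler than PyXB. The "content model" is just a dictionary from element name to type, plus a constructor that knows its positional children are items.

```python
class BIND:
    """Hypothetical stand-in that defers construction until the target type is known."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


def resolve(expected_type, value):
    """Build expected_type from a BIND placeholder; pass other values through."""
    if isinstance(value, BIND):
        return expected_type(*value.args, **value.kwargs)
    return value


class Item:
    def __init__(self, productName, quantity, USPrice, partNum):
        self.productName = productName
        self.quantity = quantity
        self.USPrice = USPrice
        self.partNum = partNum


class Items:
    def __init__(self, *items):
        # In this toy content model, every positional value is an item.
        self.item = [resolve(Item, value) for value in items]


class PurchaseOrder:
    # Hypothetical map from element name to the type the content model expects.
    ELEMENT_TYPES = {'items': Items}

    def __setattr__(self, name, value):
        expected = self.ELEMENT_TYPES.get(name)
        if expected is not None:
            value = resolve(expected, value)
        super().__setattr__(name, value)


po = PurchaseOrder()
po.items = BIND(BIND('Lapis necklace', 1, 99.95, partNum='833-AA'),
                BIND('Plastic necklace', 4, 3.95, partNum='833-AB'))
print(type(po.items).__name__, [i.partNum for i in po.items.item])
```

The outer BIND is resolved when it is assigned to po.items. The inner BIND objects are resolved by the Items constructor, so each level only needs to know its own children.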
The complete document is generated by the following program:
from __future__ import print_function
import pyxb
import po4
import address
import pyxb.binding.datatypes as xs
import datetime
po = po4.purchaseOrder(orderDate=xs.date(1999, 10, 20))
po.shipTo = address.USAddress('Alice Smith', '123 Maple Street', 'Anytown', 'AK', 12341)
po.billTo = address.USAddress('Robert Smith', '8 Oak Avenue', 'Anytown', 'AK', 12341)
po.items = pyxb.BIND(pyxb.BIND('Lapis necklace', 1, 99.95, partNum='833-AA'),
                     pyxb.BIND('Plastic necklace', 4, 3.95, partNum='833-AB'))
# Read the fixed country value and assign it explicitly so that it is emitted.
po.shipTo.country = po.billTo.country = po.shipTo.country
lapis = po.items.item[0]
lapis.shipDate = po.orderDate + datetime.timedelta(days=46)
lapis.comment = 'Want this for the holidays!'
po.items.item[1].shipDate = po.items.item[0].shipDate + datetime.timedelta(days=19)
print(po.toxml("utf-8").decode('utf-8'))
The extra code demonstrates two more features:
- Fixed attribute values (such as country) are present in the bindings, even though they are only printed if they are set explicitly.
- The PyXB types for representing dates and times are extensions of those used by Python for the same purpose, including the ability to use them in expressions.
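The text states that PyXB's date types extend Python's own. So the shipDate arithmetic in the program above can be checked with plain datetime.date. This sketch uses only the standard library, and ship_dates is a hypothetical helper. Its two results agree with the shipDate values in the final document later in this section.

```python
import datetime


def ship_dates(order_date):
    """Repeat the shipDate arithmetic from the purchase-order program."""
    lapis = order_date + datetime.timedelta(days=46)
    plastic = lapis + datetime.timedelta(days=19)
    return lapis, plastic


lapis, plastic = ship_dates(datetime.date(1999, 10, 20))
print(lapis.isoformat(), plastic.isoformat())
```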
Creating XML Documents from Binding Instances
All along we’ve been seeing how to generate XML from a binding instance. The toxml method is shorthand for a two-step sequence. It converts the binding to a DOM instance using xml.dom.minidom, then uses the DOM interface to generate the XML document.
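The same two-step route (build a DOM, then serialize it) can be seen with xml.dom.minidom alone. This sketch builds a tiny USAddress tree by hand rather than from a binding, and address_xml is a hypothetical helper. It also shows why the examples in this section call .decode('utf-8'): when minidom's toxml is given an encoding, it returns bytes rather than str.

```python
from xml.dom import minidom


def address_xml(name):
    """Build <USAddress><name>...</name></USAddress> as a DOM and serialize it."""
    doc = minidom.Document()
    root = doc.createElement('USAddress')
    child = doc.createElement('name')
    child.appendChild(doc.createTextNode(name))
    root.appendChild(child)
    doc.appendChild(root)
    # With an encoding argument, toxml() returns bytes.
    return doc.toxml(encoding='utf-8')


data = address_xml('Robert Smith')
print(data.decode('utf-8'))
```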
The pyxb.utils.domutils.BindingDOMSupport class provides ways to control this generation. In particular, you may want to use something more informative than ns# to denote namespaces in the generated documents. This can be done using the following code:
import pyxb.utils.domutils
pyxb.utils.domutils.BindingDOMSupport.DeclareNamespace(address.Namespace, 'addr')
pyxb.utils.domutils.BindingDOMSupport.DeclareNamespace(po4.Namespace, 'po')
print(po.toxml("utf-8").decode('utf-8'))
With this, the final document produced is:
<?xml version="1.0" encoding="utf-8"?>
<po:purchaseOrder xmlns:po="URN:purchase-order" orderDate="1999-10-20">
  <po:shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Anytown</city>
    <state>AK</state>
    <zip>12341</zip>
  </po:shipTo>
  <po:billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Anytown</city>
    <state>AK</state>
    <zip>12341</zip>
  </po:billTo>
  <po:items>
    <po:item partNum="833-AA">
      <po:productName>Lapis necklace</po:productName>
      <po:quantity>1</po:quantity>
      <po:USPrice>99.95</po:USPrice>
      <po:comment>Want this for the holidays!</po:comment>
      <po:shipDate>1999-12-05</po:shipDate>
    </po:item>
    <po:item partNum="833-AB">
      <po:productName>Plastic necklace</po:productName>
      <po:quantity>4</po:quantity>
      <po:USPrice>3.95</po:USPrice>
      <po:shipDate>1999-12-24</po:shipDate>
    </po:item>
  </po:items>
</po:purchaseOrder>
(Surprise: addr does not appear, because the nsaddress.xsd schema uses the default element form unqualified, so none of the address components in the document have a namespace.)
Influencing Element and Mixed Content Order
PyXB generally expects that any information reflected in the order of elements is controlled by the content model in the schema. Where content includes multiple instances of the same element, they are kept in order within the binding attribute corresponding to that element's name. Historically, relative order with other elements or with mixed content was not rigorously maintained, and generated documents applied only the order enforced by the content model.
The following example from examples/xhtml/generate.py hints at the difficulty:
# -*- coding: utf-8 -*-
from __future__ import print_function
import pyxb.bundles.common.xhtml1 as xhtml
import pyxb.utils.domutils
pyxb.utils.domutils.BindingDOMSupport.SetDefaultNamespace(xhtml.Namespace)
head = xhtml.head(title='A Test Document')
body = xhtml.body()
body.append(xhtml.h1('Contents'))
body.append(xhtml.p('''Here is some text.
It doesn't do anything special.'''))
p2 = xhtml.p('Here is more text. It has ',
             xhtml.b('bold'),
             ' and ',
             xhtml.em('emphasized'),
             ' content with ',
             xhtml.b('more bold'),
             ' just to complicate things.')
body.append(p2)
# Verify we have two b's and an em
assert 2 == len(p2.b)
assert 1 == len(p2.em)
# Generate the document and externally verify that the em is between the two bs.
doc = xhtml.html(head, body)
try:
    xmls = doc.toDOM().toprettyxml()
except pyxb.ValidationError as e:
    print(e.details())
    raise
with open('genout.xhtml', 'w') as out:
    out.write(xmls)
If the relative order of elements and mixed content were not maintained, this might produce something like:
<?xml version="1.0" ?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title/>
  </head>
  <body>
    <h1/>
    <p/>
    <p>
      <b/>
      <b/>
      <em/>
    </p>
  </body>
</html>
Here mixed content is lost, and element content is emitted in the order that elements appear in the original schema.
As of release 1.2.1 [1], PyXB appends both element and non-element content to a list in each complex binding instance. The list may be obtained using the orderedContent method. It holds instances of pyxb.binding.basis.ElementContent and pyxb.binding.basis.NonElementContent in the order in which the content was added to the binding instance. Content is added when the instance is created from a document or through a constructor. It is also added when the append or extend methods are invoked to add content consistent with the content model.
The contentInfluencesGeneration flag of pyxb.ValidationConfig controls how the orderedContent list affects generation of documents (both DOM directly and XML indirectly). With the default value of MIXED_ONLY, the orderedContent list is only consulted when a complex type allows both element and non-element content.
The bundle for XHTML has been modified to use:
- ALWAYS for contentInfluencesGeneration
- RAISE_EXCEPTION for orphanElementInContent
- RAISE_EXCEPTION for invalidElementInContent

for all binding classes in that module. (See pyxb/bundles/common/xhtml1.py for the technique used.) This ensures that element order is preserved even where no non-element content may appear (such as the top-level body element).
With this capability the following document is generated:
<?xml version="1.0" ?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A Test Document</title>
  </head>
  <body>
    <h1>Contents</h1>
    <p>Here is some text.
It doesn't do anything special.</p>
    <p>
      Here is more text. It has
      <b>bold</b>
      and
      <em>emphasized</em>
      content with
      <b>more bold</b>
      just to complicate things.
    </p>
  </body>
</html>
Be aware that the automatically maintained orderedContent list will be incorrect in at least two cases:
- When the elements of an instance are mutated through Python code, the list no longer reflects the correct order.
- When elements are appended directly to sub-elements, as with:
    p2.b.append('another bit of bold')
  the newly added elements do not appear in the orderedContent list.
The value returned by orderedContent is a mutable list, so you can manipulate it to reflect the content you wish to have generated.
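The bookkeeping described in this section can be modeled with plain-Python classes. Child, Paragraph, and the ordered_content attribute are hypothetical illustrations, not PyXB's ElementContent, NonElementContent, or orderedContent. Plain strings stand in for non-element content. The model shows three things:

- Generation walks only the ordered list.
- Appending straight to a per-element list (like p2.b above) therefore leaves that content out of the output.
- Editing the mutable ordered list directly brings it back.

```python
from xml.sax.saxutils import escape


class Child:
    """Hypothetical element content: an element name and its text."""

    def __init__(self, name, text):
        self.name = name
        self.text = text


class Paragraph:
    """Hypothetical mixed-content binding that records content order."""

    def __init__(self, *content):
        self.b = []
        self.em = []
        self.ordered_content = []  # mutable, like the list described above
        for value in content:
            self.append(value)

    def append(self, value):
        if isinstance(value, str):
            self.ordered_content.append(value)  # non-element content
        else:
            getattr(self, value.name).append(value)  # per-element list
            self.ordered_content.append(value)

    def to_xml(self):
        # Generation consults only the ordered list.
        parts = []
        for item in self.ordered_content:
            if isinstance(item, str):
                parts.append(escape(item))
            else:
                parts.append('<%s>%s</%s>' % (item.name, escape(item.text), item.name))
        return '<p>%s</p>' % ''.join(parts)


p = Paragraph('It has ', Child('b', 'bold'), ' and ', Child('em', 'emphasized'), '.')
extra = Child('b', 'another bit of bold')
p.b.append(extra)                # bypasses the ordered list
print(len(p.b), p.to_xml())      # the extra <b> is missing from the output
p.ordered_content.append(extra)  # repair by editing the list directly
print(p.to_xml())
```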
Where the orderedContent list is not consistent with the content model (for example, it references elements that are no longer part of the binding instance, or proposes an order that is not valid), various exceptions may arise. To some extent this can be controlled through the orphanElementInContent and invalidElementInContent flags.
[1] Though previous versions also provided this information through a content list, that list did not associate content with the element to which it belonged, making it difficult to reconstruct a valid document.