The built-in Python modules for working with markup languages are listed at http://docs.python.org/library/markup.html. For XML, the main ones are DOM (including minidom), SAX and ElementTree.
A comparison of minidom and ElementTree, including good examples, can be found at http://mike.hostetlerhome.com/present_files/pyxml.html.
Beyond the standard library there is also lxml, a very Pythonic module that behaves much like ElementTree and is built on GNOME’s libxml2.
MiniDom
- Hands-on how-to for parsing XML files with minidom: http://diveintopython.org/xml_processing/parsing_xml.html
- More on minidom (German): http://de.wikibooks.org/wiki/Python_unter_Linux:_XML
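As a rough illustration of what the minidom how-tos above cover, here is a minimal sketch that parses a small document from a string. The sample XML and the variable names are made up for this example; only the `xml.dom.minidom` calls come from the standard library.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration
XML = """<library>
  <book id="1"><title>Dive Into Python</title></book>
  <book id="2"><title>Learning XML</title></book>
</library>"""

dom = minidom.parseString(XML)

titles = []
# getElementsByTagName searches the whole subtree, not just direct children
for book in dom.getElementsByTagName("book"):
    title_node = book.getElementsByTagName("title")[0]
    # The text lives in a child text node, not on the element itself
    titles.append((book.getAttribute("id"), title_node.firstChild.data))

print(titles)
```

To read from a file instead, `minidom.parse(path)` should be a drop-in replacement for `parseString`.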
ElementTree
- The documentation: http://docs.python.org/library/xml.etree.elementtree.html
Usage of ElementTree:
- by the devs: http://effbot.org/zone/element-index.htm#usage
- an xml.com article: http://www.xml.com/pub/a/2003/02/12/py-xml.html
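#!python
# ElementTree ships with the standard library since Python 2.5
# (it used to be the standalone elementtree package)
import xml.etree.ElementTree as ET
tree = ET.parse("page.xhtml")
# the tree root is the toplevel html element
print(tree.findtext("head/title"))
# if you need the root element, use getroot
root = tree.getroot()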
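One caveat worth knowing about: real XHTML files usually declare the XHTML namespace, and a plain path such as "head/title" then matches nothing. The following is a minimal sketch of one way to handle that with a prefix map. The sample document and the prefix `x` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML snippet, parsed from a string so the example is self-contained
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body><p>Hello</p></body>
</html>"""

root = ET.fromstring(XHTML)

# The unqualified path finds nothing because every tag is namespaced
plain = root.findtext("head/title")

# Map an arbitrary prefix to the namespace URI and use it in the path
ns = {"x": "http://www.w3.org/1999/xhtml"}
title = root.findtext("x:head/x:title", namespaces=ns)

print(plain, title)
```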
SAX
- Using xml.sax.handler to read XML structures into a Python object data structure: http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
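The recipe linked above builds a full object structure. To show just the event-driven idea behind SAX, here is a much smaller sketch that only collects the text of every title element. The class name `TitleCollector` and the sample XML are hypothetical and are not taken from the recipe.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Hypothetical handler that collects the text of each <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means we are currently outside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for a single text run
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Sample input invented for this illustration
XML = (b"<library><book><title>Dive Into Python</title></book>"
       b"<book><title>Learning XML</title></book></library>")

handler = TitleCollector()
xml.sax.parseString(XML, handler)
print(handler.titles)
```

Because SAX never builds a tree, this style tends to suit very large files, at the cost of having to track state by hand as done with `_buffer` here.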
lxml
More demanding XML applications, including those that rely on schemas and namespaces, should probably use the lxml XML toolkit, which is the Pythonic binding for the C libraries libxml2 and libxslt. The most Pythonic way of using it is through lxml.objectify.
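To give an impression of what lxml.objectify feels like, here is a minimal sketch, assuming lxml is installed (it is a third-party package, not part of the standard library). The order document and its element names are invented for the example.

```python
from lxml import objectify

# Hypothetical sample document, made up for this illustration
XML = b"""<order>
  <customer>Alice</customer>
  <line><product>Widget</product><quantity>3</quantity></line>
  <line><product>Gadget</product><quantity>2</quantity></line>
</order>"""

order = objectify.fromstring(XML)

# Child elements are reachable as plain Python attributes
customer = order.customer.text

# Iterating over a child yields all siblings with the same tag;
# pyval gives the value converted to a Python type (here: int)
total = sum(line.quantity.pyval for line in order.line)
products = [line.product.text for line in order.line]

print(customer, total, products)
```

Attribute access only works for tag names that are valid Python identifiers and do not clash with existing element attributes such as `text` or `tag`; for other names, `getattr(order, "some-name")` is the usual workaround.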
Resources
- A good overview of the available options is given at: http://wiki.python.org/moin/PythonXml