The built-in Python modules for working with markup languages are listed at http://docs.python.org/library/markup.html. For XML, the main ones are DOM (including minidom), SAX and ElementTree.
A comparison of minidom and ElementTree, including good examples, can be found at http://mike.hostetlerhome.com/present_files/pyxml.html.
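To give a feel for the difference between the two, here is a minimal sketch that extracts the same data with minidom and with ElementTree. The sample document and the function names `titles_minidom` and `titles_elementtree` are made up for this illustration and are not taken from the linked comparison.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """<library>
  <book id="1"><title>Dive Into Python</title></book>
  <book id="2"><title>Learning XML</title></book>
</library>"""


def titles_minidom(text):
    """Collect the book titles using the DOM API."""
    dom = xml.dom.minidom.parseString(text)
    titles = []
    for node in dom.getElementsByTagName("title"):
        # In the DOM the text lives in a child text node, not on the element.
        titles.append(node.firstChild.data)
    return titles


def titles_elementtree(text):
    """Collect the book titles using ElementTree."""
    root = ET.fromstring(text)
    # iter() walks all descendants with the given tag.
    return [elem.text for elem in root.iter("title")]


print(titles_minidom(XML))
print(titles_elementtree(XML))
```

Both functions return the same list of titles. The ElementTree version is usually shorter because element text is available directly as an attribute.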
Besides the default Python modules there is also a very Pythonic module called lxml, which behaves similarly to ElementTree and is based on GNOME’s libxml2.
- Hands-on how-to for parsing XML files with minidom: http://www.diveintopython.net/xml_processing/parsing_xml.html (or here for Python 3)
- More on minidom (German): http://de.wikibooks.org/wiki/Python_unter_Linux:_XML
- The ElementTree documentation: http://docs.python.org/library/xml.etree.elementtree.html
- Usage of ElementTree:
Here is a small example:
    import xml.etree.ElementTree as ET

    tree = ET.parse("page.xhtml")

    # the tree root is the toplevel html element
    print(tree.findtext("head/title"))

    # if you need the root element, use getroot
    root = tree.getroot()
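One thing to watch out for: if the file declares the XHTML namespace, as XHTML documents normally do, a plain path such as "head/title" matches nothing and findtext returns None. The following sketch shows one way to handle this with a prefix map. The inline document `PAGE` is a hypothetical stand-in for the contents of page.xhtml, and the prefix `x` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of page.xhtml.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body><p>Hello</p></body>
</html>"""

root = ET.fromstring(PAGE)

# Map a prefix of our own choosing to the XHTML namespace URI.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Without the namespace the path does not match any element.
plain_title = root.findtext("head/title")

# With the prefix map the title element is found.
ns_title = root.findtext("x:head/x:title", namespaces=NS)

print(plain_title)
print(ns_title)
```

The prefix only exists inside the search path; it does not have to match any prefix used in the document itself.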
- Using xml.sax.handler to read XML structures into a Python object data structure: http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
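The general idea behind such a SAX-based conversion can be sketched roughly as follows. This is not the code from the linked recipe, just a minimal illustration; the class `DictBuilder`, the function `xml_to_dict` and the layout of the resulting dictionaries are assumptions made for this example.

```python
import xml.sax
import xml.sax.handler


class DictBuilder(xml.sax.handler.ContentHandler):
    """Hypothetical handler that turns SAX events into nested dicts."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = {
            "tag": name,
            "attrs": dict(attrs.items()),
            "text": "",
            "children": [],
        }
        if self._stack:
            # Attach the new element to its parent.
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._stack:
            self._stack[-1]["text"] += content

    def endElement(self, name):
        node = self._stack.pop()
        # Drop the whitespace that only serves as indentation.
        node["text"] = node["text"].strip()


def xml_to_dict(text):
    """Parse an XML string and return the root element as a dict."""
    handler = DictBuilder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.root


# Hypothetical sample input, invented for this illustration.
doc = xml_to_dict('<gpx><wpt lat="1.0" lon="2.0"><name>Cache</name></wpt></gpx>')
print(doc["children"][0]["attrs"])
print(doc["children"][0]["children"][0]["text"])
```

Because SAX delivers the document as a stream of events, the handler keeps a stack of open elements to know where each new element belongs.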
More demanding XML applications, including those that involve schemas and namespaces, should probably use the lxml XML toolkit, which is the Pythonic binding for the C libraries libxml2 and libxslt. The most Pythonic way of using it is through lxml.objectify.
- A good overview of the possibilities is given at: http://wiki.python.org/moin/PythonXml
- Some example Python scripts can be found in the tool collection Beej’s GPS stuff – Geocaching (and other) tools, which I put on GitHub