Parsing structured data into nested Python
Sat, Aug 30 2014 AM
The theme of this blog entry is converting structured data into nested Python objects. Python's duck typing, along with other language features, makes it really easy to represent structured data of arbitrary nesting.
Why even bother making a custom class when you can just throw together arbitrary nestings of dictionaries, lists, strings, bools, and numbers? Laziness FTW!
Also, this is fundamentally no different from how JSON encoding works in JavaScript. I'm guessing this approach has become more popular recently, given the rise in the use of JSON.
CppHeaderParser
The CppHeaderParser module can parse an arbitrary C++ header file into a nested structure containing all of the information about methods, classes, namespaces, etc. Very useful. CppHeaderParser doesn't get 100% of the credit, though: it uses PLY, a Python lex/yacc implementation, under the hood.
I did run into two problems with the parser.
Problem 1: The first parameter (parameter 0) of doSomething() is reported with a type of 'baz::enum'. Removing the typedef and using a C++-style enum declaration (enum my_type_t {FOO, BAR};) fixes the issue.
    namespace baz {
        typedef enum {FOO, BAR} my_type_t;
        void doSomething(my_type_t x);
    }
Problem 2: The parser can't handle API export macros. I just removed the macro in a pre-processing step to sanitize the header for CppHeaderParser.
    class FOO_BAR_API MyBaz
    {
        //class stuff here...
    };
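The pre-processing step for Problem 2 could look something like the sketch below. It is only an illustration, not the exact code I used: the helper name strip_export_macro is made up for this example, and it assumes the macro appears as a bare token (FOO_BAR_API here) rather than a function-like macro with arguments.

```python
import re


def strip_export_macro(header_text, macro="FOO_BAR_API"):
    """Remove a bare export macro token from C++ header text.

    Hypothetical helper: assumes the macro takes no arguments.
    """
    # \b on both sides keeps longer identifiers that merely contain
    # the macro name (e.g. MY_FOO_BAR_API_X) untouched.
    pattern = r"\b" + re.escape(macro) + r"\b[ \t]*"
    return re.sub(pattern, "", header_text)


header = """class FOO_BAR_API MyBaz
{
    //class stuff here...
};
"""

sanitized = strip_export_macro(header)
print(sanitized)
```

The sanitized text can then be written to a temporary file, or otherwise handed to CppHeaderParser, in place of the original header.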
xmltodict
I have used xml.dom, minidom, lxml... I have even used lxml to convert XML to a nested Python structure, just to avoid sprinkling XML API calls all over the code. Usually when I use XML, I really just want to deal with a dictionary anyway.
So here it is: xmltodict does exactly that. Feed it XML, get nested Python. It even uses ordered dictionaries to preserve the order of the XML tags. It's a little weird in that it creates a list only for tags that appear more than once. That gets awkward because some tags can appear 0, 1, or many times, so the structure you get back may or may not be a list, even though your code always expects a list to iterate through. But xmltodict doesn't know any better, so you make it work:
    import xmltodict

    entries = xmltodict.parse(xml_stuff)['top_level_tag']
    if not isinstance(entries, list):
        # a tag that appears only once comes back as a single item
        entries = [entries]
    for entry in entries:
        pass
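If that check shows up in more than one place, it can be pulled into a small helper. The sketch below is one way to do it. The names as_list and entries_of are made up for this example, and the three dictionaries are hand-written stand-ins for what I would expect a parse to return for 0, 1, and 2 'entry' tags, so the example runs without xmltodict installed.

```python
def as_list(value):
    """Normalize a parsed value: missing -> [], single item -> [item], list -> list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins (assumed shapes) for parsed documents
# containing 0, 1, and 2 <entry> tags under <top_level_tag>.
parsed_none = {'top_level_tag': None}
parsed_one = {'top_level_tag': {'entry': {'name': 'a'}}}
parsed_many = {'top_level_tag': {'entry': [{'name': 'a'}, {'name': 'b'}]}}


def entries_of(parsed):
    # An empty top-level tag is assumed to come back as None.
    top = parsed['top_level_tag'] or {}
    return as_list(top.get('entry'))


for parsed in (parsed_none, parsed_one, parsed_many):
    for entry in entries_of(parsed):
        print(entry['name'])
```

Newer xmltodict releases also accept a force_list argument to parse(), which may make a helper like this unnecessary. Check the version you have installed.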
Python JSON module
I find myself preferring JSON over XML when I need a text file format, simply because the tools to deal with JSON are so much easier to use.
Good thing Python comes with a json module that converts between nested Python structures and JSON strings. Of course, it only supports Python types that have a JSON equivalent: lists, dictionaries (maps), integers, floats, bools, strings, and None. Unfortunately, complex numbers are not supported.
    import json

    # Python's True (not JavaScript's true) serializes to JSON true
    s = json.dumps([{'hello': True}, 2])
    print(json.loads(s))
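If you really do need complex numbers, the json module lets you hook in your own conversion through the default argument of json.dumps and the object_hook argument of json.loads. Here is a minimal sketch. The '__complex__' marker key and the two function names are conventions invented for this example, not part of JSON or of the json module.

```python
import json


def encode_complex(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, complex):
        return {'__complex__': True, 'real': obj.real, 'imag': obj.imag}
    raise TypeError(repr(obj) + ' is not JSON serializable')


def decode_complex(d):
    # json.loads calls this for every decoded JSON object (dict).
    if d.get('__complex__'):
        return complex(d['real'], d['imag'])
    return d


s = json.dumps({'z': 1 + 2j}, default=encode_complex)
restored = json.loads(s, object_hook=decode_complex)
print(restored)
```

The catch is that both ends of the conversation have to agree on the marker, so this only helps when you control both the writer and the reader.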