DictParser is a tiny library for decoding rudimentary dictionary-like objects from a stream of bytes. The library contains two implementations, one in Python and one in C++. This document describes the format of an encoded dictionary object.
A big thanks to my former employer, Nordic River, for letting me share this code.
We define dictionary (or dict for short) to mean a set of properties where each property has a name and a value. The set of properties may or may not be ordered and property names may or may not repeat; it's up to the user to define.
An encoded dictionary is an ordered sequence of encoded properties
enclosed in curly braces ('{' and '}').
The empty dictionary is the string '{}'.
The significance of the ordering of properties is user defined.
An encoded property is either simple or binary.
A simple property
has a name and a value
(each a sequence of bytes)
separated by a colon
and terminated by a semicolon,
e.g., 'name:value;'.
A simple property name must not be empty
and must not contain parentheses ('(' or ')') or a colon (':').
A simple property value must not contain a semicolon (';').
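
The rules for simple properties can be illustrated with a small encoder. This is only a sketch based on the format described above; the function name encode_simple_property is hypothetical and is not part of the DictParser library, which only decodes.

```python
def encode_simple_property(name, value):
    """Encode one simple property as b'name:value;' (hypothetical helper)."""
    if not name:
        raise ValueError("simple property name must not be empty")
    for forbidden in (b"(", b")", b":"):
        if forbidden in name:
            raise ValueError("simple property name must not contain %r" % forbidden)
    if b";" in value:
        raise ValueError("simple property value must not contain a semicolon")
    # Name and value are separated by a colon and terminated by a semicolon.
    return name + b":" + value + b";"


print(encode_simple_property(b"name", b"value"))
```

A value that needs to contain a semicolon cannot be written as a simple property; it has to be encoded as a binary property instead.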
A binary property
has the same structure as a simple property,
but its name ends with the length of the value (in 8-bit bytes),
enclosed in parentheses ('(' and ')'),
e.g., 'hello(7): world!;' and 'hello(6):world!;'.
In the first example, the leading space is part of the 7-byte value.
A binary property value may contain any character.
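
As an illustration of the length prefix, here is a minimal encoder sketch for binary properties. The function name encode_binary_property is hypothetical (the library itself only decodes), and the sketch assumes the length is written as a decimal ASCII number, as the examples above suggest.

```python
def encode_binary_property(name, value):
    """Encode one binary property as b'name(LENGTH):value;' (hypothetical helper).

    Assumption: LENGTH is the value's byte count in decimal ASCII digits.
    """
    length = str(len(value)).encode("ascii")
    # The value is copied verbatim; it may contain ';' or any other byte.
    return name + b"(" + length + b"):" + value + b";"


print(encode_binary_property(b"hello", b" world!"))
print(encode_binary_property(b"hello", b"world!"))
```

Because the decoder knows the length up front, a value such as b"a;b" can be carried safely even though it contains a semicolon.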
Property names may be repeated,
so that '{a:x;a:y;a:z;}'
is a valid encoded dictionary with three distinct properties.
The interpretation of properties with identical names is user defined.
Note that whitespace characters are treated like any other characters: any line feed, space, tab, etc., is interpreted verbatim, as part of a property name or value.
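
To make the format concrete, the following sketch decodes an encoded dictionary held in memory. It is not the library's implementation: the name decode_dict is hypothetical, it works on a complete bytes object rather than a stream, it does only minimal validation, and it assumes the binary length is a decimal number.

```python
def decode_dict(data):
    """Decode an encoded dictionary (bytes) into a list of (name, value) pairs.

    Illustrative only; not the library's streaming parser.
    """
    if not data.startswith(b"{"):
        raise ValueError("expected '{' at start of dictionary")
    pos = 1
    props = []
    while True:
        if data[pos:pos + 1] == b"}":
            return props
        colon = data.find(b":", pos)
        if colon < 0:
            raise ValueError("missing ':' after property name")
        name = data[pos:colon]
        if not name:
            raise ValueError("empty property name")
        if name.endswith(b")") and b"(" in name:
            # Binary property: the name carries the value length in parentheses.
            name, _, length = name[:-1].rpartition(b"(")
            end = colon + 1 + int(length)
        else:
            # Simple property: the value runs up to the next semicolon.
            end = data.find(b";", colon + 1)
            if end < 0:
                raise ValueError("missing ';' after property value")
        if data[end:end + 1] != b";":
            raise ValueError("expected ';' after property value")
        props.append((name, data[colon + 1:end]))
        pos = end + 1


for name, value in decode_dict(b"{a:x;a:y; b: z ;hello(7): world!;}"):
    print(name, value)
```

The result is a list rather than a Python dict so that property order and repeated names are preserved, and whitespace ends up in names and values exactly as it appeared in the input.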
This library provides an interface
to decode an encoded dictionary
one property at a time.
The interface is straightforward.
You instantiate a DictParser
with a stream
and call getNextProperty()
until all available (or desired) properties
have been read.
The parser instance will throw an exception
on invalid input
or stream errors.
Here's an example of how to parse the contents of a file in Python:
    import io

    import DictParser

    # Open read-only in binary mode: the parser works on raw bytes.
    with io.open("dict.txt", mode="rb") as f:
        parser = DictParser.DictParser(f)
        while True:
            prop = parser.getNextProperty()
            if not prop:
                break  # no more properties
            print("%s: %s" % (prop.name(), prop.value()))
Here's the same example in C++:
    #include <fstream>
    #include <iostream>
    #include "DictParser.h"

    int main(int argc, char *argv[]) {
        // Open in binary mode: the parser works on raw bytes.
        std::fstream stream("dict.txt", std::ios::in | std::ios::binary);
        DictParser parser(stream);
        DictParser::Property prop;
        // Each successful call fills in prop; the loop ends when the call returns false.
        while (parser.getNextProperty(prop)) {
            std::cout << prop.name() << ": " << prop.value() << std::endl;
        }
        std::cout << std::flush;
        return 0;
    }