Node serialisation

This page discusses how to serialise the parameters for node types. If you don't know what that means, you're either in the wrong place or haven't read Writing PCOT plugins.

Most nodes need to store some data. Sometimes this data can just be stored directly, as attributes of the XForm object (i.e. the node). It's not really "polite" programming, but this is the kind of thing you can do with Python.

However, often "parameter" data controlling how the node operates needs to be saved inside the PCOT document file, and loaded when we reopen the file. For example, the expr node needs to store a string: the expression to be run. Parameters for some nodes can be complicated: multidot needs to be able to store a list of circular regions of interest, for example.

We also need to do this to handle undo operations - every time a change is made, the entire document is "saved" into an archive in memory so it can be undone.
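As a rough illustration of why undo depends on serialisation, here is a minimal sketch of snapshot-based undo. The Document class, its nodes dict and the method names are all hypothetical stand-ins and not PCOT's actual undo code; the only point is that anything which can be serialised can be pushed onto an in-memory stack and restored later.

```python
import json


class Document:
    """Hypothetical stand-in for a document: just a dict of node parameters."""

    def __init__(self):
        self.nodes = {"expr1": {"expr": "a+b"}}
        self.undo_stack = []  # in-memory archives, newest last

    def mark_undo(self):
        # "Save" the whole document into memory before a change is made.
        self.undo_stack.append(json.dumps(self.nodes))

    def undo(self):
        # Restore the most recent snapshot, if there is one.
        if self.undo_stack:
            self.nodes = json.loads(self.undo_stack.pop())


doc = Document()
doc.mark_undo()                      # snapshot before editing
doc.nodes["expr1"]["expr"] = "a*2"   # the change
doc.undo()                           # back to the snapshot
```

After the undo call the expression is "a+b" again and the stack is empty.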

This process - converting node data into data which can be saved to archives - is called serialisation, and there are no fewer than four different mechanisms for doing it. This is largely for historical reasons, but also because the different mechanisms serve different needs.

In order of preference, with the best at the top:

  • TaggedAggregate serialisation - the data is JSON-serialisable but we want to make it possible to edit it from a batch/parameter file (see batch mode). Probably the best choice if you can use it.
  • complex serialisation via TaggedAggregate - the data is not directly JSON-serialisable, but we want to edit it from a parameter file. Probably the second-best choice, and suitable where simple TA-serialisation can't handle the more complex data involved.
  • autoserialisation - for when your data is already JSON-serialisable and you don't need to edit it from a parameter file. It is very simple to implement, but doesn't allow editing from a batch file and doesn't document itself automatically. Used only in legacy nodes.
  • complex serialisation - for when your data is not directly JSON-serialisable (for example, regions of interest) and you don't need to edit it from a parameter file because that would make no sense (such as painted ROIs).

TaggedAggregate serialisation

This is the method we use when we want to be able to edit the parameters of nodes in batch mode, using parameter files (see batch mode). It is probably the best method to use because of this, but it is rather more complicated.

We make use of tagged aggregate structures, which can be found in pcot.utils.taggedaggregate. These are dictionaries and lists, but each has a formal, typed structure with "tags" giving the names of the members, their types, and default values.

Each is described by a type singleton object which provides descriptions of its elements (these are the "tags" in the name). Calling create() on the singleton builds an instance of the structure with all the values filled in with defaults.
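To make the idea of a "type object with tags" concrete, here is a toy model written from scratch. ToyDictType, its create() and its document() method are illustrative assumptions only and are not the classes in pcot.utils; the sketch just shows how one set of tags can supply both default values and documentation.

```python
class ToyDictType:
    """Toy type object: maps each tag name to (description, type, default)."""

    def __init__(self, **tags):
        self.tags = tags

    def create(self):
        # Build an instance with every value filled in from its default.
        return {name: default
                for name, (_desc, _typ, default) in self.tags.items()}

    def document(self):
        # The same tags can be used to generate documentation automatically.
        return [f"{name} ({typ.__name__}): {desc}"
                for name, (desc, typ, _default) in self.tags.items()]


toy_rect_type = ToyDictType(
    x=("x coordinate of the top left corner", int, 0),
    w=("width of the rectangle", int, 10))

rect = toy_rect_type.create()     # a plain dict holding the defaults
docs = toy_rect_type.document()   # one documentation line per tag
```

The real structures do considerably more (type checking, nesting, attribute access), but the division of labour between the type object and its instances is the same.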

TaggedDictType

The main type used is TaggedDictType, which describes the format of a set of key/value pairs. Calling create() on one of these objects builds a TaggedDict object containing the default values.

The TaggedDictType constructor takes a set of keyword arguments. Each key is the name of an element in the dict, and each value describes that element as a tuple of:

  • a description used in the documentation
  • a type: either a primitive type such as int or str, or another TaggedAggregateType subclass for nested structures
  • a default value (must be None for aggregates, which provide their own defaults)
  • for string values, an optional list of acceptable strings

If you call setOrdered on the constructed type object you will get an "ordered dict" - this will be serialised as a tuple with the contained data having an implicit ordering.

For example, here is a TaggedDictType definition for a rectangle:

taggedRectType = TaggedDictType(
    x=("The x coordinate of the top left corner", Number, 0),
    y=("The y coordinate of the top left corner", Number, 0),
    w=("The width of the rectangle", Number, 10),
    h=("The height of the rectangle", Number, 10)).setOrdered()

We are using Number here to indicate that either ints or floats are acceptable. We can then create a rectangle TaggedDict and access its values:

r = taggedRectType.create()            # create TaggedDict from type
print(f"Rectangle at {r.x}, {r.y}")    # show values
r.w = 20                               # set values
r.h = 30
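Because the rectangle type was marked with setOrdered, it is saved as a sequence of values rather than as key/value pairs. The sketch below imitates that idea with a plain dict and the standard json module. It assumes the implicit ordering is the order in which the keys were defined, and the ORDER tuple is a hypothetical helper, not part of PCOT.

```python
import json

# Plain-dict stand-in for the rectangle after the edits above.
rect = {"x": 0, "y": 0, "w": 20, "h": 30}

# Assumed implicit ordering: the order in which the keys were declared.
ORDER = ("x", "y", "w", "h")

# Serialise as a bare sequence: the position tells us which key is meant.
as_tuple = tuple(rect[k] for k in ORDER)
text = json.dumps(as_tuple)

# Deserialise by pairing the values back up with the keys.
restored = dict(zip(ORDER, json.loads(text)))
```

Here text is the string "[0, 0, 20, 30]", which is more compact than the keyed form but only makes sense to a reader who knows the ordering.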

We can also specify that a parameter is another TaggedDict, allowing us to build complex nested structures. Here we nest the taggedRectType we defined above in another dict type:

taggedThingType = TaggedDictType(
    rect=("The rectangle", taggedRectType, None),    # default is None for aggregates
    somenumber=("Some numerical value",Number,0))

TaggedListType

These objects describe lists, and calling create() on them generates a TaggedList item. This is rarely done directly - it's more usual for a TaggedDictType to specify that one of its values is a list, in which case the list object is created when the containing TaggedDict is created.

Here's an example where the dict contains both our rectangle type and a list of rectangles:

taggedThingType = TaggedDictType(
    main=("The main rectangle", taggedRectType, None),    # default is None for aggregates
    others=("Some other rectangles", TaggedListType(taggedRectType, 0), None),
    somenumber=("Some numerical value",Number,0))

We can then create our parameters and add a new default rect to the list:

thing = taggedThingType.create()
thing.others.append_default()

We can then access these items:

print(thing.others[0].x)

The TaggedListType constructor takes the following arguments:

  • Type of item (must be a TaggedAggregateType subclass or a primitive type - int, str, etc.)
  • Default length (if a list of aggregates) or default list (if a list of primitives)
  • Optional default value to append when a new item is created, ignored for lists of aggregates which will create their own default item

For more details on how to use these structures, read the tests in tests/test_taggedaggs.py.

You'll note that all the elements of a TaggedAggregate structure are JSON-serialisable (see note 1), although some can be numpy arrays. However, the nature of the structure allows defaults - and documentation - to be generated automatically.
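If "JSON-serialisable" is unfamiliar, this short sketch shows the difference using only Python's standard json module. The Circle class is a hypothetical example of data that cannot be written out directly; numpy arrays are left out because plain json does not handle them without extra help.

```python
import json


class Circle:
    """Hypothetical region-of-interest class; json knows nothing about it."""

    def __init__(self, x, y, r):
        self.x, self.y, self.r = x, y, r


# Primitives, dicts, lists and tuples go straight through
# (tuples come back as lists).
plain = {"name": "roi", "centre": (10, 20), "radii": [3, 4.5]}
plain_json = json.dumps(plain)

# An arbitrary object does not: json.dumps raises TypeError.
try:
    json.dumps({"roi": Circle(10, 20, 3)})
    circle_ok = True
except TypeError:
    circle_ok = False
```

This is why data like Circle needs one of the "complex" mechanisms: something has to turn it into primitives first.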

Using TaggedAggregates to serialise nodes

To use a TaggedAggregate to serialise node data, create a TaggedDictType and assign it to the params member of the XFormType in the constructor. For example:

        self.params = TaggedDictType(
            mul=("multiplicative factor (done first)", float, 1.0),
            add=("additive constant (done last)", float, 0.0))

When a new node is created, a default structure will be created from this type and stored in the node's params field where it can be accessed from the perform method:

output = node.params.add + node.params.mul * node.getInput(0, Datum.IMG)

When the node is serialised, the structure will be serialised.

Complex TaggedAggregate serialisation (CTAS)

The previous method dealt with data which can be JSON-serialised directly. If we need to modify non-JSON-serialisable data with parameter files, we need to do something similar to the complex serialisation method described below, but going through a TaggedAggregate: we set the TaggedAggregate from our complex data, and then PCOT will serialise that.

To do this, we write code as before to store the data in a TaggedDict in node.params, and we store any simple data we have in that structure. Then we write a serialise method containing code which converts our more complex data into a simpler form and stores it in node.params so it can be serialised. Here's an example:

def serialise(self, node):
    # fill in the node.params with data
    node.params.foo = some_data_or_other
    node.params.bar = some_data_or_other

    # we don't return anything, because node.params will have been set to
    # represent our data; we don't need to add anything directly to the
    # JSON-serialisable dict.

    return None

This method is also used by the legacy serialisation mechanism, where it would return a JSON dict directly. Here we need to return None instead.

We must also write a nodeDataFromParams method. This takes a node, and uses its params field (which will be a TaggedDict, of course) to set the node's internal data:

def nodeDataFromParams(self, node):
    # convert some data in node.params into our own private data,
    # stored on the node so that it persists after this method returns
    node.our_data = some_function_of(node.params.foo, node.params.bar)
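Putting the two methods together, here is a self-contained sketch of the round trip. It uses types.SimpleNamespace in place of a real node and its params, and the Circle class, ToyNodeType and the field names cx, cy and r are all hypothetical; only the method names serialise and nodeDataFromParams come from the description above.

```python
from types import SimpleNamespace


class Circle:
    """Hypothetical complex data held privately by the node."""

    def __init__(self, x, y, r):
        self.x, self.y, self.r = x, y, r


class ToyNodeType:
    def serialise(self, node):
        # Copy the complex data into simple values in node.params.
        node.params.cx = node.circle.x
        node.params.cy = node.circle.y
        node.params.r = node.circle.r
        return None  # nothing extra to add to the saved dict

    def nodeDataFromParams(self, node):
        # Rebuild the complex data from the simple values.
        node.circle = Circle(node.params.cx, node.params.cy, node.params.r)


node = SimpleNamespace(params=SimpleNamespace(cx=0, cy=0, r=1),
                       circle=Circle(5, 6, 7))
node_type = ToyNodeType()

result = node_type.serialise(node)   # params now describe the circle
node.circle = None                   # pretend the node was reloaded
node_type.nodeDataFromParams(node)   # the circle is rebuilt from params
```

The two methods should be exact inverses of each other; a quick round-trip check like this is a cheap way to confirm that nothing is lost.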

Maybe

Types and TaggedAggregate type objects can be "wrapped" in Maybe objects if they might be null:

    tdt = TaggedDictType(
        a=("a", int, 10),
        b=("b", Maybe(str), "foo"),        # string or null
        c=("c", float, 3.14)
    )
    td = tdt.create()           # create new dict
    td.b = "hello"              # this is fine
    td.b = None                 # and so is this
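The behaviour of such a wrapper can be pictured with the toy class below. ToyMaybe and its accepts method are assumptions made for illustration and are not how PCOT implements Maybe; the sketch only shows the rule "None, or a value of the wrapped type".

```python
class ToyMaybe:
    """Toy wrapper: accepts None as well as values of the wrapped type."""

    def __init__(self, typ):
        self.typ = typ

    def accepts(self, value):
        return value is None or isinstance(value, self.typ)


maybe_str = ToyMaybe(str)
checks = [maybe_str.accepts("hello"),   # a string is fine
          maybe_str.accepts(None),      # and so is None
          maybe_str.accepts(3)]         # but an int is not
```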

TaggedVariantDicts

Sometimes it is necessary to store different kinds of object in a list. We can do this with TaggedAggregates, provided the objects are all TaggedDicts and there is a field in all the dicts which tells us which type it is - a "discriminator".

More details on TaggedVariantDicts (TVDs) can be found in the TVD documentation.
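The discriminator idea itself is easy to sketch with plain dicts. Everything here is hypothetical (the "type" field name, the CREATORS table and the from_plain helper) and it is not the TVD API; it only shows how one field can select which kind of dict to build when loading a mixed list.

```python
# One factory per variant, each producing that variant's defaults.
CREATORS = {
    "rect": lambda: {"type": "rect", "x": 0, "y": 0, "w": 10, "h": 10},
    "circle": lambda: {"type": "circle", "x": 0, "y": 0, "r": 5},
}


def from_plain(d):
    # The discriminator field picks the variant; saved values override defaults.
    item = CREATORS[d["type"]]()
    item.update(d)
    return item


saved = [{"type": "rect", "w": 20}, {"type": "circle", "r": 2}]
items = [from_plain(d) for d in saved]
```

Each loaded item ends up with the full set of fields for its own variant, even though the saved list mixes two kinds of object.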

Autoserialisation

This is the serialisation method used for a few legacy nodes and nodes which don't require batch editing, like comment.

In the simplest case, the data stored in the XForm object for a particular node is already JSON-serialisable: that is, it is either a Python primitive type (number, string, tuple, list or dict) or a Numpy array (PCOT handles serialising these automatically). In this case you can simply list the names of the attributes in a tuple called autoserialise in the XFormType, along with some defaults which are used in case the items are not found in the saved data.

For example, the constructor for XFormSpectrum could look like this (not any more, because it now uses TA-serialisation):

        super().__init__("spectrum", "data", "0.0.0")
        self.autoserialise = ('sortlist', 'errorbarmode', 'legendFontSize', 'axisFontSize', 'stackSep', 'labelFontSize',
                              'bottomSpace', 'colourmode', 'rightSpace',
                              # these have defaults because they were developed later.
                              ('ignorePixSD', False),
                              ('bandwidthmode', BANDWIDTHMODE_NONE),
                              )
        for i in range(NUMINPUTS):
            self.addInputConnector(str(i), Datum.IMG, "a single line in the plot")
        self.addOutputConnector("data", Datum.DATA, "a CSV output (use 'dump' or 'sink' to read it)")

Note that the default values are optional - if you don't specify a default you can just use the attribute name rather than a (name, default) tuple, but you will get an error if you try to load from data which doesn't have that attribute stored.

When the system serialises the node it will read the named fields from the node, and it will do the reverse when it deserialises a node (i.e. loads it from an archive).
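The mechanism can be imitated in a few lines. This is a sketch of the idea and not PCOT's code: auto_serialise and auto_deserialise are hypothetical helpers, and a SimpleNamespace stands in for the node. It shows why an entry without a default fails when loading old data while a (name, default) entry does not.

```python
from types import SimpleNamespace

# A bare name has no default; a (name, default) tuple does.
AUTOSERIALISE = ("sortlist", ("ignorePixSD", False))


def auto_serialise(node, spec):
    # Read each named attribute from the node into a dict.
    out = {}
    for item in spec:
        name = item[0] if isinstance(item, tuple) else item
        out[name] = getattr(node, name)
    return out


def auto_deserialise(node, spec, d):
    # Write each named attribute back, using the default if one was given.
    for item in spec:
        if isinstance(item, tuple):
            name, default = item
            setattr(node, name, d.get(name, default))
        else:
            setattr(node, item, d[item])  # KeyError if the data lacks it


old_save = {"sortlist": [2, 0, 1]}   # saved before ignorePixSD existed
node = SimpleNamespace()
auto_deserialise(node, AUTOSERIALISE, old_save)
saved = auto_serialise(node, AUTOSERIALISE)
```

Loading old_save works because ignorePixSD falls back to its default; loading an empty dict would fail on sortlist.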

Complex serialisation

In this case, we have data which isn't JSON-serialisable, and where it doesn't make any sense to have the parameters editable outside the PCOT user interface. Here, we add serialise and deserialise methods to the XFormType to convert our data to and from data which is JSON-serialisable. The serialise method will take the node and return a dict (which is JSON-serialisable), while the deserialise method will take the node and the dict and set the appropriate values in a newly created node. The dict returned by serialise will be merged with the dict generated by autoserialisation and other, general node data (e.g. canvas data). Make sure you use unique names and not any of:

canvas, displayName, inputTypes, ins, outs, mapping, md5, type, ver, w, xy, outputTypes
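Since a clash would silently overwrite general node data when the dicts are merged, it can be worth checking for one during development. The helper below is a hypothetical development aid, not something PCOT provides; the set of names is copied from the list above.

```python
RESERVED = {"canvas", "displayName", "inputTypes", "ins", "outs", "mapping",
            "md5", "type", "ver", "w", "xy", "outputTypes"}


def clashing_keys(d):
    # Return the keys of a serialised dict that collide with reserved names.
    return sorted(RESERVED & set(d))


good = clashing_keys({"foolist": []})
bad = clashing_keys({"foolist": [], "w": 3, "type": "foo"})
```

Here good is an empty list, while bad reports "type" and "w", both of which would need renaming.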

An example: imagine our node stores a list of objects of some class Foo. We could write our methods thus:

def serialise(self, node):
    return {'foolist': [(x.a, x.b) for x in node.foo_list]}

def deserialise(self, node, d):
    node.foo_list = [Foo(a, b) for a, b in d['foolist']]

In this case we are converting the Foo objects into tuples to serialise them, and constructing them from the tuples when we deserialise. It would probably be better to write Foo so that it can serialise itself and has a deserialise constructor:

class Foo:
    def __init__(self...):
        ...

    def serialise(self)->dict:
        return ...

    @staticmethod
    def deserialise(d: dict):
        return Foo(...)

and

def serialise(self, node):
    return {'foolist': [x.serialise() for x in node.foo_list]}

def deserialise(self, node, d):
    node.foo_list = [Foo.deserialise(x) for x in d['foolist']]
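For completeness, here is one way the elided parts might be filled in, as a runnable sketch. The two fields a and b follow the tuple example earlier, and the dict layout used by serialise is an assumption; the round trip goes through the standard json module to show that the result really is JSON-serialisable.

```python
import json


class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def serialise(self) -> dict:
        # Reduce the object to primitives.
        return {"a": self.a, "b": self.b}

    @staticmethod
    def deserialise(d: dict) -> "Foo":
        # Rebuild the object from those primitives.
        return Foo(d["a"], d["b"])


foo_list = [Foo(1, "one"), Foo(2, "two")]

# What the node type's serialise would return, pushed through JSON and back.
text = json.dumps({"foolist": [x.serialise() for x in foo_list]})
restored = [Foo.deserialise(x) for x in json.loads(text)["foolist"]]
```

Keeping the conversion inside Foo means the node type never needs to know which fields Foo has.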

  1. i.e. they can be turned directly into JSON - they are primitive types (int, float, str etc.), dicts, lists, or tuples.