Data pipeline

The MetroProcessor device provides a programmable pipeline to perform operations on data flowing through Karabo. This pipeline consists of several stages. The frontend lives in the Karabo device and connects to all other Karabo devices required for data input. The data is matched by train and then sent to the pool stage, which consists of one or more workers, each processing a single train. All their results are then sent on to the reduce stage, which may perform operations across trains, e.g. averaging. Finally, the reduce stage outputs the results over ZMQ to any connected clients:

[Figure: Pipeline stages]
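
Any tool able to speak ZMQ can subscribe to this output. The sketch below is only an illustration of such a client using pyzmq; the endpoint, the PUB/SUB pattern and the message layout are assumptions made for this example, and the actual protocol is determined by the MetroProcessor configuration and the metropc client documentation:

    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)                 # assumption: PUB/SUB output pattern
    sock.connect('tcp://exflonc42:45000')      # hypothetical host and port
    sock.setsockopt_string(zmq.SUBSCRIBE, '')  # subscribe to all view results

    while True:
        parts = sock.recv_multipart()          # raw frames; serialization not shown here
        print(f'received {len(parts)} frame(s), {len(parts[0])} bytes in the first')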

All these components may run on a single machine or be spread across multiple nodes, both the stages as a whole and the individual workers constituting the pool stage. This makes it possible to run complex calculations exceeding 100 ms per train in the pool stage while still keeping the final pipeline output at 10 Hz. It also promotes the use of expressive code over highly optimized code, which is often harder to maintain in a scientific setting. In turn, it is important to keep the operations in the reduce stage as lean as possible, as each train is processed synchronously there.
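
To make the throughput argument concrete, the toy script below (plain Python with hypothetical numbers, not metropc code) shows how four parallel workers can sustain more than 10 trains per second even when each train takes about 300 ms to process, while the reduce-like step remains a cheap serial operation:

    import time
    from concurrent.futures import ProcessPoolExecutor

    def heavy_per_train(train_id):
        time.sleep(0.3)                    # stands in for >100 ms of real per-train work
        return train_id % 7                # some per-train result

    def cheap_reduce(results):
        # Runs serially for every train, so it must stay well below 100 ms.
        return sum(results) / len(results)

    if __name__ == '__main__':
        t0 = time.monotonic()
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(heavy_per_train, range(20)))
        print('average result:', cheap_reduce(results))
        print(f'{len(results) / (time.monotonic() - t0):.1f} trains/s')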

The context configures the programmable portion of the pipeline. It is written in Python code, which expresses the transformation steps to be performed, declares parameters or even runs a scripted measurement. Currently, this code must be contained in a file readable by the MetroProcessor device. It is structured into a series of functions, each of which performs operations on the data and may return a result. These functions are called views in the sense that they allow a particular “view” into the data.
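
A minimal sketch of such a context file is shown below. The decorator names, the source annotation syntax and the device names are assumptions made for illustration; the metropc documentation is the authoritative reference for the actual API:

    import numpy as np
    from metropc import View   # assumption: the import path may differ, or View may be provided implicitly

    @View.Vector
    def spectrum(trace: 'karabo#SQS_DIGITIZER/ADC/1:output[data.raw]'):
        # Hypothetical data source; runs once per train in the pool stage.
        return np.asarray(trace, dtype=np.float64)

    @View.Scalar
    def mean_intensity(spec: 'view#spectrum'):
        # Views may also consume the results of other views.
        return float(spec.mean())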

Please refer to the documentation of the underlying metropc framework for details on how to write context files.