Devices

Overview

A typical calibration pipeline will contain the following device roles:

One calibration manager
- Exception: single-module detectors (some JUNGFRAU, GOTTHARD-II)
One correction device per detector module, arranged in groups
- AGIPD1M@SPB, AGIPD1M@MID, LPD1M@FXE, DSSC1M@1MSQS/SCS: 4 groups of 4 correction devices
- JF4M@SPB (in GPU mode): 1 group with all 8 correction devices
Optionally one group matcher per group
Optionally one full matcher
One preview assembler per preview layer (at least raw and corrected)
One geometry device

The following diagram illustrates the data flow between devices. For readability, the diagram only includes one module in one group, only shows one preview layer, and omits manager and geometry devices (these don't use fast data channels).

Calibration manager

Responsible for instantiating, configuring, and restarting most other devices, the manager is the primary way to interact with the pipeline. Its basic configuration is saved in the pipeline Karabo project. Note that a lot of the manager's schema (including constant retrieval operating condition parameters) is dynamic and so will reset to default when the manager is restarted.

While the pipeline is running, the manager provides convenient reconfiguration of multiple devices simultaneously. For instance, it allows setting the DAQ train stride on all DAQ devices at once, or changing any of a number of settings on all correction devices. Thus, although a number of points below (such as preview configuration and retrieving constants) technically deal with settings on correction devices, these should almost always be set through the manager.

Group configuration

For big multi-module detectors, correction devices are split across multiple compute nodes. We use the term group here to refer to a set of modules handled by a single node. For AGIPD / DSSC / LPD, a typical calng pipeline will have four groups of four modules - note that groups do not have to correspond to quadrants.

Module grouping is configured via the Modules table. This table must list all the detector modules with virtual names, and each module is assigned a group number (the remaining fields in each row are typically left blank as they can be inferred). Modules with the same group number are processed on the same device server. Which device server that is can be seen in the Module groups table, which also has some options for automatically starting group matchers and setting up their Karabo bridge output.
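
As a rough illustration of what the Modules table encodes, the sketch below collects virtual module names per group number. The row layout (virtual name, group number), the example names, and the function name are assumptions made for illustration; they are not the manager's actual schema.

```python
from collections import defaultdict


def modules_by_group(rows):
    """Map group number -> list of virtual module names (hypothetical row layout)."""
    groups = defaultdict(list)
    for virtual_name, group in rows:
        groups[group].append(virtual_name)
    return dict(groups)


# 16 modules in four groups of four; note that the groups are not quadrants here
rows = [(f"AGIPD{i:02d}", i % 4) for i in range(16)]
groups = modules_by_group(rows)
```

All modules that end up under the same key would be corrected on the same device server.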

Assembled preview configuration

As illustrated in the simplified overview, previews use a separate data flow. The main feature of this is to slice or summarize big detector data down to one frame already on the correction devices. Therefore, settings related to how this is done should typically be managed via the manager to keep settings consistent. The parameters seen in the "Preview" box on the manager overview scene control:

How to select which frame to preview (for non-negative indices)
- See Index selection mode
- Frame can be extracted directly from image ndarray (frame mode), by looking up corresponding frame in cell table (cell mode), or by looking up in pulse table (pulse mode)
- For XTDF detectors, the mapping tables are typically image.cellId and image.pulseId. Which selection mode makes sense depends on detector, veto pattern, and experimental setup.
- If the specified cell or pulse is not found in the respective table, a warning is issued and the first frame is sent instead.
Which frame or statistic to send
- See Index for preview
- If the index is non-negative, a single frame is sliced from the image data. How this frame is found depends on the index selection mode.
- If the index is negative, a statistic is computed across all frames for a summary preview. Note that this is done individually per pixel and that NaN values are ignored. Options are:
  - -1 for max
  - -2 for mean
  - -3 for sum
  - -4 for standard deviation

These settings determine the data slicing / summarizing on the correction devices. Before the preview is sent to the GUI, some additional steps are taken. In case of assembled previews, these are set on the preview assembler itself. In case of a single module preview, the correction device exposes the same configuration settings as the assembler.
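
The selection and summary options described above can be sketched with NumPy as follows. This is a minimal sketch and not the calng implementation: the function name, argument names, and the (frames, ss, fs) array layout are assumptions, and out-of-bounds frame indices are not handled.

```python
import numpy as np

# Negative preview indices select a per-pixel statistic over all frames
STATISTICS = {-1: np.nanmax, -2: np.nanmean, -3: np.nansum, -4: np.nanstd}


def pick_preview(image_data, index, mode="frame", cell_table=None, pulse_table=None):
    """Reduce (frames, ss, fs) data to a single 2D preview frame.

    Returns (preview, warning); warning is None unless a fallback was needed.
    """
    if index < 0:
        # NaN-ignoring reduction along the frame axis, done per pixel
        return STATISTICS[index](image_data, axis=0), None
    if mode == "frame":
        # Frame mode: index directly into the image array
        return image_data[index], None
    # Cell / pulse mode: look the requested value up in the matching table
    table = cell_table if mode == "cell" else pulse_table
    matches = np.flatnonzero(np.asarray(table) == index)
    if matches.size == 0:
        # Not found in the table: warn and fall back to the first frame
        return image_data[0], f"{mode} {index} not found, sending first frame"
    return image_data[matches[0]], None
```

For XTDF detectors, cell_table and pulse_table would correspond to image.cellId and image.pulseId.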

Retrieving constants

For correction devices to apply their corrections, they need to load appropriate correction constants. To do so, they need a number of parameters about current operating conditions in order to query CalCat. Through the manager - see the "Constant retrieval parameters" box in the screenshot - these parameters are set and propagated to the correction devices.

Note that one needs to manually update these when changing detector configuration. Taking the AGIPD example in the screenshots, this typically means ensuring that memory cells, bias voltage, acquisition rate, and gain setting / mode correspond to the current state of the detector. Efforts are being made to automate this process.

Once parameters are set, click "Load most recent constants" to start the constant loading process. The lamps on the manager overview give an indication of the progress. The individual correction devices provide additional information about what was found by the latest query, which can be used to troubleshoot constant loading issues.

Restoring device settings on restart

In some cases, you may want to configure running manager-started devices (ex. group matchers) beyond settings managed directly by the manager (ex. additional data sources). One way to persist such settings is to add such devices to the project. Then, however, these devices must be started manually for project-level custom settings to take effect - the manager does not instantiate via project settings.

The "Restored configurations" table offers an alternate solution. Each row in this table specifies a device regex and a key regex. When instantiating devices, for each row where the device regex matches the device name, the manager will use the Karabo "get configuration from past" feature with the time set to "now" in order to get the last known values for all properties matching the key regex. This allows live reconfigured settings to be persisted across pipeline restarts. Make sure to save the manager configuration to the project, though.
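
To make the row semantics concrete, here is a small sketch of the matching logic. It is not the manager's code: the function name and the flat dictionary standing in for the past configuration are hypothetical, and whether the manager anchors its regexes (full match) or searches within the string is an assumption here.

```python
import re


def keys_to_restore(device_id, past_config, restore_rows):
    """Pick the past property values to re-apply when instantiating device_id.

    restore_rows: iterable of (device_regex, key_regex) pairs (hypothetical layout)
    past_config: flat dict of property path -> last known value
    """
    restored = {}
    for device_regex, key_regex in restore_rows:
        if not re.fullmatch(device_regex, device_id):
            continue  # this row does not apply to this device
        for key, value in past_config.items():
            if re.fullmatch(key_regex, key):
                restored[key] = value
    return restored


# Example: keep the sources of all group matchers across restarts
rows = [(r".*/CALNG/MATCH_G\d+", r"sources")]
past = {"sources": ["some/extra/source"], "maxIdle": 1.0}
restored = keys_to_restore("SPB_DET_AGIPD1M-1/CALNG/MATCH_G1", past, rows)
```

Only properties matching a key regex are carried over; everything else gets the value the manager would normally set.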

Correction devices

As they are started and configured by the manager, correction devices for big detectors need not be in the Karabo project. Similarly, most settings you will want to change on correction devices should be set via the manager in order to keep them consistent.

The overview scene for a single correction device can, however, be useful for troubleshooting:

Observations about the state of the correction device at the time this screenshot was taken:

The deviceInternalsState warning lamp is on
- At the time the screenshot was taken, this was due to a benign issue connecting to timeserver
The last status was a warning about input hash having unknown source
- Given that inputDataState is no longer red, this was not an issue any longer for the last train processed
All constants are loaded
- All correction steps are available
- All correction steps except forceMgIfBelow and forceHgIfBelow are enabled (those two are off by default)

Single module preview

The section on assembled preview configuration deals with previewing a full multi-module detector, including using a DetectorAssembler with detector geometry to assemble the full preview. One can, however, also directly access the preview data from a single correction device. This is found under the preview node, which contains both the settings described for assembled preview configuration - how to slice burst mode data for preview - and the GUI-specific settings necessary to display the preview in the KaraboGUI.

The default correction device scene includes the corrected preview output and links to individual scenes with any additional preview channels.

Single module previews are primarily intended for use in single-module detector installations such as JF500K. They can also be used to inspect single modules in multi-module detector setups, but one should avoid tweaking individual module preview settings in this case. Changing the GUI-related settings (flipping along SS / FS axes, downsampling) on the correction device level should be expected to break or render inconsistent the assembled preview.

Warning lamps

The warning lamps discussed here are displayed with names on the correction device overview scene and shown in the correction device overview of the manager overview scene. They are intended to quickly notify the operator of issues with the pipeline. There are multiple lamps (rather than the regular device state going to ERROR) in order to give a hint as to where the errors may originate. The lamps and their error types are currently:

inputDataState indicates a problem with the latest input received - can flicker if detector, DAQ, or network is unreliable
- EMPTY_HASH is most commonly encountered when the DAQ is monitoring, but not getting data from the detector. In that setting, the DAQ will still send data to the pipeline, but it will be "empty". Specifically, this warning is triggered when opening image data causes a RuntimeError because the number of bytes in the array does not match the shape.
- TRAIN_ID means that the train ID of the input seems fishy. Note that while the correction devices do not care about train IDs, train ID issues will affect matching later in the pipeline. Train IDs are fishy if they are much greater than "current" train ID (see trainFromFutureThreshold) or if they are not monotonic. In the former case, input is dropped, while in the latter, the warning is just issued (and the received train percentage is reset).
- MISC_INPUT_DATA covers other issues such as input source not being in fastSources or input missing image data node. These issues seem to sometimes show for a single train when DAQ cycles through states.
processingState
- MEMORY_CELL_RANGE means that the memory cell range for the input (for XTDF detectors, the values in image.cellId) exceeds the number of memory cells as used for constant queries (constantParameters.memoryCells). Most corrections are per pixel and per memory cell, so frames from cells outside the range of currently loaded constants may not get corrected. Make sure to set the constant parameters according to the current detector configuration.
- FRAME_FILTER means an error occurred during application of the frame filter.
- PREVIEW_SETTINGS means that there was a warning when picking the frame for preview, probably because the requested frame was out of bounds or the cell / pulse requested was not found. The warning text should give some details.
- In future versions of calng, there will likely be a catch-all MISC_PROCESSING to gather uncaught exceptions in the detector-specific correction code.
deviceInternalsState indicates problems within the correction device expected to prevent normal function
- CALCAT_CONNECTION means that the device failed to connect to CalCat. In this case, the device will be unable to retrieve constants.
- TIMESERVER_CONNECTION means that the device (when doing getActualTimestamp) thought that the "current train ID" was zero, typically indicating that the device was not connected to the timeserver. In this case, the correction device may be unable to preemptively discard erroneous "future train IDs" (see trainFromFutureThreshold and the TRAIN_ID warning above). This does not impair processing, but if "future train IDs" are expected and this warning persists, one should try to fix the configuration.
- CORRECTION_RUNNER means that data was received while the correction kernel(s) were not ready. Some reconfiguration (for instance, changing number of cells on the detector) may trigger recompilation of kernels, causing the lamp to blink briefly. While this warning is on, the device will be unable to correct data - if it is on and persists for a while after a restart, it is likely an issue for CAL OCD.
- FRAME_FILTER - when applying frame filter fails for a given train, this sets a warning on processingState. As it is possible to specify "obviously" invalid frame filters (ex. with indices out of range for the number of frames on input), this warning can additionally be set here on deviceInternalsState after reconfiguration.
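
The TRAIN_ID check described above can be sketched as below. This is only an illustration under stated assumptions: the function and argument names are made up, "not monotonic" is taken to mean "not strictly greater than the previous train ID", and a current train ID of zero is treated as "no timeserver", in which case the future check is skipped (compare TIMESERVER_CONNECTION).

```python
def check_train_id(train_id, current_train_id, last_train_id, future_threshold):
    """Return (accept, warning) for the train ID of one input.

    future_threshold plays the role of trainFromFutureThreshold.
    """
    if current_train_id != 0 and train_id > current_train_id + future_threshold:
        # Much greater than the "current" train ID: input is dropped
        return False, "TRAIN_ID: from the future, input dropped"
    if last_train_id is not None and train_id <= last_train_id:
        # Not monotonic: only warn, the data is still processed
        return True, "TRAIN_ID: not monotonic"
    return True, None
```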

If an issue is detected, a warning is issued once - this goes in the device status and in the log - and the appropriate lamp is turned on. The lamp is turned back off when the issue is noticed to be resolved - for example, the inputDataState lamp might turn on if corrupt data is received for one train, but turn off again if the next train is valid.
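
The "warn once, clear when resolved" behaviour can be pictured with the following sketch. The class and its methods are hypothetical helpers written for this page; the actual bookkeeping inside the correction devices may well differ.

```python
class WarningLamp:
    """Tracks the active warning types behind one lamp (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.active = set()  # warning types currently considered unresolved
        self.log = []  # stands in for device status / log messages

    @property
    def is_on(self):
        return bool(self.active)

    def update(self, warning_type, problem, message=""):
        if problem and warning_type not in self.active:
            # Report only on the transition, not again for every bad train
            self.active.add(warning_type)
            self.log.append(f"{self.name}/{warning_type}: {message}")
        elif not problem:
            # Issue seen to be resolved: clear this warning type
            self.active.discard(warning_type)


lamp = WarningLamp("inputDataState")
lamp.update("EMPTY_HASH", True, "image data size does not match shape")
lamp.update("EMPTY_HASH", True, "image data size does not match shape")
lamp.update("EMPTY_HASH", False)  # next train was valid
```

After these three updates, the lamp is off again and the message was logged a single time.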

Pro tip: you can use Karabo's device property history feature to see recent history of device status. Select a correction device, double-click the "Current value on device" for "Status", select a time range, and "Request History". This is useful to see why a warning lamp is on in cases where it has been on for a while and other status messages might have come since.

Constant status

For each calibration constant, the correction device exposes a state field to indicate whether the constant was successfully found and loaded. These are somewhat simpler than the warning lamps described in the previous section:

Initially, all constants are in the OFF state
Every time a constant is queried / reloaded, the state changes to:
- ON if no exceptions happened
- ERROR in case anything went wrong

Consult the correction device status log for information about what specifically went wrong. A common source of issues is incorrectly set operating condition parameters, leading to "condition not found" when querying CalCat.
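
The state transitions listed above amount to very little logic, as this sketch shows. The enum, the function, and the callable passed in are illustrative stand-ins, not calng names.

```python
from enum import Enum


class ConstantState(Enum):
    OFF = "OFF"  # initial state, nothing queried yet
    ON = "ON"  # last query / reload finished without exceptions
    ERROR = "ERROR"  # something went wrong, see status log


def reload_constant(load, status_log):
    """Run one query / reload attempt and return the resulting state."""
    try:
        load()
    except Exception as exc:
        # The details go to the status log, the state only says "failed"
        status_log.append(str(exc))
        return ConstantState.ERROR
    return ConstantState.ON
```

Note that the state itself carries no detail about the failure, which is why the status log is the place to look.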

Geometry devices

For multi-module detector setups, a geometry device is typically set up to inform the preview assemblers about the current geometry of the detector. The geometry device should be saved in the Karabo project. It should not need to be restarted and is not managed as such by the manager.

As the above screenshot suggests, the current geometry can be set manually via quadrant positions (or module list, in the case of JUNGFRAU) or loaded from a file. Geometry devices wrap geometry classes from EXtra-geom.

Matchers

A number of specialized versions of the TrainMatcher are used in calng pipelines. The extensions provided in ShmemTrainMatcher used within calng are twofold:

They handle shared memory handles
- This is necessary for group matchers
They allow stacking arrays from multiple sources
- This is useful for connecting tools like OnDA to a full matcher

Group matcher

Each detector module produces an independent data stream - to the control system, each module is essentially its own detector. The data from one module is typically passed through one DAQ device from which it is then sent to a calng correction device. As multiple data streams can be processed simultaneously, a group in calng denotes a set of modules which are corrected on the same compute node in the online cluster. Fast detectors like AGIPD, DSSC, or LPD are typically split into four groups of four modules. For slower detectors like JUNGFRAU, it may be feasible to process the entire detector within a single group.

Each group can optionally run a group matcher, which provides the fastest possible access to the output stream from the group. If your online analysis relies only upon a subset of modules, configuring how the groups are divided (done via the manager) is the first step to set up an efficient output over a Karabo bridge. See manager notes for details on group matcher configuration.

Here is a screenshot of the overview scene from a group matcher in the wild:

Full matcher

Important to note:

Full matcher should connect to outputs of group matchers, not directly to correction devices
- The group matchers will take care to send full data out (dereferencing shared memory handles from correction devices)
- To configure sources, keep in mind that correction devices forward the source name they get from DAQs and that TrainMatcher needs to know source name and actual channel name in the format [source]@[channel]
- Example source in SPB_DET_AGIPD1M-1/CALNG/FULL_MATCHER: SPB_DET_AGIPD1M-1/DET/0CH0:xtdf@SPB_DET_AGIPD1M-1/CALNG/MATCH_G1:output
Full matcher is likely to hit network limits
- Be aware of DAQ train stride and number of frames (or memory cells) coming from the detector
- If possible, consider running the full matcher on the node where analysis is done
- Configuring and testing full matcher with online analysis software should be done during commissioning (please contact calibration team ahead of time)
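
The [source]@[channel] format is easy to get wrong by hand, so a small helper along these lines may be useful when preparing a source list. The function names are hypothetical, and the default channel name "output" is taken from the example above.

```python
def matcher_source(daq_source, matcher_device_id, channel="output"):
    """Build a TrainMatcher source entry in the [source]@[channel] format."""
    return f"{daq_source}@{matcher_device_id}:{channel}"


def split_matcher_source(entry):
    """Split an entry back into (source name, actual channel name)."""
    source, _, channel = entry.partition("@")
    return source, channel


entry = matcher_source(
    "SPB_DET_AGIPD1M-1/DET/0CH0:xtdf",  # source name forwarded from the DAQ
    "SPB_DET_AGIPD1M-1/CALNG/MATCH_G1",  # group matcher to read it from
)
```

The resulting entry is the same string as the example source listed above.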

Preview assemblers

Preview assemblers are also based on TrainMatcher and as such, some configuration and troubleshooting notes overlap with those for other matchers. In particular, consider the value of the maxIdle parameter; if it is too low relative to DAQ train stride, it can cause flickering.

A preview assembler needs to contact a geometry device to get current detector geometry to use for assembly. The geometry classes and assembly methods directly use EXtra-geom.

Preview assemblers offer a few parameters to tweak the preview image sent to the GUI:

NaN replacement: Karabo widgets don't play well with NaN values. Bad pixel masking by default sets bad pixels to NaN and the space between modules when assembled is by default filled with NaN. Therefore, it is up to the operator to decide which value to show for these pixels.
Max rate: how often to send preview updates. Karabo GUI is typically set to only accept updates at 2 Hz, so typically leave this parameter at this default value.
Downsampling factor and function: allows downsampling of preview image resolution; mostly relevant over slow connections, hopefully not in control room. Downsampling (enabled if the factor is greater than one) is recursive halving, so factors can only be powers of two and image dimensions must be multiples of the factor. The downsampling function defines the downscaling kernel. This function is applied to each 2x2 group of pixels during recursive halving.
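
The recursive halving and the NaN replacement can be illustrated with NumPy as follows. This is a sketch of the technique rather than the assembler's code: the function names are made up, and which reduction functions are actually offered is not assumed here (any NumPy reduction accepting an axis tuple works in the sketch).

```python
import numpy as np


def downsample_2d(image, factor, reduction=np.nanmax):
    """Downsample by recursive halving; reduction acts on each 2x2 block."""
    if factor < 1 or factor & (factor - 1):
        raise ValueError("factor must be a power of two")
    if image.shape[0] % factor or image.shape[1] % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    while factor > 1:
        rows, cols = image.shape
        # View as (rows/2, 2, cols/2, 2) and reduce over each 2x2 block
        blocks = image.reshape(rows // 2, 2, cols // 2, 2)
        image = reduction(blocks, axis=(1, 3))
        factor //= 2
    return image


def replace_nan(image, value=0.0):
    """Replace NaN pixels (masked pixels, module gaps) by a displayable value."""
    return np.where(np.isnan(image), value, image)
```

With a factor of 4, the 2x2 reduction is applied twice, which is why only powers of two make sense as factors.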

Preview assemblers provide multiple outputs:

As a TrainMatcher, it has the typical "Pipeline Output" (output). This is unassembled, at full rate, and per source. Keep in mind that preview assemblers typically receive single-frame input - this is different to a full matcher.
assembledOutput provides the assembled images at full speed. The output only contains image.data (single frame, assembled) and trainId. As the preview settings have not been applied to this data, this output may be useful for inspecting with small analysis devices.
preview.output is the preview output, providing the image displayed on the default scene. The aforementioned throttling, NaN replacement, and downscaling are only applied to this output.

As a TrainMatcher, a preview assembler can also provide Karabo bridge output. The outputForBridgeOutput parameter allows you to choose which of the three output channels the Karabo bridge output should mirror.

Condition watchers

As described above, retrieving constants requires querying with current operating conditions. In many cases, this means repeating values set by (a detector) control device(s). A condition device can partly automate the task of keeping the query values in sync. Still a work in progress, condition devices are available for AGIPD and JUNGFRAU.

The above screenshot from the calng installation at MID reveals some details:

The condition checker distinguishes between the current value on the control device, the ideal value that the manager should use for constant querying based on that, and the current value actually set on the manager
- For some values like gain mode, the condition device maps from an integer control value (gainModeIndex) to the descriptive enum used in calng (AgipdGainMode with members like ADAPTIVE_GAIN)
- For other values like bias voltage, the condition device rounds the floating control value to avoid small fluctuations interfering with constant retrieval
Some parameters - like photon energy - are not found on the detector control device, which is indicated by IGNORING
- In the screenshot, this includes pixelsX and pixelsY which are anyway fixed for the detector
- It additionally shows deviceMappingSnapshotAt and constantVersionEventAt which are CalCat-specific parameters only relevant for certain testing scenarios and should generally be left blank for normal operation
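
The distinction between control value, ideal value, and manager value can be sketched as below. Everything here is a stand-in: the enum is a local placeholder for AgipdGainMode (only ADAPTIVE_GAIN is named in the text, the other members and all integer values are assumptions), the key names gainMode and biasVoltage are hypothetical, and rounding to whole volts is an assumed precision.

```python
from enum import Enum


class GainMode(Enum):
    """Local stand-in for AgipdGainMode; members besides ADAPTIVE_GAIN are assumed."""

    ADAPTIVE_GAIN = 0
    FIXED_HIGH_GAIN = 1
    FIXED_MEDIUM_GAIN = 2
    FIXED_LOW_GAIN = 3


def ideal_conditions(control_values):
    """Derive the values the manager should query with from control values."""
    ideal = {}
    if "gainModeIndex" in control_values:
        # Integer control value -> descriptive enum name
        ideal["gainMode"] = GainMode(control_values["gainModeIndex"]).name
    if "biasVoltage" in control_values:
        # Round so that small fluctuations do not change the query
        ideal["biasVoltage"] = float(round(control_values["biasVoltage"]))
    return ideal


def out_of_sync(ideal, manager_values):
    """Return {key: (value on manager, ideal value)} for mismatching keys."""
    return {
        key: (manager_values.get(key), value)
        for key, value in ideal.items()
        if manager_values.get(key) != value
    }
```

Parameters missing from the control values (photon energy, for example) simply do not show up in the result, which corresponds to the IGNORING case.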