TrainMatcher notes

Overview

As several variants of TrainMatchers are used in calng pipelines, it is useful to understand the basics of how a TrainMatcher works. As the name implies, a TrainMatcher matches data based on train IDs. It can monitor properties of other devices (also known as slow sources or control sources) and connect to a number of output channels (also known as fast sources). Generally, it will aim to forward data from all selected sources grouped by train IDs in one of two operation modes:

match: This is the mode typically used in calng contexts. In this mode, data for a train is sent once data from all selected sources has arrived (exception: max idle). If data from only a subset of sources has arrived for train x, but a full match for a later train y has arrived, then the match is sent for y whereas x is discarded. In other words: a match is sent as soon as it is complete, but failure to match means nothing is sent. The matching ratio is the rate of matched (and sent) trains divided by the rate of incoming unique train IDs (across all sources).
buffer: In this mode, a configurable number, n, of trains is cached. Whenever data starts arriving for train ID x, the cached data for train ID x-n+1 is sent. This means partially matched trains can be sent, and naturally implies a delay of n trains.
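The match mode behavior described above can be illustrated with a toy model. This is only a sketch of the described semantics, not the actual TrainMatcher implementation: the class ToyMatcher and all of its attributes are hypothetical, max idle is ignored, and the matching ratio is approximated by counts rather than rates.

```python
class ToyMatcher:
    """Toy model of match mode (hypothetical, not the real TrainMatcher)."""

    def __init__(self, sources):
        self.sources = set(sources)
        self.pending = {}  # train ID -> {source: data}
        self.seen_train_ids = set()
        self.sent = []  # (train ID, {source: data}) in the order sent
        self.last_sent = -1

    def receive(self, source, train_id, data):
        if source not in self.sources:
            return  # not a selected source
        self.seen_train_ids.add(train_id)
        if train_id <= self.last_sent:
            return  # too late: this or a later train was already sent
        self.pending.setdefault(train_id, {})[source] = data
        if set(self.pending[train_id]) == self.sources:
            # Complete match: send it right away ...
            self.sent.append((train_id, self.pending.pop(train_id)))
            self.last_sent = train_id
            # ... and discard older trains that only matched partially.
            for stale in [tid for tid in self.pending if tid < train_id]:
                del self.pending[stale]

    @property
    def matching_ratio(self):
        # Matched trains divided by unique incoming train IDs (counts, not rates).
        if not self.seen_train_ids:
            return 0.0
        return len(self.sent) / len(self.seen_train_ids)


matcher = ToyMatcher(["a", "b"])
matcher.receive("a", 1, "a1")  # train 1 never gets data from "b"
matcher.receive("a", 2, "a2")
matcher.receive("b", 2, "b2")  # train 2 is complete: sent, train 1 discarded
print(matcher.sent, matcher.matching_ratio)
```

In this example, only train 2 is sent and the matching ratio is 0.5, since two unique train IDs were seen.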

Data forwarded by a TrainMatcher is sent per source. On the Karabo channel output, this means that data from each matched source is written before update is called on the channel. On the Karabo bridge output, this means that the receiver will get a dictionary with one entry per matched source. See also output formats, in particular if you need data stacked.
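As a rough illustration of the per-source layout on the bridge side, the following sketch walks over one received train. It assumes the receiver gets a data dictionary and a metadata dictionary, both keyed by source name, with the train ID stored under "timestamp.tid" in the metadata; the function summarize_train and the example source and key names are hypothetical.

```python
def summarize_train(data, meta):
    """Map each matched source to (train ID, sorted list of its keys).

    Assumes data and meta are dictionaries keyed by source name and
    that the metadata carries the train ID under "timestamp.tid".
    """
    summary = {}
    for source, source_data in data.items():
        train_id = meta.get(source, {}).get("timestamp.tid")
        summary[source] = (train_id, sorted(source_data))
    return summary


# Hypothetical train with two matched sources.
example_data = {
    "DET/MODULE/0:output": {"image.data": [1, 2, 3], "image.cellId": [0]},
    "MOTOR/DEVICE/1": {"position.value": 4.2},
}
example_meta = {
    "DET/MODULE/0:output": {"timestamp.tid": 1000},
    "MOTOR/DEVICE/1": {"timestamp.tid": 1000},
}
print(summarize_train(example_data, example_meta))
```

With the karabo_bridge Python client, such a pair would typically come from something like data, meta = Client(address).next(), where the address is the one the TrainMatcher reports for its ZMQ output.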

Configuring sources

The TrainMatcher device comes with a useful description of the sources parameter. This description is available within Karabo, on the overview scene (also in the screenshot below), or here. The following screenshot shows an example of the source configuration for a group matcher.

For each output channel source, the TrainMatcher needs to know the source name to watch and the device name plus channel name to connect to. To explicitly specify both, the format is [source name]@[device name]:[channel name]. In most cases, the device name plus channel name is the source name. Correction devices - as the configuration seen in the screenshot above indicates - are an exception, as they forward the source name of the source they are correcting. Other TrainMatchers are another exception, as they also forward sources; this is relevant when setting up a full matcher.
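To make the format concrete, here is a minimal sketch of how such a source specification could be split into its parts. The function parse_channel_source and the example names are hypothetical, and the handling of the short form (no "@", source name taken to equal device name plus channel name) is an assumption based on the description above rather than the device's actual parsing code.

```python
def parse_channel_source(spec):
    """Split '[source name]@[device name]:[channel name]' into its parts.

    Assumption: if there is no '@', the source name is taken to be the
    same as 'device name:channel name'.
    """
    source, has_at, remote = spec.partition("@")
    if not has_at:
        remote = source
    device, has_colon, channel = remote.partition(":")
    if not (has_colon and device and channel):
        raise ValueError(f"expected [device name]:[channel name], got {remote!r}")
    return source, device, channel


# Explicit form, as for a correction device forwarding another source name.
print(parse_channel_source("DET/MODULE/0:xtdf@DET/CORR/0:dataOutput"))
# Short form: source name equals device name plus channel name.
print(parse_channel_source("CAM/DEVICE/1:output"))
```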

Note that device property sources (also known as control sources) are handled differently from channel sources. The value associated with a property is updated when the property being watched is changed. For trains in between changes, the latest known value is used. Channel sources which haven't sent data for a given train will simply not appear in the output for that train (if anything is output - depends on how missing sources are handled).
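The "latest known value" behavior for property sources can be sketched as a lookup over the history of changes. This is only an illustration: the class PropertyHistory is hypothetical, and it assumes updates arrive in increasing train ID order.

```python
import bisect


class PropertyHistory:
    """Tracks changes of one watched property (hypothetical sketch)."""

    def __init__(self):
        self._train_ids = []
        self._values = []

    def update(self, train_id, value):
        # Assumes updates arrive in increasing train ID order.
        self._train_ids.append(train_id)
        self._values.append(value)

    def value_at(self, train_id):
        """Return the latest value known at train_id, or None if none yet."""
        idx = bisect.bisect_right(self._train_ids, train_id) - 1
        if idx < 0:
            return None
        return self._values[idx]


history = PropertyHistory()
history.update(100, 1.5)  # property changed at train 100
history.update(105, 2.0)  # and again at train 105
print([history.value_at(tid) for tid in (99, 100, 103, 105, 110)])
```

Trains 100 to 104 all see the value set at train 100, whereas a channel source with no data for a train would simply be absent.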

Handling missing sources

In match mode, missing data from any selected source causes a train to fail to match. Whatever the underlying reason for the missing data, this is a common cause of missing previews or missing data on the bridge.

If a source is known to not send data for a while (e.g. a defective detector module), consider unselecting it in the source configuration.
If it's likely for some source(s) to suddenly go missing, the max idle parameter can be used to gracefully handle certain kinds of failures from individual sources. The manager by default sets a non-zero max idle value for the preview assemblers it instantiates.
If some latency is acceptable, consider using buffer mode instead of match mode.

Max idle

This parameter, if set to a value greater than zero, allows the TrainMatcher to ignore any source from which it has not received data in a configurable number of seconds. Any sources not heard from within the last maxIdle seconds will not be waited for in matching. This is useful in case a source may suddenly disappear - for example, in case of an unreliable detector module.
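The rule above can be sketched as a filter over the time each source was last heard from. This is a simplified illustration, not the device's implementation: the function sources_to_wait_for and the last_seen bookkeeping are hypothetical, and sources that have never sent data are not modeled.

```python
import time


def sources_to_wait_for(last_seen, max_idle, now=None):
    """Return the sources that matching should still wait for.

    last_seen maps source name -> time (in seconds) data last arrived.
    A max_idle of zero or less disables the feature, so every source
    is waited for.
    """
    if now is None:
        now = time.monotonic()
    if max_idle <= 0:
        return set(last_seen)
    return {
        source for source, seen in last_seen.items()
        if now - seen <= max_idle
    }


# Hypothetical state: module "b" went silent 11 seconds ago.
last_seen = {"a": 100.0, "b": 90.0}
print(sources_to_wait_for(last_seen, max_idle=5, now=101.0))
print(sources_to_wait_for(last_seen, max_idle=0, now=101.0))
```

With max idle set to 5 seconds, only "a" is waited for; with max idle set to 0, both sources must arrive for a match.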

Note that if data sporadically fails to arrive for a few trains at a time, max idle may not take effect quickly enough to help. In particular, hitting network limits can cause this to happen for many detector module sources at once, rendering matching infeasible, regardless of the max idle function.

Karabo bridge output

Recent versions of TrainMatcher include a "built-in" Karabo bridge output. This is configured under "ZeroMQ configuration". For most use cases, one will need to add one row to this table for a single ZMQ output. The parameters for each row deal with specifics of the ZMQ port; these should match the configuration of the online analysis software you want to connect.

While the TrainMatcher is running, see the "ZeroMQ outputs" table for information about currently active ZMQ ports based on the ZeroMQ configuration. This table shows how many packets have been sent (useful to see if things are working) and the address and pattern for each ZMQ port (useful for configuring the connection to the bridge).