Mpod Configuration and related Control issues¶
Overview¶
One expects vendors of commercial products to provide configuration tools that access the complete configuration space of the product. MPOD is no exception, but as MPOD is a joint venture of two companies, Wiener and Iseg, we have the added complication of two tools.
The MUSE Wiener tool connects to the MPOD crate controller via USB and allows configuration (set/get) of network, crate, and LV output board and channel properties. It also saves (downloads to file) and reads (uploads from file) all configuration properties to and from an XML file (CC-XML).
The snmpIsegControl Iseg tool connects to the MPOD crate controller via SNMP over Ethernet and allows configuration (set/get) of HV channel properties.
The aim of a control system is to provide an interface which allows many differing products to be integrated and to expose these to the operator with a consistent look-and-feel. The Karabo control tool must therefore provide the configuration (set/get) possibilities that the operator requires to manipulate single HV or LV channels, plus higher level features which are unlikely to be present in a vendor tool: channel sequencing and grouping, archiving, extended graphics, etc. Implementation versus cost decisions have to be made as to which functionality is left with the vendor tool and which is provided by the control system. Currently for MPOD the following tradeoff is made:
MUSE is used to configure the crate controller’s network configuration and, additionally, those properties which are not otherwise exposed via the controller’s network interface. The latter non-exposure issue also means that download and upload functionality remains a vendor tool activity.
Karabo will use the crate controller’s SNMP over Ethernet protocol to control and monitor properties required for operation.
MPOD non-volatile memory and configuration persistence¶
The configuration of the crate is stored in MPOD controller non-volatile memory (NVRAM) and is applied on controller f/w startup. This is a useful feature for read only (RO) properties (S/N, max output voltage, etc.) but has an inherent drawback for configurable (RW) properties (last set target voltage, current trip, etc.): inappropriate settings are inherited by the next user, and false settings persist. The karabo mpod.py s/w device therefore provides a load configuration action that resets properties to their stored values, and a save configuration action that stores the current values.
MPOD geometrical addressing and channel numbering consequences¶
Boards and their channels are addressed geometrically using the board’s slot number in the crate. The leftmost board slot, next to the controller, is slot 0 and boards to the right have incrementally increasing slot numbers.
Channel numbers within a board are similarly geometrical. On boards with single channel output connectors, channel 0 is the top connector and the last channel is the bottom connector. On multiple channel connectors this definition is normally followed, but the numbering depends on the pin geometry of the connector used.
The configuration definitions in WIENER-CRATE-MIB (see the SNMP and MIB section) and the geometrical board addressing have the consequence that the control system refers to the channels of a 16-channel board positioned in slot 0 as U0 thru U15, slot 1 channels as U100 thru U115, etc. The MPOD configuration definition does not allow boards to have more than 100 channels.
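The naming rule above can be sketched as a pair of helper functions. This is a minimal illustration only: the function names and the constant are hypothetical and are not taken from mpod.py; the only facts used are the U0..U15 / U100..U115 pattern and the 100-channel limit stated in the text.

```python
import re
from typing import Tuple

# Assumption drawn from the text: at most 100 channels per board,
# so the slot number occupies the hundreds position of the name.
CHANNELS_PER_SLOT = 100


def channel_name(slot: int, channel: int) -> str:
    """Return the channel name for a slot/channel pair (hypothetical helper)."""
    if slot < 0:
        raise ValueError("slot must be non-negative")
    if not 0 <= channel < CHANNELS_PER_SLOT:
        raise ValueError("channel must be in the range 0..99")
    return f"U{slot * CHANNELS_PER_SLOT + channel}"


def parse_channel_name(name: str) -> Tuple[int, int]:
    """Split a name such as 'U101' into (slot, channel)."""
    match = re.fullmatch(r"U(\d+)", name)
    if match is None:
        raise ValueError(f"not a channel name: {name!r}")
    return divmod(int(match.group(1)), CHANNELS_PER_SLOT)
```

With these helpers, slot 1 channel 15 maps to U115, and U101 maps back to slot 1, channel 1.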
The use of geometrical addressing results in a board swap safety hazard. If two boards with different channel V/I output limits are swapped, then it is conceivable that an incorrect channel V/I can be applied. The karabo mpod.py s/w device provides a load safety configuration action that prevents startup when the board S/N does not match that expected for a slot.
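The idea behind such a check can be sketched as a comparison of two slot-to-serial-number maps. The function name and the dictionary representation are assumptions made for illustration; how mpod.py actually stores and compares the expected S/N values is not described here.

```python
from typing import Dict, List


def slot_mismatches(expected: Dict[int, str], found: Dict[int, str]) -> List[str]:
    """Compare expected and discovered board S/N per slot (hypothetical helper).

    Returns one message per slot whose discovered serial number differs
    from the expected one, including missing and unexpected boards.
    """
    problems = []
    for slot in sorted(set(expected) | set(found)):
        want = expected.get(slot)  # None if no board is expected in this slot
        have = found.get(slot)     # None if no board was discovered in this slot
        if want != have:
            problems.append(f"slot {slot}: expected S/N {want}, found {have}")
    return problems
```

A device following this pattern would refuse to leave its startup state whenever the returned list is non-empty.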
Karabo multi-channel s/w device issues and mpod.py model¶
The configuration quandary posed to the s/w developer is: are devices configuration specific or not? Multi-board (and therefore multi-channel) crate devices (MPOD power, Agilent gauge or ion-pump controllers, X2timer boards, etc.) have what could be called the schema evolution problem: a board may be added later which has to be configured and controlled by the control system s/w device. Such systems introduce additional dependency problems when the added channels are not independent. Dependency arises when resources within a crate, e.g. set-point relays associated with gauges, have sharing rules associated with them which are imposed by the controller.
The solution currently taken for multi-channel (MPOD and similar h/w) systems is to implement a single lowest-level Karabo s/w device per system, plus additional breakout MDL s/w devices at higher levels which provide the single or group channel functionality required. The lowest-level s/w devices typically implement board/channel discovery, inject the found channels into the control system, and apply channel configurations from saved files. For MPOD, the loaded property tag attribute settings are then used by breakout s/w devices to control the actions and settings of single channels or groups of channels.
Karabo s/w to hardware f/w access issues¶
Access to multi-channel systems where different users ‘own’ different channels can also be hazardous, but can be solved by partitioning using higher level MDL devices.
More dangerous is the case where single session control (s/w device to h/w firmware) is not possible, which allows additional actors to concurrently modify the h/w configuration with potentially dangerous results. The SNMP control communication protocol used between the mpod.py s/w and the controller f/w is UDP based, is currently not authorization restricted (i.e. SNMPv2), and is not safe w.r.t. additional actors. SNMPv3 is reported to be available with the MPOD f/w and its usage, if present, should be tested.
Overview of (mpod.py) configuration activity¶
On entering the STARTING state mpod.py requests the position of the crate’s main switch. If it is ‘OFF’, mpod.py goes to ERROR; if it is ‘ON’, mpod.py:
- performs a discovery of crate boards and injects the Schema definitions of the boards and their channels (using the property configuration of the specified type list; options are STANDARD, AGIPD, DSSC and ALL)
- optionally verifies correct board-slot occupancy (going to ERROR on failure)
- optionally applies the requested type list configuration from file, which might overwrite crate configuration values and apply different threshold limits to properties such as voltage and current
before going to ACTIVE.
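The order of these steps can be sketched as a small function. This is an illustration under stated assumptions, not the mpod.py implementation: the function name, the callables passed in, and the use of plain strings for states are all hypothetical.

```python
from typing import Callable, Optional


def starting_sequence(
    main_switch_on: bool,
    discover: Callable[[], None],
    verify_occupancy: Optional[Callable[[], bool]] = None,
    apply_configuration: Optional[Callable[[], None]] = None,
) -> str:
    """Return the state reached after the STARTING steps (hypothetical sketch)."""
    if not main_switch_on:
        return "ERROR"  # main switch is OFF
    discover()  # discover boards and inject their channel definitions
    if verify_occupancy is not None and not verify_occupancy():
        return "ERROR"  # optional board-slot occupancy check failed
    if apply_configuration is not None:
        apply_configuration()  # optional: apply the type list configuration from file
    return "ACTIVE"
```

The two optional steps are modelled as optional callables so that leaving them out corresponds to disabling the check or the load.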
Note
In the absence of a karabo-facility-wide networked database (the project database is a candidate, but lacks versioning) the configuration sources are XML files stored in $KARABO/var/data/mpod/. Although non-optimal, this location allows copy and reuse and (more dangerously) the modification often required to edit configurations.
During the ACTIVE state the following configuration actions are allowed:
- modifying single or multi-channel settings
- reapplying the last saved configuration
- reapplying the last known to work configuration (must be in ALL!)
- saving the current configuration
Note
The last known-to-work configuration cannot be overwritten from the s/w device, but must be redefinable by another tool. Ideally this is a configuration DB (web) tool; in our case it is an actor with access to $KARABO/var/data/mpod/. Saved configurations must be versioned.
Configuration saving (to store) and applying (from store) issues¶
Saving and applying configuration data is still a work in progress. The following issues and limitations exist.
- XML files stored on disk ($KARABO/var/data/mpod) are currently used to save configuration data. As mentioned above, disk storage should be replaced by versioning-capable network storage. Configuration files are type specific; the MIN_RW_LV type is used as the example in the bullets below.
- The configuration file saved and loaded by the device (click ‘save configuration’ or ‘load configuration’) is named MIN_RW_LV_configuration.xml. The known-to-work version is MIN_RW_LV_configuration.ok.xml (click ‘load known-to-work’); clearly an expert operator has to elevate a …configuration.xml file to a …configuration.ok.xml file!
- The internal format used, a VectorHash (one Hash per channel), is the same as that used by mpodTableConfigurator.py (MDL device), again due to the lack of suitable network storage.
- Stored channel information is retrieved from mpod.py’s current configuration, e.g. self.getCurrentConfiguration().get("U101"). Property attribute settings are applied at runtime, driven by option settings in the configuration node. This is a Karabo design pattern: property attributes can only be set programmatically within the device, using the value of another property which the UI client can set; non-exposed values are therefore excluded.
- The operating scope of mpod.py is the entire crate, and configurations saved or applied are for the entire crate (all boards and all channels). Save and apply actions may adversely affect others if the crate has multiple users.
- The ALL type list exposes all properties; other type lists may reduce the number of properties exposed per channel, and the stored configurations contain only the exposed properties. Applying a type may adversely affect others if the crate has multiple users.
- Configuration filenames include their list type. If the property list of the type is modified, it is likely that existing configuration files will contain superfluous properties or lack some. If such a file is applied, WARNing messages are logged.
- Known-to-work files can only be updated by someone with access to local storage.
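The file naming convention described in the bullets can be sketched as follows. The function name and its parameters are hypothetical; only the directory ($KARABO/var/data/mpod) and the two filename patterns come from the text above, and the fallback to the current directory when KARABO is unset is an assumption made so that the sketch runs anywhere.

```python
import os
from pathlib import Path
from typing import Optional


def configuration_path(type_name: str,
                       known_to_work: bool = False,
                       base_dir: Optional[Path] = None) -> Path:
    """Build the path of a type-specific configuration file (hypothetical helper)."""
    if base_dir is None:
        # Assumption: fall back to "." when the KARABO variable is not set.
        base_dir = Path(os.environ.get("KARABO", ".")) / "var" / "data" / "mpod"
    suffix = ".ok.xml" if known_to_work else ".xml"
    return base_dir / f"{type_name}_configuration{suffix}"
```

For the MIN_RW_LV type this yields MIN_RW_LV_configuration.xml for the saved file and MIN_RW_LV_configuration.ok.xml for the known-to-work file.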
Guidelines on creating configuration files¶
Crate w/o a saved configuration¶
If another crate with an identical board-slot configuration exists, then copy its ALL and other type configuration files to $KARABO/var/data/mpod/. Then, to set all properties in crate NVRAM, instantiate mpod.py with configuration node V-limits = I-limits = NONE, and top-level Type = ALL and Load channel configs = False. Then apply Last known to work.
If another crate with a similar board-slot configuration exists, then decide whether this is like adding a new board to an existing system. If yes, then perform the identical board-slot update, described above, followed by a new board update, described below. The new board should be in the crate.
Adding new boards to a crate with a saved configuration¶
Insert the board and start mpod.py with Type = ALL, then configure the new channels as required. Important properties to set are the loadable ones. Then save the configuration and get an authorized person to save the file as Last known to work.
Replacing boards in a crate with a saved configuration¶
A same-type board swap may be necessary because of a repair. Insert the spare board, start mpod.py with Type = ALL, and apply Saved configuration.
Removing boards from a crate with a saved configuration¶
Remove the board and start mpod.py with Type = ALL, configuration node V-limits = I-limits = NONE, and Load channel configs = False. Then save the configuration and get an authorized person to save the file as Last known to work.
Command and update latency; changing and nudging issues¶
The command-query protocol offered by the MPOD controller to foreign control systems is polling; an event-driven subscribe-and-publish mechanism is not available. Therefore, seeing a change at the controller of a channel property (e.g. channel voltage, channel tripped, etc.) or a crate property (e.g. fan failure) requires the control s/w to periodically poll the controller for information. Periodic means that a fixed period elapses between one update request, together with the associated processing of the information received from the controller, and the next request.
The control s/w modifies the fixed poll period in a number of ways to improve command and update latency. When a change request (turn channel ON, set new Target voltage…) is received, the change is applied immediately. This is natural as the control system s/w is event driven, but as command and response exchanges with the controller are sequenced, a change request may be delayed by the request and response processing of an on-going polling thread. To maintain good command application latency the standard 20s period between polls is shortened. Two mechanisms are used to do this:
- changing - if any channel is ramping then the current poll wait period is broken (by the incoming ON/OFF command) and a poll period of 1s is applied until no channel is changing.
- nudging - adds an additional five 1s polls of hysteresis at the end of changing. Nudging bridges the change latency seen at the controller (e.g. from OFF to RAMPING_UP when ON is requested) and, additionally, inconsistent controller channel state handling (e.g. when HV channels on some boards are turned ON, the channel status initially goes to ON before becoming RAMPING_UP a few seconds later).
Once changing and nudging have stopped the standard poll period is reinstated.
Controller side asynchronous changes (channel trip) are seen at the next poll. Communication failures (a real network outage, controller failure, crate power off, etc.) are seen after 5 failures to receive a poll reply from the controller.
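The interplay of the standard period, changing and nudging can be sketched as a small scheduler. The class and its method are hypothetical and simplified: they model only the choice of the next wait period, not the breaking of an on-going wait by an incoming command. The 20s, 1s and five-poll values are the ones quoted in the text.

```python
STANDARD_PERIOD_S = 20.0  # standard period between polls
FAST_PERIOD_S = 1.0       # period used while changing or nudging
NUDGE_POLLS = 5           # extra fast polls after changing has ended


class PollScheduler:
    """Choose the wait before the next poll (hypothetical sketch)."""

    def __init__(self) -> None:
        self._nudges_left = 0

    def next_period(self, any_channel_changing: bool) -> float:
        if any_channel_changing:
            # Changing: poll fast and re-arm the nudge counter.
            self._nudges_left = NUDGE_POLLS
            return FAST_PERIOD_S
        if self._nudges_left > 0:
            # Nudging: a few more fast polls after the last change was seen.
            self._nudges_left -= 1
            return FAST_PERIOD_S
        return STANDARD_PERIOD_S
```

Re-arming the counter on every changing poll means that a channel which briefly reports a stable state and then starts ramping again keeps the fast period without returning to 20s in between.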
The last example requiring nudging described above is a problem for middlelayer devices if they move immediately to another action having seen the controller’s incorrect channel ON report.
Controlling property update messages (set) to Karabo¶
The updatePolicy property controls which values read on poll are written (set) to Karabo. Available options are:
- ALL_NOCOND_EVERYPOLL - all controller values read are written every poll
- ALL_CHANGING_EVERYPOLL - all controller values read are written if changing or nudging every poll
- IV_NOCOND_EVERYPOLL - channel I,V and state, and crate and board status values every poll
Option definition strings contain three underscore-separated tokens: WHAT-PROPERTIES, WRITE-CONDITION, and WHEN-WRITTEN. In addition to the bulleted properties, a small number of bookkeeping messages are always written; these include poll count, power supply uptime and poll channel summary.
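The three-token structure of the option strings can be illustrated with a small parser. The names UpdatePolicy, parse_update_policy and writes_all_values are hypothetical and do not come from mpod.py; the interpretation of the tokens follows the option descriptions listed above.

```python
from typing import NamedTuple


class UpdatePolicy(NamedTuple):
    what: str       # WHAT-PROPERTIES, e.g. "ALL" or "IV"
    condition: str  # WRITE-CONDITION, e.g. "NOCOND" or "CHANGING"
    when: str       # WHEN-WRITTEN, e.g. "EVERYPOLL"


def parse_update_policy(option: str) -> UpdatePolicy:
    """Split an option string into its three tokens (hypothetical helper)."""
    parts = option.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three tokens in {option!r}")
    return UpdatePolicy(*parts)


def writes_all_values(option: str, changing_or_nudging: bool) -> bool:
    """True if every controller value read would be written on this poll."""
    policy = parse_update_policy(option)
    if policy.what != "ALL":
        return False
    return policy.condition == "NOCOND" or changing_or_nudging
```

For example, ALL_CHANGING_EVERYPOLL writes all values only while the device is changing or nudging, whereas ALL_NOCOND_EVERYPOLL writes them on every poll.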
The pollMessageCount property shows the number of messages written, in a single Hash set, for the last poll. For a crate with 100 channels ALL_NOCOND_EVERYPOLL and ALL_CHANGING_EVERYPOLL can write O(10000) messages, the latter O(50) if not changing or nudging; IV_NOCOND_EVERYPOLL writes O(250).