=============================================
MPOD Configuration and Related Control Issues
=============================================

Overview
--------

One expects vendors of commercial products to provide configuration tools
which access the complete configuration space of the product. MPOD is no
exception, but because it is a joint venture of two companies, Wiener and
Iseg, there is the added complication of two tools.

The Wiener tool, MUSE, connects to the MPOD crate controller via USB and
allows network, crate, and LV output board and channel properties to be
configured (set/get). It can also save (download to file) and read back
(upload from file) all configuration properties as an XML file (CC-XML).

The Iseg tool, snmpIsegControl, connects to the MPOD crate controller via SNMP
over Ethernet and allows configuration (set/get) of HV channel properties.
   
The aim of a control system is to provide an interface which allows many
differing products to be integrated and exposed to the operator with a
consistent look-and-feel. The Karabo control tool must therefore provide the
configuration (set/get) possibilities that the operator requires to manipulate
single HV or LV channels, plus higher-level features which are unlikely to be
present in a vendor tool: channel sequencing and grouping, archiving, extended
graphics, etc. Implementation versus cost decisions have to be made as to
which functionality is left with the vendor tool and which is provided by the
control system. Currently the following tradeoff is made for MPOD:

MUSE is used to configure the crate controller's network configuration and,
additionally, those properties which are not otherwise exposed via the
controller's network interface. The latter non-exposure issue also means
that download and upload functionality remains a vendor tool activity.

Karabo will use the crate controller's SNMP over Ethernet protocol to control
and monitor the properties required for operation.

MPOD non-volatile memory and configuration persistence
------------------------------------------------------

The configuration of the crate is stored in the MPOD controller's non-volatile
memory (NVRAM) and is applied on controller f/w startup. This is a useful
feature for read-only (RO) properties (S/N, max output voltage, etc.) but
has an inherent drawback for configurable (RW) properties (last set target
voltage, current trip, etc.): inappropriate settings are inherited by the
next user, or false settings persist. The Karabo mpod.py s/w device
therefore provides load and save configuration actions, which reset
properties to their stored values and store the current values, respectively.

MPOD geometrical addressing and channel numbering consequences
--------------------------------------------------------------

Boards and their channels are addressed geometrically using the board's slot
number in the crate. The leftmost board slot, next to the controller, is slot
0, and slot numbers increase by one for each board position to the right.

Channel numbers within a board are similarly geometrical. On boards with
single-channel output connectors, channel 0 is the top connector and the
last channel is the bottom connector. Multiple-channel connectors normally
follow this definition too, but the numbering depends on the pin geometry of
the connector used.

The configuration definitions in WIENER-CRATE-MIB (see the SNMP and MIB
section) and the geometrical board addressing have the consequence that the
control system refers to the channels of a 16-channel board in slot 0 as
``U0`` through ``U15``, to the slot 1 channels as ``U100`` through ``U115``,
etc. The MPOD configuration definition does not allow boards to have more than
100 channels.
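
The naming rule can be sketched as follows. This is a minimal illustration,
assuming the name is ``U`` followed by ``slot * 100 + channel`` without zero
padding, which matches the ``U0``, ``U15``, ``U100`` and ``U115`` examples
above. The function names are hypothetical and are not taken from mpod.py.

```python
from typing import Tuple


def channel_name(slot: int, channel: int) -> str:
    """Return the control-system name of a channel (hypothetical helper)."""
    if slot < 0:
        raise ValueError("slot must be non-negative")
    if not 0 <= channel < 100:
        # The MPOD definition leaves room for at most 100 channels per board.
        raise ValueError("channel must be in the range 0..99")
    return f"U{slot * 100 + channel}"


def split_channel_name(name: str) -> Tuple[int, int]:
    """Inverse of channel_name: return (slot, channel)."""
    if not name.startswith("U") or not name[1:].isdigit():
        raise ValueError(f"not a channel name: {name!r}")
    return divmod(int(name[1:]), 100)
```

For example, ``channel_name(1, 15)`` gives ``U115`` and
``split_channel_name("U101")`` gives ``(1, 1)``.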

The use of geometrical addressing results in a ``board swap`` safety hazard.
If two boards with different channel V/I output limits are swapped, then it is
conceivable that an incorrect channel V/I is applied. The Karabo mpod.py
s/w device provides a load safety configuration action that prevents startup
when the board S/N does not match the one expected for a slot.
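
A minimal sketch of such a check is shown below. It assumes that the expected
and the discovered serial numbers are available as slot-to-S/N mappings; the
function name, the data layout and the serial numbers are hypothetical and do
not reflect the actual mpod.py implementation.

```python
from typing import Dict, List


def find_slot_mismatches(expected: Dict[int, str],
                         discovered: Dict[int, str]) -> List[str]:
    """Return one message per slot whose board S/N differs from the expected one."""
    problems = []
    for slot in sorted(set(expected) | set(discovered)):
        want = expected.get(slot)      # None if no board is expected here
        found = discovered.get(slot)   # None if the slot is empty
        if want != found:
            problems.append(f"slot {slot}: expected S/N {want}, found {found}")
    return problems


# Example: the boards in slots 0 and 1 have been swapped (made-up S/Ns).
expected = {0: "SN-A", 1: "SN-B"}
discovered = {0: "SN-B", 1: "SN-A"}
mismatches = find_slot_mismatches(expected, discovered)
# A non-empty list would mean: refuse to start and go to ERROR.
```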

Karabo multi-channel s/w device issues and mpod.py model
--------------------------------------------------------

The configuration quandary posed to the s/w developer is: are devices
configuration specific or not? Multi-board (and therefore multi-channel) crate
devices (MPOD power, Agilent gauge or ion-pump controllers, X2timer boards,
etc.) have what could be called the schema evolution problem: a board may
be added later which has to be configured and controlled by the control system
s/w device. Such systems introduce additional problems, similar to dependency
issues, when newly added channels are not independent. Dependency arises
when resources within a crate, e.g. set-point relays associated with gauges,
have sharing rules associated with them which are imposed by the controller.

The solution currently taken for multi-channel (MPOD and similar h/w)
systems is to implement a single lowest-level Karabo s/w device per system
plus additional breakout MDL s/w devices at higher levels which provide the
single or group channel functionality required. The lowest-level s/w devices
typically implement board/channel discovery, inject the found channels into
the control system and apply channel configurations from saved files. For
MPOD, the loaded settings of known property tag attributes are then used by
breakout s/w devices to control the actions and settings of single channels or
groups of channels.

Karabo s/w to hardware f/w access issues
----------------------------------------

Access to multi-channel systems where different users 'own' different channels
can also be hazardous, but can be solved by partitioning using higher level
MDL devices.

More dangerous is the case where single-session control (s/w device to h/w
firmware) is not possible, which allows additional actors to concurrently
modify the h/w configuration with potentially dangerous results. The SNMP
control communication protocol used between the mpod.py s/w and the controller
f/w is UDP based, is currently not authorization restricted (i.e. SNMPv2), and
is not safe w.r.t. additional actors. SNMPv3 is reported to be available with
the MPOD f/w and its usage, if present, should be tested.

Overview of (mpod.py) configuration activity
--------------------------------------------

On entering the STARTING state, mpod.py requests the position of the crate's
main switch. If it is 'OFF', mpod.py goes to ERROR. If it is 'ON', mpod.py:

* performs a discovery of the crate boards and injects Schema definitions for
  the boards and their channels, using the property configuration of the
  specified type list (options are STANDARD, AGIPD, DSSC and ALL)
* optionally verifies correct board-slot occupancy (going to ERROR on failure)
* optionally applies the requested type list configuration from file, which
  might overwrite crate configuration values and apply different limits to
  threshold properties such as voltage and current

before going to ACTIVE.
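
The startup sequence above can be sketched as a small function. This is an
illustration only: the function and parameter names are hypothetical, the
discovery, verification and apply steps are passed in as callables, and no
Karabo API is used.

```python
def startup(main_switch_on, discover, verify_occupancy=None, apply_config=None):
    """Return the state reached after STARTING (hypothetical sketch).

    discover, verify_occupancy and apply_config are caller-supplied callables.
    The last two are optional, mirroring the optional steps in the list.
    """
    if not main_switch_on:
        return "ERROR"                  # main switch is OFF
    boards = discover()                 # discovery and Schema injection
    if verify_occupancy is not None and not verify_occupancy(boards):
        return "ERROR"                  # board-slot occupancy check failed
    if apply_config is not None:
        apply_config(boards)            # may overwrite crate values
    return "ACTIVE"
```

With the optional steps left out, ``startup(True, lambda: [0, 1])`` returns
``ACTIVE``; a failing occupancy check or a main switch in the OFF position
yields ``ERROR``.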

.. note::
    In the absence of a Karabo facility-wide networked database (the project
    database is a candidate, but lacks versioning), the configuration sources
    are XML files stored in $KARABO/var/data/mpod/. Although non-optimal, this
    location allows copying and reuse, and (more dangerously) the modification
    that is often required to edit configurations.

During the ACTIVE state the following configuration actions are allowed:

* modifying single or multi-channel settings
* reapplying the ``last saved`` configuration
* reapplying the ``last known to work`` configuration (must be in ALL!)
* saving the current configuration

.. note::
    The last known-to-work configuration cannot be overwritten from the s/w
    device, but must be redefinable by another tool. Ideally this is a
    configuration DB (web) tool; in our case it is an actor with access to
    $KARABO/var/data/mpod/. Saved configurations must be versioned.


Configuration saving (to store) and applying (from store) issues
----------------------------------------------------------------

Saving and applying configuration data is still a work in progress. The
following issues and limitations exist.


* XML files stored on disk ($KARABO/var/data/mpod) are currently used to save
  configuration data. As mentioned above, disk storage should be replaced by
  versioning-capable network storage. Configuration files are ``type``
  specific; the MIN_RW_LV type is used as the example in the bullets below.
* The file saved and loaded by the device (click 'save configuration' or
  'load configuration') is MIN_RW_LV_configuration.xml. The known-to-work
  version is MIN_RW_LV_configuration.ok.xml (click 'load known-to-work').
  Clearly, an expert operator has to elevate a ...configuration.xml file to a
  ...configuration.ok.xml file!
* The internal format used, a VectorHash (one Hash per channel), is the same as
  that used by mpodTableConfigurator.py (MDL device). This, again, is due to
  the lack of suitable network storage.
* Stored channel information is retrieved from mpod.py's current configuration,
  e.g. self.getCurrentConfiguration().get("U101"). Property attribute settings
  are applied at runtime, driven by option settings in the ``configuration``
  node. This is a Karabo design pattern: Property attributes can only be
  set programmatically within the device, using the value of another property
  which the UI client can set. Non-exposed values are therefore excluded.
* The operating scope of mpod.py is the entire crate, and configurations saved
  or applied are for the entire crate (all boards and all channels). Save
  and apply actions may adversely affect others if the crate has multiple
  users.
* The ALL type list exposes all properties; other type lists may reduce the
  number of properties exposed per channel. Stored configurations contain only
  the exposed properties. Applying a type may adversely affect others if the
  crate has multiple users.
* Configuration filenames include their list type. If the property list of the
  type is modified, it is likely that existing configuration files will contain
  superfluous or missing properties; if such a file is applied, WARNING
  messages are logged.
* Known-to-work files can only be updated by someone with access to local
  storage.
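
The file naming convention used in the bullets above can be sketched as
follows. The helper name and its ``base_dir`` parameter are hypothetical; only
the directory and the two filename patterns are taken from the text.

```python
import os


def config_paths(type_name, base_dir=None):
    """Return (saved, known_to_work) file paths for a type list."""
    if base_dir is None:
        # Directory named in the text; KARABO is read from the environment.
        base_dir = os.path.join(os.environ.get("KARABO", ""),
                                "var", "data", "mpod")
    saved = os.path.join(base_dir, f"{type_name}_configuration.xml")
    known_to_work = os.path.join(base_dir, f"{type_name}_configuration.ok.xml")
    return saved, known_to_work


saved, known_to_work = config_paths("MIN_RW_LV", base_dir="mpod")
```

Keeping both names behind one helper makes it harder for the save action to
write to the known-to-work file by accident.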


Guidelines on creating configuration files
------------------------------------------



Crate w/o a saved configuration
+++++++++++++++++++++++++++++++

If another crate with an identical board-slot configuration exists, then copy
its ALL and other type configuration files to $KARABO/var/data/mpod/. Then,
to set all properties in crate NVRAM, instantiate mpod.py with configuration
node V-limits = I-limits = NONE, top-level Type = ALL and
``Load channel configs`` = False. Then apply ``Last known to work``.

If another crate with a similar board-slot configuration exists, then decide
whether this is like adding a new board to an existing system. If yes, then
perform the identical board-slot update described above, followed by the new
board update described below. The new board should be in the crate.

Adding new boards to a crate with a saved configuration
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

Insert the board and start mpod.py with Type = ALL, then configure the new
channels as required. Important properties to set are the loadable ones. Then
``save the configuration`` and get an authorized person to save the file as
``Last known to work``.

Replacing boards in a crate with a saved configuration
++++++++++++++++++++++++++++++++++++++++++++++++++++++

A repair may make a swap with a board of the same type necessary. Insert the
spare board, start mpod.py with Type = ALL and apply ``Saved configuration``.

Removing boards from a crate with a saved configuration
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

Remove the board and start mpod.py with Type = ALL, configuration node
V-limits = I-limits = NONE, and ``Load channel configs`` = False. Then
``save the configuration`` and get an authorized person to save the file as
``Last known to work``.

Command and update latency; changing and nudging issues
-------------------------------------------------------

The command-query protocol which the MPOD controller offers to foreign
control systems is polling; an event-driven subscribe and publish
mechanism is not available. Therefore, to see a change at the controller of a
channel (e.g. channel voltage, channel tripped, etc.) or crate (fan failure)
property, the control s/w has to poll the controller periodically for
information. Periodic means that a fixed period elapses between one request
for an update, including the associated processing of the information received
from the controller, and the next request.

The control s/w modifies the fixed poll period in a number of ways to improve
command and update latency. When a change request (turn channel ON, set new
target voltage, ...) is received, the change is applied immediately. This is
natural as the control system s/w is event driven, but as command and response
exchanges with the controller are sequenced, a change request may be delayed by
the request and response processing of an ongoing polling thread.
To maintain good command application latency, the standard 20 s period
between polls is shortened. Two mechanisms are used to do this:

   * changing - if any channel is ramping, the current poll wait period is
     broken (by the incoming ON/OFF command) and a poll period of 1 s is
     applied until no channel is changing any more.
   * nudging - adds an additional five 1 s polls of hysteresis at the end of
     changing. Nudging bridges the change latency seen at the controller (e.g.
     from OFF to RAMPING_UP when ON is requested) and, additionally,
     inconsistent controller channel state handling (e.g. when HV channels on
     some boards are turned ON, the channel status initially goes to ON before
     becoming RAMPING_UP a few seconds later).

Once changing and nudging have stopped, the standard poll period is reinstated.
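
The choice of poll period can be sketched as below. This is a simplified
model, not the mpod.py implementation: the class and constant names are
hypothetical, only the 20 s, 1 s and five-poll values come from the text, and
the interruption of a running wait by an incoming command is not modelled.

```python
STANDARD_PERIOD_S = 20   # standard period between polls
FAST_PERIOD_S = 1        # period while changing or nudging
NUDGE_POLLS = 5          # extra fast polls after changing has ended


class PollScheduler:
    """Chooses the wait before the next poll (hypothetical helper)."""

    def __init__(self):
        self.nudges_left = 0

    def next_period(self, any_channel_changing: bool) -> int:
        if any_channel_changing:
            # Changing: poll fast and re-arm the nudge counter.
            self.nudges_left = NUDGE_POLLS
            return FAST_PERIOD_S
        if self.nudges_left > 0:
            # Nudging: keep polling fast for a few more polls.
            self.nudges_left -= 1
            return FAST_PERIOD_S
        return STANDARD_PERIOD_S


scheduler = PollScheduler()
observed = [True, True] + [False] * 6      # two ramping polls, then idle
periods = [scheduler.next_period(changing) for changing in observed]
# periods is [1, 1, 1, 1, 1, 1, 1, 20]
```

The two ramping polls and the five nudge polls all use the 1 s period; only
the eighth poll returns to the standard 20 s period.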

Controller-side asynchronous changes (channel trip) are seen at the next poll.
Communication failures (a real network outage, controller failure,
crate power off, etc.) are seen after 5 failures to receive a poll reply from
the controller.
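
A minimal sketch of the failure counting follows. The class name is
hypothetical, and it is an assumption of this sketch (not stated above) that
the counter is reset whenever a reply is received, i.e. that the five missed
replies must be consecutive.

```python
MAX_MISSED_REPLIES = 5  # value taken from the text


class LinkMonitor:
    """Counts missed poll replies (hypothetical helper)."""

    def __init__(self):
        self.missed = 0

    def record_poll(self, reply_received: bool) -> bool:
        """Update the counter; return True once the link counts as lost."""
        if reply_received:
            self.missed = 0   # assumption: any reply clears the counter
        else:
            self.missed += 1
        return self.missed >= MAX_MISSED_REPLIES


monitor = LinkMonitor()
results = [monitor.record_poll(False) for _ in range(5)]
# results is [False, False, False, False, True]
```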

.. warning::

    The last example requiring nudging described above is a problem for
    middlelayer devices if they move immediately to another action having seen
    the controller's incorrect channel ON report.

Controlling property update messages (set) to Karabo
----------------------------------------------------

The ``updatePolicy`` property controls which values read on poll are written
(set) to Karabo. Available options are:

   * ALL_NOCOND_EVERYPOLL - all controller values read are written every poll
   * ALL_CHANGING_EVERYPOLL - all controller values read are written if
     changing or nudging every poll
   * IV_NOCOND_EVERYPOLL - channel I, V and state, and crate and board status
     values are written every poll

Option definition strings contain three underscore-separated tokens:
WHAT-PROPERTIES, WRITE-CONDITION, and WHEN-WRITTEN. In addition to the
properties listed in the bullets, a small number of bookkeeping messages are
always written. These include poll count, power supply uptime and poll channel
summary.
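
Splitting an option string into its three tokens can be sketched as follows.
The type and function names are hypothetical; only the token order and the
option strings are taken from the text.

```python
from typing import NamedTuple


class UpdatePolicy(NamedTuple):
    what: str       # WHAT-PROPERTIES, e.g. ALL or IV
    condition: str  # WRITE-CONDITION, e.g. NOCOND or CHANGING
    when: str       # WHEN-WRITTEN, e.g. EVERYPOLL


def parse_update_policy(option: str) -> UpdatePolicy:
    """Split an updatePolicy option string into its three tokens."""
    tokens = option.split("_")
    if len(tokens) != 3:
        raise ValueError(f"expected three tokens, got {option!r}")
    return UpdatePolicy(*tokens)


policy = parse_update_policy("ALL_CHANGING_EVERYPOLL")
# policy.what is "ALL", policy.condition is "CHANGING", policy.when is "EVERYPOLL"
```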

The ``pollMessageCount`` property shows the number of messages written, in a
single Hash set, for the last poll. For a crate with 100 channels,
ALL_NOCOND_EVERYPOLL and ALL_CHANGING_EVERYPOLL can write O(10000) messages,
the latter O(50) if not changing or nudging; IV_NOCOND_EVERYPOLL writes O(250).