
Newsletters

2023/05/22 - New Approach to Environment Management

The EuXFEL Data Analysis (DA) group is changing the way Python environments and kernels are managed. The changes are intended to improve the long-term reproducibility of data analysis performed with these environments, and to make maintaining and updating the environments more consistent.

Along with these changes we have also improved the documentation page on environments.

If there are packages missing from the environment that you would like added, or if you run into any issues, please contact da-support@xfel.eu. If you have questions or suggestions, feel free to contact me (Robert Rosca) directly by email.

Summary of Changes

  1. Old environment is deprecated - exfel_anaconda3 will no longer be updated. It will remain available so that code relying on it keeps working, but we strongly encourage everybody to move to the new environment.
  2. New environment available - use module load exfel exfel-python to load the most current environment.
  3. Per-cycle environments - each cycle (half a year) gets its own environment. Once a cycle is over, its environment is frozen to improve long-term reproducibility, ensuring that code run during a beamtime will run the same way in the future.
  4. More useful versions - module load exfel exfel-python loads the environment for the current cycle; previous environments can be loaded by specifying a version in the module load command, e.g. module load exfel exfel-python/202301.
  5. Long-term environment history/archive - the environment specifications and any custom package definitions are publicly available in the European XFEL/Environments repository on GitHub.
  6. Faster and consistent updates - NumPy's NEP 29 ("Recommend Python and NumPy version support as a community policy standard") has been adopted by many packages in the scientific Python ecosystem. Package versions in each cycle environment will follow this policy.
  7. Enhanced documentation - the Environments documentation contains much more information on using our environments and creating your own.
  8. Migration to Mamba - DESY has moved from Conda to Mamba for Python environment management, and we are following this move for consistency.

Details

Deprecation of exfel_anaconda3

exfel_anaconda3/1.1.2 will be frozen and no longer modified.

The environment will not be deleted and will remain accessible; however, by the end of the year the module will be renamed to DEPRECATED/exfel_anaconda3.

Per-cycle Environments

Until now, the exfel_anaconda3 environment has received occasional updates, with version bumps for major changes. However, there was no record of the environment's previous states, which makes it difficult to reproduce the results of a data analysis later on, because the package versions used at the time of the analysis are lost.

To improve long-term reproducibility, we will create a new environment every cycle and make only minor changes to it during that cycle. Major changes (such as updating the Python version) will be made when a new environment is deployed at the start of a new cycle.

Overall the new approach to environment management will:

  1. Log the state of all packages whenever a change is made to the current environment.
  2. Create a new environment every cycle.
  3. 'Freeze' the environments of previous cycles.
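To make point 1 concrete, here is a minimal Python sketch that records every installed distribution as a sorted name==version line, so that successive snapshots can be committed and diffed in git. It only illustrates the idea and is not the DA group's actual tooling (the real logs are the full environment specifications kept in the European XFEL/Environments repository); the helper names snapshot_packages and write_snapshot are hypothetical.

```python
"""Hypothetical sketch: snapshot the installed Python packages.

Not the DA group's tooling; only illustrates logging package versions
so that each change to an environment can be recorded (e.g. in git).
"""
from importlib.metadata import distributions
from pathlib import Path


def snapshot_packages() -> list[str]:
    """Return sorted 'name==version' lines for all installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            lines.add(f"{name}=={dist.version}")
    # Case-insensitive sort gives a stable order, so snapshots diff cleanly.
    return sorted(lines, key=str.lower)


def write_snapshot(path: Path) -> None:
    """Write the snapshot as a text file that can be committed to git."""
    path.write_text("\n".join(snapshot_packages()) + "\n")
```

Calling write_snapshot(Path("packages.txt")) after each change (the file name is only an example) and committing the result would give a simple history of package versions.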

With this approach, a few years from now it will be possible to look up when a proposal ran or when a data analysis was performed, load the environment from that time (e.g. module load exfel exfel-python/202301), and know that the packages are the same as when the original analysis was performed.
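Finding the right environment then amounts to mapping a date to a cycle name. The sketch below assumes that cycle YYYY01 covers January to June and YYYY02 covers July to December; the text only says that a cycle is half a year, so these exact boundaries, and the helper names cycle_for and module_load_command, are assumptions. It also does not check that an environment actually exists for the resulting cycle.

```python
"""Hypothetical helpers: map a date to its per-cycle environment module.

Assumes cycle 'YYYY01' covers January-June and 'YYYY02' covers
July-December; verify the real cycle boundaries before relying on this.
Does not check that an environment exists for the returned cycle.
"""
from datetime import date


def cycle_for(day: date) -> str:
    """Return the assumed cycle name for a date, e.g. '202301'."""
    half = 1 if day.month <= 6 else 2
    return f"{day.year}{half:02d}"


def module_load_command(day: date) -> str:
    """Build the command that loads the environment of that date's cycle."""
    return f"module load exfel exfel-python/{cycle_for(day)}"
```

For an analysis run on 14 March 2023, module_load_command(date(2023, 3, 14)) returns "module load exfel exfel-python/202301".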

Scheduled Package Updates

NumPy's NEP 29 recommends a 42-month support window for Python versions. We follow the same window, and each environment sticks with the lowest Python version that is still supported. For example:

On Jun 23, 2020 drop support for Python 3.6  (initially released on Dec 23, 2016)
On Dec 26, 2021 drop support for Python 3.7  (initially released on Jun 27, 2018)
On Apr 14, 2023 drop support for Python 3.8  (initially released on Oct 14, 2019)
On Apr 05, 2024 drop support for Python 3.9  (initially released on Oct 05, 2020)
On Apr 04, 2025 drop support for Python 3.10 (initially released on Oct 04, 2021)
On Apr 24, 2026 drop support for Python 3.11 (initially released on Oct 24, 2022)

The current environment, 202301, uses Python 3.9. Support for 3.9 will be dropped next year, on 2024-04-05, so the environment for cycle 202401 will move up to Python 3.10.
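The schedule and the version choice above can be reproduced with a few lines of Python. Two assumptions are made here. First, the 42-month window is approximated as a fixed offset of 1278 days, which reproduces every drop date in the list above. Second, a cycle uses the lowest Python version whose support ends after the cycle does, with YYYY01 = January to June and YYYY02 = July to December; this rule is an interpretation that matches the 202301 (Python 3.9) and 202401 (Python 3.10) examples, not a documented policy.

```python
"""Sketch: NEP 29 drop dates and the Python version for a cycle.

Assumptions (for illustration only):
- the 42-month window is approximated as a fixed 1278-day offset,
  which reproduces the drop dates listed in the text;
- a cycle uses the lowest Python version still supported after the
  cycle's last day, with 'YYYY01' = Jan-Jun and 'YYYY02' = Jul-Dec.
"""
from datetime import date, timedelta

# Initial release dates, taken from the schedule in the text.
RELEASES = {
    "3.6": date(2016, 12, 23),
    "3.7": date(2018, 6, 27),
    "3.8": date(2019, 10, 14),
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
}

WINDOW = timedelta(days=1278)  # fixed-length approximation of 42 months


def drop_date(version: str) -> date:
    """Date on which support for a Python version is dropped."""
    return RELEASES[version] + WINDOW


def cycle_end(cycle: str) -> date:
    """Last day of a cycle such as '202301' (assumed half-year cycles)."""
    year, half = int(cycle[:4]), int(cycle[4:])
    return date(year, 6, 30) if half == 1 else date(year, 12, 31)


def python_for_cycle(cycle: str) -> str:
    """Lowest Python version whose support outlasts the whole cycle."""
    end = cycle_end(cycle)
    supported = [v for v in RELEASES if drop_date(v) > end]
    if not supported:
        raise ValueError(f"no known Python version covers cycle {cycle}")
    # Compare numerically so that "3.10" sorts after "3.9".
    return min(supported, key=lambda v: tuple(map(int, v.split("."))))
```

With these assumptions, python_for_cycle("202302") also gives 3.9, since 3.9 is still supported at the end of 2023.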

Additionally, all packages will receive a version update (within some constraints) when a new cycle environment is created.

Documentation Overhaul

The environment documentation has been updated to reflect these changes and now includes much more detail on the module system, on using our environments, and on creating your own environments, which can be shared with users as a starting point. Some important sections include:

Shareable Environment Specifications

The full environment specification is logged via git whenever a change is made to an environment. This is done for reproducibility, but it has the added benefit that the specification file(s) can be shared and used to recreate our environments outside of Maxwell.

The repository containing environment specifications and package recipes is available on GitHub at European XFEL/Environments.