
Offline Cluster (Maxwell)

Quote

“The Maxwell Cluster is the computing platform at DESY (Hamburg) for Photon Science data analysis, GPU accelerated computations (AI), High Performance Computing and scientific computing in general. The cluster serves myriads of applications and scientific fields.”

The Maxwell cluster at DESY is available for data processing and analysis during and after the experiment. Users are welcome and encouraged to make themselves familiar with the Maxwell cluster and its environment well in advance of the beam time.

In the context of European XFEL experiments, the Maxwell cluster is also referred to as the Offline Cluster. Despite this name, you can connect to the internet from Maxwell. It is offline in that it can't stream data directly from the experiments, unlike the Online Cluster.

Getting Access

When a proposal is accepted, the main proposer will be asked to fill out the "A-form" which, among information on the final selection of samples to be brought to the experiment, also contains a list of all of the experiment's participants. At the time the A-form is submitted, all participants must have an active account in UPEX. This is a prerequisite for getting access to the facility's computing and data resources. After submission of the A-form, additional participants can be granted access to the experiment's data at the PI's request.

Users have access to:

  • HPC cluster
  • Beamtime store, data repository and scratch space
  • Web based tools

There are a few main entry points to using Maxwell, each described below: a remote desktop, JupyterHub, and SSH.


Remote Desktop

To use Maxwell with a remote desktop, you can use the FastX remote login service, either through a web browser or with the FastX desktop client.

JupyterHub

Jupyter notebooks can be used through https://max-jhub.desy.de

More Information

JupyterHub and Notebooks

SSH access

ssh $USER@max-exfl-display.desy.de

Replace $USER with your EuXFEL username. Unlike most of the cluster, max-exfl-display is directly accessible from outside the DESY/EuXFEL network.

If you are frequently connecting via SSH, it is recommended to use Kerberos for authentication to avoid repeated password entry requests.
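A minimal sketch of this setup, assuming your account is in the DESY.DE Kerberos realm (check with IT support if unsure); the GSSAPI options are standard OpenSSH client settings:

# Obtain a Kerberos ticket (the DESY.DE realm is an assumption)
kinit $USER@DESY.DE

# In ~/.ssh/config, enable Kerberos (GSSAPI) authentication for this host:
Host max-exfl-display.desy.de
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes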

More Information

DESY Advanced Usage Documentation covers:

  • How to use bastion as an ssh/scp/sftp proxy?
  • How to use Kerberos authentication? (for 'passwordless' login)
  • How to reach internal web servers?
  • And more...

Allocating Resources - Slurm

When you log in, you are on a 'shared node' or 'login node', shared with lots of other people. You can try things out and run small computations here, but it's bad practice to run anything for a long time or use many CPUs on a login node.

To run a bigger job, you should submit it to SLURM, our queueing system.

There are three main ways to run jobs via slurm:

  • salloc - allocate a node for interactive use via shell
  • sbatch - submit a job via a batch script
  • srun - submit a job via command line arguments

Note that the Maxwell documentation page goes over these in far more detail; this is just a brief overview.

See Running Batch Jobs on Maxwell for more information.

Summary of SLURM commands

sbatch

  • Submits a batch script to Slurm
  • This is non-blocking - after submission you can close the SSH session or carry on with other tasks
  • Slurm queues/allocates the requested resources, then the script is executed on the node
  • Once the script finishes, the allocation is released
  • Use cases:
    • Computationally intense work
    • Multi-node workflows
  • Recommendations:
    • Short to long-running analysis - seconds to days
    • Preferred way of running jobs on Maxwell

srun

  • Submits a single command to run
  • This is blocking - the terminal waits until the command finishes
  • Slurm queues/allocates the requested resources, then the command is executed
  • Once the command finishes, the allocation is released
  • Use cases:
    • Computationally intense work
    • Multi-node workflows
  • Recommendations:
    • Short to medium-length analysis - seconds to hours

salloc

  • Allocates a node for interactive use via a shell
  • Semi-blocking - a new shell is spawned after allocation; exiting the shell releases the allocation
  • Slurm queues/allocates the requested node and echoes the hostname; you can then SSH to the node
  • Once the shell is exited or the time limit elapses, the allocation is released
  • Use cases:
    • Interactive development - executing srun within the allocation, or an interactive shell session
    • Medium-length analysis - minutes to hours
  • Recommendations:
    • Only recommended for short periods of interactive analysis/development
    • salloc blocks resources even when they are idle, wasting compute time
    • Only use when unavoidable - stick to srun/sbatch when possible
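As a quick illustration of the srun and salloc variants (a sketch only; the upex partition and the time limits mirror the sbatch example below):

# srun: blocking - runs a single command on an allocated node and waits for it
srun -p upex -t 0:10:00 hostname

# salloc: allocates a node and spawns a new shell; exiting the shell releases the allocation
salloc -p upex -t 2:00:00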

Examples

If you can define your job in a script, you can submit it like this:

sbatch -p upex -t 8:00:00 myscript.sh
  • -p specifies the 'partition' to use. External users should use upex, while EuXFEL staff use exfel.
  • -t specifies a time limit: 8:00:00 means 8 hours. If your job doesn't finish in this time, it will be killed. The default is 1 hour, and the maximum is 2 weeks.
  • Your script should start with a 'shebang', a line like #!/usr/bin/bash pointing to the interpreter it should run in, e.g.:
    #!/usr/bin/bash
    
    echo "Job started at $(date) on $(hostname)"
    
    # To use the 'module' command, source this script first:
    source /usr/share/Modules/init/bash
    module load exfel exfel-python
    
    python -c "print(9 * 6)"
    

To see your running and pending jobs, run:

squeue -u $USER

Once a job starts, a file like slurm-4192693.out will be created - the number is the job ID. This contains the text output of the script, which you would see if you ran it in a terminal. The programs you run will probably also write data files.
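For example, to follow that job's output while it runs, or to check its state afterwards (using the job ID from the example above):

tail -f slurm-4192693.out   # follow the job's text output as it is written
sacct -j 4192693            # job state and accounting once it has run (if accounting is enabled)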

SLURM is a powerful tool, and this is a deliberately brief introduction. If you are submitting a lot of jobs, it's worth spending some time exploring what it can do.

During beamtime

A reservation can be made for a beamtime so that your user group has exclusive access to a small subset of nodes. This is helpful if the computing cluster is busy, as you can always run some prioritised jobs, without waiting for nodes to become free in the general user partition.

How many nodes to reserve, and whether a reservation is needed at all, depends on the requirements of the experiment, so speak to your local contact in the instrument group if you want to request a reservation.

The name of the reservation is upex_NNNNNN where NNNNNN is a 6-digit zero-padded proposal number, e.g. upex_002416 would be the reservation for proposal number 2416. Access to this reservation is permitted to anybody defined as a team member on myMdC.

Important

Note that the reservation is only valid for 6 hours before and after a scheduled beamtime.

During your beamtime, if a reservation has been made, members of your group can submit jobs to the reservation with the --reservation flag on slurm commands. For example:

sbatch --reservation=upex_002416 ...

You can check the details of your reservation like this:

scontrol show res upex_002416

The output of this command tells you the period when the reservation is valid, the reserved nodes, and which usernames are allowed to submit jobs for it:

[@max-exfl001]~/reservation% scontrol show res upex_002416
ReservationName=upex_002416 StartTime=2019-03-07T23:05:00 EndTime=2019-03-11T14:00:00 Duration=3-14:55:00
Nodes=max-exfl[034-035,057,166] NodeCnt=4 CoreCnt=156 Features=(null) PartitionName=upex Flags=IGNORE_JOBS
TRES=cpu=312
Users=bob,fred,sally Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a

To view all of the current reservations:

scontrol show reservations --all

Available Software

The EuXFEL data analysis group provides a number of relevant tools, described in software, as well as some software environments set up with commonly used packages, which are described in the Software Environments section.

In particular, a Python environment with relevant modules can be loaded by running:

module load exfel exfel-python

Doing this will load the current default environment. To improve reproducibility, a new environment is created every cycle and the previous environment is left in its current state. Thus you could, for example, run module load exfel exfel-python/202301 in 2024 and have access to the software that was being used in 2023.
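For instance, something like the following should let you list the available environments and confirm which interpreter you picked up (the cycle number is just an example):

module load exfel                  # make the EuXFEL modules visible
module avail exfel-python          # list the available exfel-python environments
module load exfel-python/202301    # load the environment from a specific cycle
which python                       # confirm which Python is now on your PATH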

Offline Storage

Users will be given a single experiment folder per beamtime (not per user) through which all data will be accessible, e.g.:

/gpfs/exfel/exp/$INSTRUMENT/$CYCLE/p$PROPOSAL_ID/{ raw, usr, proc, scratch }
Storage   Quota   Permission   Lifetime   Comments
raw       None    Read         5 years    Raw experiment data
proc      None    Read         6 months   Processed data, e.g. calibrated
usr       5 TB    Read/Write   2 years    User data, results. Backed up every 6 hours to /gpfs/exfel/u/usr/.snapshots
scratch   None    Read/Write   6 months   Temporary data (lifetime not guaranteed)

The data lifetimes above are minima set by the Scientific Data Policy - data may be kept for longer than this. However, data may be moved from storage designed for fast access to slower archival systems, even within these minimum lifetimes.
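As a concrete (hypothetical) example, for proposal 2416 at the SPB instrument in cycle 201901, the directories would look like this:

ls /gpfs/exfel/exp/SPB/201901/p002416/raw   # raw data (read-only)
ls /gpfs/exfel/exp/SPB/201901/p002416/usr   # your group's read/write area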

Synchronisation

The data in the raw directories are moved from the online cluster (at the experiment) to the offline (Maxwell) cluster as follows:

  • When the run stops (the user presses the button), the data is flagged as ready to be copied to the Maxwell cluster and queued for a copy service (provided by DESY). The data is copied in the background without the user noticing.
  • Once the data is copied, the data is 'switched' and becomes available on the offline cluster.

    The precise time at which this switch happens after the user presses the button cannot be predicted: if the data is copied already (in the background), it could be instantaneous, otherwise the copy process needs to finish first.

Note

The data will then (at some point) also be removed from the online system. It is expected, at least for the initial experiments, that the data remains available on the online system until at least the end of the shift. (A deviation from this would only be necessary if so much data accumulates on the online system during one shift that some of it needs to be removed in order to record more.)

  • The actual copying process (before the switch) could take anything from minutes to hours, and will depend on (i) the size of the data and (ii) how busy the (DESY) copying queue is.
  • The usr folder is mounted from the Maxwell cluster, and is thus always identical between the online and offline systems. However, it is not optimised for dealing with large files and can therefore be slow for larger files. There is a quota of 5 TB.

Running Containers

Singularity (now Apptainer) is available on both the online and offline cluster. It can be used to run containers built with Singularity or Docker.

Running containers with Docker is experimental, and there are some complications with filesystem permissions. We recommend using Singularity to run your containers, but if you need Docker, it is available.

  • On the online cluster, Docker needs to be enabled for your account. Please email it-support@xfel.eu to request it.
  • On the offline cluster, Docker only works on nodes allocated for SLURM jobs (see Allocating Resources), not on login nodes.
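As a minimal sketch of running a container on an allocated node (the Docker image here is a public one chosen purely for illustration):

# Run a command inside a Docker image with Singularity/Apptainer, on a node allocated via Slurm
srun -p upex -t 0:30:00 singularity exec docker://python:3.11 python3 --version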