4. Compute environment

4.1. User account

User accounts are created during the initial registration step in the UPEX portal. At this point the account can only be used for UPEX itself. If the user account is associated with an accepted and scheduled proposal, the account is upgraded 4 weeks before the user’s first scheduled beamtime. For the first early user period, the time between the account upgrade and the start of the experiment can be shorter. The upgraded account allows the user to access additional services such as the online safety training, the metadata catalog, and the computing infrastructure of the European XFEL.

By default upgraded user accounts are kept in this state for 1 year after the user’s last beamtime. An extension can be requested by the PI.

On-site guest WLAN (WiFi) is provided for all users. Users with eduroam accounts provided by their home institute can connect directly. Users without an eduroam account must complete a registration procedure to obtain guest access for a limited time period. After connecting to the XFEL-Guest network (also when using a network patch cable) and opening a web browser, the user can register for use of the guest network. The registration is valid for 10 days and up to 5 devices.

4.1.1. Tools

At different stages of the proposal, users are granted access to different services:

  • Proposal submission: access to the User portal (UPEX).

  • Approval of proposal and scheduling: lightweight account (ca. 2 months before beam-time start).

  • Preparation phase: access to the metadata catalog and the beamtime store filesystem; the LDAP account is upgraded for members of all accepted proposals. For first-time users this happens once the A-form is submitted and accepted; the deadline for A-form submission is normally 4 weeks before beam-time start.

  • Beam time: access to catalogs and dedicated online and offline services.

  • Data analysis: access to catalogs and shared offline computing resources, initially limited to a 1 year period after the beamtime.

Importantly, first-time users should aim for a timely A-form submission. This ensures a time window of several weeks before the start of their beam time during which access to the Maxwell computing resources and the associated storage system (GPFS) is already granted. This access also makes it possible to work with example data and become accustomed to the peculiarities of EuXFEL data and workflows.

4.2. Online cluster

During beam time, a dedicated online cluster (ONC) is available exclusively to the experiment team members and instrument support staff.

European XFEL aims to keep the software provided on the ONC identical to that available on the offline cluster (which is the Maxwell cluster).

4.2.1. Online cluster nodes in SASE 1

Beamtime in SASE 1 is shared between the FXE and the SPB/SFX instruments, with alternating shifts: when the FXE shift stops, the SPB/SFX shift starts, and vice versa.

Within SASE1, there is one node reserved for the SPB/SFX experiments (sa1-onc-spb) and one node reserved for the FXE experiments (sa1-onc-fxe). These can be used by the groups at any time during the experiment period (i.e. during shifts and between shifts).

Both the SPB/SFX and the FXE users have shared access to another 7 nodes. The default expectation is that these nodes are used during the users’ own shift and that usage stops at the end of the shift (so that the other experiment can use the machines during its shift). These are sa1-onc-01, sa1-onc-02, sa1-onc-03, sa1-onc-04, sa1-onc-05, sa1-onc-06, and sa1-ong-01.

Overview of available nodes and usage policy:

  • sa1-onc-spb: reserved for SPB/SFX

  • sa1-onc-fxe: reserved for FXE

  • sa1-onc-01 to sa1-onc-06: shared between FXE and SPB/SFX; use only during your shifts

  • sa1-ong-01: shared between FXE and SPB/SFX; GPU: Tesla V100 (16 GB)

These nodes do not have access to the Internet.

The node name prefix sa1-onc- stands for SAse1-ONlineCluster.

4.2.2. Online cluster nodes in SASE 2

Beamtime in SASE 2 is shared between the MID and the HED instruments, with alternating shifts: when the MID shift stops, the HED shift starts, and vice versa.

Within SASE2, there is one node reserved for the MID experiments (sa2-onc-mid) and one node reserved for the HED experiments (sa2-onc-hed). These can be used by the groups at any time during the experiment period (i.e. during shifts and between shifts).

Both the MID and the HED users have shared access to another 7 nodes. The default expectation is that these nodes are used during the users’ own shift and that usage stops at the end of the shift (so that the other experiment can use the machines during its shift). These are sa2-onc-01, sa2-onc-02, sa2-onc-03, sa2-onc-04, sa2-onc-05, sa2-onc-06, and sa2-ong-01.

Overview of available nodes and usage policy:

  • sa2-onc-mid: reserved for MID

  • sa2-onc-hed: reserved for HED

  • sa2-onc-01 to sa2-onc-06: shared between MID and HED; use only during your shifts

  • sa2-ong-01: shared between MID and HED; GPU: Tesla V100 (16 GB)

These nodes do not have access to the Internet.

The node name prefix sa2-onc- stands for SAse2-ONlineCluster.

4.2.3. Online cluster nodes in SASE 3

Beamtime in SASE 3 is shared between the SQS and the SCS instruments, with alternating shifts: when the SQS shift stops, the SCS shift starts, and vice versa.

Within SASE3, there is one node reserved for the SCS experiments (sa3-onc-scs) and one node reserved for the SQS experiments (sa3-onc-sqs). These can be used by the groups at any time during the experiment period (i.e. during and between shifts).

Both SASE3 instrument user groups have shared access to another 7 nodes. The default expectation is that these nodes are used during the users’ own shift and that usage stops at the end of the shift (so that the other experiment can use the machines during its shift). These are sa3-onc-01, sa3-onc-02, sa3-onc-03, sa3-onc-04, sa3-onc-05, sa3-onc-06, and sa3-ong-01.

Overview of available nodes and usage policy:

  • sa3-onc-scs: reserved for SCS

  • sa3-onc-sqs: reserved for SQS

  • sa3-onc-01 to sa3-onc-06: shared between SCS and SQS; use only during your shifts

  • sa3-ong-01: shared between SCS and SQS; GPU: Tesla V100 (16 GB)

These nodes do not have access to the Internet.

The node name prefix sa3-onc- stands for SAse3-ONlineCluster.

Note that the usage policy on shared nodes is not strictly enforced. Scientists from the different instruments should liaise to agree on any usage other than that specified here.

4.2.4. Access to online cluster

During your beamtime, you can SSH to the online cluster with two hops:

# Replace <username> with your username
ssh <username>@max-exfl-display003.desy.de  # 003 or 004
ssh sa3-onc-scs   # Name of a specific node - see above

This only works during your beamtime, not before or after. You should connect to the reserved node for the instrument you’re using, and then make another hop to the shared nodes if you need them.

Workstations in the control hutches, as well as dedicated access workstations in the XFEL headquarters building (marked with an X in the maps below), can connect directly to the online cluster.

Location of the ONC workstations:

Fig. 4.1 Workstations at Level 1

Fig. 4.2 Workstations at Level 2

From these access computers, one can ssh directly into the online cluster nodes and also to the Maxwell cluster (see Offline cluster). The X display is forwarded automatically in both cases.

Direct Internet access from the online cluster is not possible.

4.2.5. Storage

The following folders are available on the online cluster for each proposal:

  • raw: data files recorded from the experiment (read-only).

  • usr: beamtime store. Users can upload files, data, or scripts to this folder for use during the beamtime. The folder is mounted on both clusters and is therefore immediately synchronised with the corresponding folder on the offline cluster. There is not a lot of space here (5 TB quota).

  • scratch: folder where users can write temporary data, e.g. the output of customized calibration pipelines. This folder is intended for large amounts of processed data. If the processed data is small in volume, it is recommended to use usr instead. Data in scratch is considered temporary and may be deleted after your experiment.

  • proc: Not currently used on the online cluster.

These folders are accessible at the same paths as on the offline cluster:

/gpfs/exfel/exp/<instrument>/<instrument_cycle>/p<proposal_id>/(raw|usr|proc|scratch)

Your home directory (/home/<username>) on online nodes is only shared between nodes within each SASE (e.g. sa1- nodes for SPB & FXE). It is also entirely separate from your home directory on the offline (Maxwell) cluster.

To share files between the online and the offline cluster, use the usr directory for your proposal, e.g. /gpfs/exfel/exp/FXE/202122/p002958/usr. All users associated with a proposal have access to this directory.
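
As a minimal sketch (reusing the example proposal path above; the file name is hypothetical), a script running on either cluster can write a result into usr, and it becomes visible on the other cluster as well:

from pathlib import Path

# Example proposal path from above; use your own proposal's usr directory.
usr_dir = Path("/gpfs/exfel/exp/FXE/202122/p002958/usr")

# Write a small file; it is visible on both the online and the offline cluster.
(usr_dir / "analysis_notes.txt").write_text("Shared between online and offline.\n")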

4.2.6. Access to data on the online cluster

Tools running on the online cluster can get streaming data from Karabo Bridge.
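
For instance, a minimal sketch of a client receiving streamed data with the karabo_bridge Python package (the host and port in the endpoint below are placeholders; the actual values are provided by the instrument group):

from karabo_bridge import Client

# Placeholder endpoint; ask the instrument group for the real host and port.
client = Client("tcp://sa1-onc-01:4545")

# Receive one train of data: both returned objects are dicts keyed by source name.
data, metadata = client.next()
print(sorted(data.keys()))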

Data is also available in HDF5 files in the proposal’s raw folder, and can be read using EXtra-data (see the sketch after this list). However, there are some limitations to reading files on the online cluster:

  • You can’t read the files which are currently being written. You can read complete runs after they have been closed, and you can read partial data from the current run because the writers periodically close one ‘sequence file’ and start a new one.

  • Once runs are migrated to the offline cluster, the data on the online cluster may be deleted to free up space. You can expect data to remain available during a shift, but don’t assume it will stay on the online cluster for the long term.

  • Corrected detector data is not available in files on the online cluster, because the corrected files are generated after data is migrated to the offline cluster. You can get corrected data as a stream, though.
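
As noted above, data in the raw folder can be opened with EXtra-data. A minimal sketch, reusing the example proposal number from this document and a hypothetical run number:

from extra_data import open_run

# Proposal 2958 is the example proposal used above; the run number is hypothetical.
run = open_run(proposal=2958, run=30, data='raw')

# Print an overview of the trains, sources and keys in the run.
run.info()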

4.3. Offline cluster

The Maxwell cluster at DESY is available for data processing and analysis during and after the experiment. Users are welcome and encouraged to make themselves familiar with the Maxwell cluster and its environment well in advance of the beam time.

In the context of European XFEL experiments, the Maxwell cluster is also referred to as the “offline” cluster. Despite this name, you can connect to the internet from Maxwell. It is offline in that it can’t stream data directly from the experiments, unlike the “online cluster”.

4.3.1. Getting access

When a proposal is accepted, the main proposer will be asked to fill out the “A-form”, which, in addition to information on the final selection of samples to be brought to the experiment, contains a list of all of the experiment’s participants. At the time the A-form is submitted, all participants must have an active account in UPEX. This is the prerequisite for getting access to the facility’s computing and data resources. After submission of the A-form, additional participants can be granted access to the experiment’s data on request by the PI.

Users have access to:

  • HPC cluster

  • beamtime store, data repository and scratch space

  • web-based tools

4.3.2. Graphical login

To use Maxwell with a remote desktop, you can either:

4.3.3. Jupyter

Jupyter notebooks can be used through https://max-jhub.desy.de

4.3.4. SSH access

ssh username@max-exfl-display.desy.de

Replace username with your EuXFEL username. Unlike most of the cluster, max-exfl-display is directly accessible from outside the DESY/EuXFEL network.

4.3.5. Running jobs

When you log in, you are on a ‘login node’, shared with lots of other people. You can try things out and run small computations here, but it’s bad practice to run anything for a long time or use many CPUs on a login node.

To run a bigger job, you should submit it to SLURM, our queueing system. If you can define your job in a script, you can submit it like this:

sbatch -p upex -t 8:00:00 myscript.sh

  • -p specifies the ‘partition’ to use. External users should use upex, while EuXFEL staff use exfel.

  • -t specifies a time limit: 8:00:00 means 8 hours. If your job doesn’t finish in this time, it will be killed. The default is 1 hour, and the maximum is 2 weeks.

  • Your script should start with a ‘shebang’, a line like #!/usr/bin/bash pointing to the interpreter that should run it, e.g.:

    #!/usr/bin/bash
    
    echo "Job started at $(date) on $(hostname)"
    
    # To use the 'module' command, source this script first:
    source /usr/share/Modules/init/bash
    module load exfel exfel_anaconda3
    
    python -c "print(9 * 6)"
    

To see your running and pending jobs, run:

squeue -u $USER

Once a job starts, a file like slurm-4192693.out will be created - the number is the job ID. This contains the text output of the script, which you would see if you ran it in a terminal. The programs you run will probably also write data files.

SLURM is a powerful tool, and this is a deliberately brief introduction. If you are submitting a lot of jobs, it’s worth spending some time exploring what it can do.

4.3.5.1. During beamtime

A reservation can be made for a beamtime so that your user group has exclusive access to a small subset of nodes. This is helpful if the computing cluster is busy, as you can always run some prioritised jobs, without waiting for nodes to become free in the general user partition.

How many nodes to reserve, and whether a reservation is needed at all, depends on the requirements of the experiment, so speak to your local contact in the instrument group if you want to request a reservation.

The name of the reservation is upex_NNNNNN where NNNNNN is a 6-digit zero-padded proposal number, e.g. upex_002416 would be the reservation for proposal number 2416. Access to this reservation is permitted to anybody defined as a team member on myMdC.

Note that the reservation is only valid for 6 hours before and after a scheduled beamtime.

During your beamtime, if a reservation has been made, members of your group can submit jobs to the reservation with the --reservation flag on slurm commands. For example:

sbatch --reservation=upex_002416 ...

You can check the details of your reservation like this:

scontrol show res upex_002416

The output of this command tells you the period when the reservation is valid, the reserved nodes, and which usernames are allowed to submit jobs for it:

[@max-exfl001]~/reservation% scontrol show res upex_002416
ReservationName=upex_002416 StartTime=2019-03-07T23:05:00 EndTime=2019-03-11T14:00:00 Duration=3-14:55:00
Nodes=max-exfl[034-035,057,166] NodeCnt=4 CoreCnt=156 Features=(null) PartitionName=upex Flags=IGNORE_JOBS
TRES=cpu=312
Users=bob,fred,sally Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a

To view all of the current reservations:

scontrol show reservations --all

4.3.6. Software available

The EuXFEL data analysis group provides a number of relevant tools, described in Data analysis software. In particular, a Python environment with relevant modules can be loaded by running:

module load exfel exfel_anaconda3
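
Once this environment is loaded, the packages referred to elsewhere in this document can be imported in Python, for example (a minimal check; the exact set of packages provided may vary):

# Quick check that the facility Python environment is active.
import extra_data      # reading EuXFEL HDF5 data
import karabo_bridge   # client for streamed data

print("Facility analysis packages imported successfully.")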

4.3.7. Storage

Users will be given a single experiment folder per beam time (not per user) through which all data will be accessible, e.g.:

/gpfs/exfel/exp/<instrument>/<instrument_cycle>/p<proposal_id>/(raw|usr|proc|scratch)

Quota, permissions, and minimum lifetime for each folder:

  • raw: no quota, read-only, lifetime 5 years. Raw experiment data.

  • proc: no quota, read-only, lifetime 6 months. Processed data, e.g. calibrated.

  • usr: 5 TB quota, read/write, lifetime 2 years. User data and results.

  • scratch: no quota, read/write, lifetime 6 months. Temporary data (lifetime not guaranteed).

The data lifetimes above are minima set by the data policy - data may be kept for longer than this. However, data may be moved from storage designed for fast access to slower archival systems, even within these minimum lifetimes.

4.3.8. Synchronisation

The data in the raw directories are moved from the online cluster (at the experiment) to the offline (Maxwell) cluster as follows:

  • When the run stops (the user presses the button), the data is flagged as ready to be copied to the Maxwell cluster and queued for a copy service (provided by DESY). The data is copied in the background without the user noticing.

  • Once the data is copied, it is ‘switched’ and becomes available on the offline cluster.

    The precise time at which this switch happens after the user presses the button cannot be predicted: if the data has already been copied (in the background), it can be instantaneous; otherwise the copy process needs to finish first.

  • The actual copying process (before the switch) can take anything from minutes to hours, depending on (i) the size of the data and (ii) how busy the (DESY) copying queue is.

  • The usr folder is mounted from the Maxwell cluster and is thus always identical between the online and offline systems. However, it is not optimised for dealing with large files and can therefore be slow for larger files. There is a quota of 5 TB.

4.4. Running containers

Singularity is available on both the online and offline cluster. It can be used to run containers built with Singularity or Docker.

Running containers with Docker is experimental, and there are some complications with filesystem permissions. We recommend using Singularity to run your containers, but if you need Docker, it is available.

  • On the online cluster, Docker needs to be enabled for your account. Please email it-support@xfel.eu to request it.

  • On the offline cluster, Docker only works on nodes allocated for SLURM jobs (see Running jobs), not on login nodes.

4.5. Compute environment FAQ

Frequently asked questions

tbd