Data Files

Data Policy

The full data policy of European XFEL is available at https://www.xfel.eu/users/policies/index_eng.html, with a short summary provided in the Overview Page.

Data Folders

On both the Online Cluster and the Offline Cluster, European XFEL data is stored in /gpfs/exfel/exp. Each instrument has a directory which contains cycles, cycles contain proposals, and the proposals contain the data collected as well as locations for users to store their code and data outputs:

/gpfs/exfel/exp/
├── {INSTRUMENT}
│   ├── {CYCLE}
│   │   ├── p{PROPOSAL}
│   │   └── ...
│   └── ...
└── ...

More Information

The contents of the proposal directory are explained in:

For example, if you have an experiment at SPB, cycle 201701, with proposal number 2012, then the proposal directory will be /gpfs/exfel/exp/SPB/201701/p002012.

The raw data from each run goes in a subfolder such as raw/r0104. Once this has been migrated to the offline cluster, corrected detector data will be automatically produced in another subfolder such as proc/r0104.
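
As a minimal sketch of the layout described above, the proposal and run folders can be assembled from their components. The helper names `proposal_dir` and `run_dir` are hypothetical, and the zero-padding (six digits for proposals, four for runs) is inferred from the examples p002012 and r0104 rather than taken from a specification.

```python
from pathlib import PurePosixPath

# Root of the experiment data tree, as given in this section.
EXP_ROOT = PurePosixPath("/gpfs/exfel/exp")


def proposal_dir(instrument, cycle, proposal):
    """Return the proposal directory, e.g. .../SPB/201701/p002012."""
    # Assumption: proposal numbers are zero-padded to six digits.
    return EXP_ROOT / instrument / str(cycle) / f"p{proposal:06d}"


def run_dir(instrument, cycle, proposal, run, data="raw"):
    """Return the raw/ or proc/ subfolder of one run, e.g. .../raw/r0104."""
    if data not in ("raw", "proc"):
        raise ValueError("data must be 'raw' or 'proc'")
    # Assumption: run numbers are zero-padded to four digits.
    return proposal_dir(instrument, cycle, proposal) / data / f"r{run:04d}"


print(proposal_dir("SPB", 201701, 2012))
print(run_dir("SPB", 201701, 2012, 104, data="proc"))
```

Keeping the path logic in one place makes it easy to switch between raw and corrected data for the same run.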

Reading Data in Python

More Information

EXtra-data documentation

The volume of data and the number of instruments make even something as simple as viewing a single image from a large detector non-trivial as it involves opening multiple files, each with their own internal hierarchical data structure, finding and reading the correct slices of data out of them, and then bringing it all together.

To make data access as easy as possible we provide a Python package named EXtra-data to read data from European XFEL.

We also provide a tool to create a virtual CXI file, which can be used with any tools that take CXI-style data: HDF5 Virtualise.
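
A minimal sketch of opening a run with EXtra-data might look like the following. It assumes that the package is installed and that the code runs where the data is mounted (for instance on Maxwell). The proposal and run numbers are those of the example data described later on this page, and `summarise_run` is a hypothetical helper name. Please refer to the EXtra-data documentation for the authoritative API.

```python
def summarise_run(proposal=700000, run=1):
    """Return (number of trains, sorted source names) for one run."""
    # Imported inside the function so that merely defining it does not
    # require EXtra-data to be installed.
    from extra_data import open_run

    # open_run() locates the run folder from the proposal and run numbers;
    # it opens raw data unless data="proc" is requested.
    data = open_run(proposal=proposal, run=run)
    return len(data.train_ids), sorted(data.all_sources)


# Usage, where the example data is available:
#     n_trains, sources = summarise_run()
#     print(n_trains, "trains from", len(sources), "sources")
```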

Combining Detector Data from Multiple Modules

The pixel detectors (AGIPD and LPD) record data in separate files for each of their 16 modules.

The EXtra-data Python library can combine detector modules into a numpy array, as shown in this example.

Alternatively, the modules can be combined in a single view as an HDF5 virtual dataset with the extra-data-make-virtual-cxi command, allowing the data to be processed by external tools such as CrystFEL. Use of this tool is covered in How to Make Virtual CXI Data Files.

Geometry Files

Geometry files specify the location of the detector modules in real space. Please contact your instrument scientist regarding obtaining geometry files for the detector at each instrument.

EXtra-geom is a Python library used to describe the physical layout of multi-module detectors at European XFEL, and to assemble complete detector images.

One geometry file for the FXE LPD detector is available online.

One geometry file for the SPB/SFX AGIPD1M detector is available from https://cxidb.org/id-83.htm (Experiment by Anton Barty).

EXtra-geom can read both file formats; see for example Assembling Detector Data into Images.

GeoAssembler can be used to create or adjust geometry files by visually moving detector quadrants around.

A more systematic provision of geometry files is in preparation.

Data Format

Experimental data are taken in the context of the following categories:

| Context | Description |
|---|---|
| Instrument | Each instrument has its own label. For each instrument, there are multiple cycles |
| Cycle | A scheduling period in which multiple user experiments will take place (of the order of months) |
| Proposal | Each user experiment (beamtime) gets a proposal number |
| Run | When a user starts acquiring data, a new run starts and lasts until that data acquisition is stopped |
| Train id | There are 10 pulse trains per second |
| Pulse id | Up to 2700 pulses per train, individually counted. Counter starts from zero for every train |
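
The two rates in the table allow some quick arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that train IDs increase by exactly one from each train to the next.

```python
TRAINS_PER_SECOND = 10        # pulse trains per second, from the table above
MAX_PULSES_PER_TRAIN = 2700   # upper limit per train, from the table above


def elapsed_seconds(first_train_id, last_train_id):
    """Time between two trains, derived from their IDs."""
    # Assumption: train IDs increase by exactly one per train.
    return (last_train_id - first_train_id) / TRAINS_PER_SECOND


def max_pulses(n_trains):
    """Upper bound on the number of pulses in n_trains trains."""
    return n_trains * MAX_PULSES_PER_TRAIN


print(elapsed_seconds(10000, 10500))  # 50.0
print(max_pulses(500))                # 1350000
```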

We distinguish different types of data:

| Data type | Description |
|---|---|
| Control data | One entry for each train, even if the value changes less often than that |
| Instrument data | May have zero, one or multiple entries per train. Your main experimental results, e.g. from X-ray detectors, will usually be 2d/1d detector data |
| Run data | A superset of Control data, captured once per run |

Data is stored in HDF5 files; there may be tens to thousands of files in a single run. We aim to enable you to analyse data without needing to know the details of the file structure, e.g. by using EXtra-data in Python, or by generating a CXI file to represent a run as previously mentioned.

If you do need to read the EuXFEL HDF5 files yourself, however, the structure is described in the data files format page of the EXtra-data docs.
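
If you only need to sort or filter the files of a run, the file names themselves carry some information. The sketch below splits a name such as RAW-R0803-AGIPD00-S00000.h5 (the file used in the h5glance example on this page) into its parts. The pattern and the labels (data class, run, aggregator, sequence) are our reading of the naming scheme and should be checked against the EXtra-data docs; `parse_filename` is a hypothetical helper.

```python
import re

# Assumed pattern: {CLASS}-R{run}-{aggregator}-S{sequence}.h5
FILENAME_RE = re.compile(
    r"^(?P<data_class>[A-Z]+)-R(?P<run>\d+)"
    r"-(?P<aggregator>\w+)-S(?P<sequence>\d+)\.h5$"
)


def parse_filename(name):
    """Split an EuXFEL-style file name into its labelled parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    parts = match.groupdict()
    # Run and sequence numbers are zero-padded in the name; store them as ints.
    parts["run"] = int(parts["run"])
    parts["sequence"] = int(parts["sequence"])
    return parts


print(parse_filename("RAW-R0803-AGIPD00-S00000.h5"))
```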

HDF5 Chunking & Compression

Both raw and corrected data may be stored using the HDF5 chunked layout. Some parts of the corrected data are compressed using the gzip compression filter in HDF5. In particular, detector gain stage and mask datasets compress well, saving a lot of disk space.

You can examine compression and chunk sizes using the HDFView GUI tool, our h5glance command line tool, or h5ls -v:

  $ h5glance /gpfs/exfel/exp/XMPL/201750/p700000/raw/r0803/RAW-R0803-AGIPD00-S00000.h5 \
    INSTRUMENT/SPB_DET_AGIPD1M-1/DET/0CH0:xtdf/image/data
  /gpfs/exfel/exp/XMPL/201750/p700000/raw/r0803/RAW-R0803-AGIPD00-S00000.h5/INSTRUMENT/SPB_DET_AGIPD1M-1/DET/0CH0:xtdf/image/data
        dtype: uint16
        shape: 16000 × 2 × 512 × 128
     maxshape: Unlimited × 2 × 512 × 128
       layout: Chunked
        chunk: 16 × 2 × 512 × 128
  compression: None (options: None)
  ...

  $ h5ls -v /gpfs/exfel/exp/XMPL/201750/p700000/raw/r0803/RAW-R0803-AGIPD00-S00000.h5/INSTRUMENT/SPB_DET_AGIPD1M-1/DET/0CH0:xtdf/image/data
  Opened "/gpfs/exfel/exp/XMPL/201750/p700000/raw/r0803/RAW-R0803-AGIPD00-S00000.h5" with sec2 driver.
  data                     Dataset {16000/Inf, 2/2, 512/512, 128/128}
      Location:  1:12333
      Links:     1
      Modified:  2017-11-20 04:57:44 CET
      Chunks:    {16, 2, 512, 128} 4194304 bytes
      Storage:   4194304000 logical bytes, 4194304000 allocated bytes, 100.00% utilization
      Type:      native unsigned short

The compressed datasets are stored with a single detector frame per chunk, to minimise the impact on analysis code reading the data.

If you observe pathologically slow reading, check whether you are accessing a compressed dataset with a chunk size larger than one frame. HDF5 decompresses an entire chunk at once, and it may be redoing this for each frame you read. You can avoid this by setting a cache size large enough to hold one complete chunk. The necessary C code looks something like this:

   hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
   // Set a 32 MB cache size (calculate at least the size of one chunk)
   H5Pset_chunk_cache(dapl, H5D_CHUNK_CACHE_NSLOTS_DEFAULT, 32 * 1024 * 1024, 1);
   hid_t h5_dataset_id = H5Dopen(h5_file_id, ".../image/gain", dapl);

To benefit from chunk caching, you need to reuse the opened dataset ID for successive reads, instead of opening and closing it to read each frame.
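
To pick a cache size, it helps to work out how large one chunk is. The sketch below does this for the chunk shape and data type shown in the h5ls output above (16 × 2 × 512 × 128, unsigned 16-bit). That dataset is not compressed, so it serves only to illustrate the arithmetic; the helper names are hypothetical, and the 1 MiB default cache size is an assumption that may not hold for every HDF5 version.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Size in bytes of one uncompressed HDF5 chunk."""
    return prod(chunk_shape) * itemsize


def cache_holds_chunk(cache_nbytes, chunk_shape, itemsize):
    """True if a chunk cache of cache_nbytes can hold one whole chunk."""
    return cache_nbytes >= chunk_nbytes(chunk_shape, itemsize)


# Chunk shape and item size (uint16 = 2 bytes) from the h5ls output above.
raw_chunk = (16, 2, 512, 128)

print(chunk_nbytes(raw_chunk, 2))                         # 4194304
print(cache_holds_chunk(32 * 1024 * 1024, raw_chunk, 2))  # True
# Assumption: a default chunk cache of 1 MiB per dataset.
print(cache_holds_chunk(1024 * 1024, raw_chunk, 2))       # False
```

The 4194304 bytes computed here match the chunk size reported by h5ls. In h5py, the corresponding setting should be the rdcc_nbytes argument of h5py.File; check the h5py documentation for your version.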

Example Data

Some example datasets are available so you can try reading the files before your experiment. There may be differences, e.g. in naming, when you collect new data, so it's a good idea to talk to the relevant instrument group and the data analysis group at European XFEL as well.

Example Runs on Maxwell

We have prepared an environment that mimics the data from a real experiment cycle at the European XFEL. For this, we have a fake instrument called XMPL, which contains runs that give an overview of the data to expect. This data is available on Maxwell:

    /gpfs/exfel/exp/XMPL/201750/p700000

It follows the same structure as every experiment (see Offline Analysis - Offline Storage for more details), and will be used to share examples of the different file formats generated at the facility, from all instruments and detectors.

These datasets are also linked to the Metadata catalog (MDC), where information about the data (instrument, detector, sample, date, ...) can be found. Each run dataset comprises raw data (in .../p700000/raw/run_id), calibrated data (in .../p700000/proc/run_id) and a set of sample scripts to read the data (in .../p700000/usr/run_id).

List of sample data sets:

| Run ID | Instrument | Detector/Device | Sample | Run Type | Date | Comments |
|---|---|---|---|---|---|---|
| r0001 | SPB | AGIPD | Water | Standard | 2018-04-03 | Commissioning |
| r0002 | SPB | AGIPD | Lysozyme (liquid) | Standard | 2018-04-03 | Commissioning |
| r0003 | SPB | AGIPD | Lysozyme (liquid) | Standard | 2018-04-03 | Commissioning |
| r0004 | SPB | AGIPD | Lysozyme (liquid) | Standard | 2018-04-03 | Commissioning |
| r0005 | SPB | AGIPD | Lithium titanate | Standard | 2018-08-18 | Geometry calibration |
| r0006 | SPB | AGIPD | Lithium titanate [1] | Standard | 2017-11-20 | Commissioning |
| r0007 | FXE | LPD | Aqueous solution of [Fe(bpy)₃]²⁺ | Standard | 2017-09-18 | User Run |
| r0008 | SA1_XTD2 | XGM | N/A | Standard | 2019-02-15 | Commissioning (XPD) |
| r0009 | SA3_XTD10 | XGM | N/A | Standard | 2019-02-15 | Commissioning (XPD) |
| r0010 | SPB | AGIPD | N/A | Calibration - Dark high gain | 2019-08-10 | Commissioning |
| r0011 | SPB | AGIPD | N/A | Calibration - Dark medium gain | 2019-08-10 | Commissioning |
| r0012 | SPB | AGIPD | N/A | Calibration - Dark low gain | 2019-08-10 | Commissioning |
| r0013 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0014 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0015 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0016 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0017 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0018 | SPB | AGIPD | Lysozyme | Standard | 2019-08-11 | Commissioning |
| r0019 | SQS | Digitizer | Xenon | Standard | 2019-10-11 | Commissioning |
| r0020 | SQS | Digitizer | Xenon | Standard | 2019-10-11 | Commissioning |
| r0021 | SPB | Jungfrau | Lysozyme | Standard | 2019-05-05 | IRDa commissioning |
| r0022 | SPB | Jungfrau | Lysozyme | Standard | 2019-05-05 | IRDa commissioning |
| r0023 | SCS | DSSC | 2-Co8_pt14_8fold - 30nm Pt cap | Standard | 2019-05-05 | p002212 helicity switching |
| r0024 | SCS | DSSC | 1-Co10_Pt_6fold | Standard | 2019-05-05 | p002212 helicity switching |
| r0025 | SCS | DSSC | Ni-20 MLs - b | Standard | 2019-05-05 | p002212 helicity switching |
| r0026 | SCS | DSSC | Ni75-11 MLs-b | Standard | 2019-05-05 | p002212 helicity switching |
| r0027 | MID | AGIPD | Silica 50 nm | Standard | 2019-09-21 | Commissioning |

Note

Mock data can be generated using the extra_data package, e.g.:

>>> from extra_data.tests.make_examples import make_agipd_example_file
>>> make_agipd_example_file('agipd_example.h5')

>>> from extra_data.tests.make_examples import write_file, Motor, ADC, XGM
>>> write_file('test_file.h5', [
...     XGM('SPB_XTD1_XGM/XGM/MAIN'),
...     Motor('SPB_DET_MOT/MOTOR/AGIPD_X'),
...     Motor('SPB_DET_MOT/MOTOR/AGIPD_Y'),
...     Motor('SPB_DET_MOT/MOTOR/AGIPD_Z'),
...     ADC('SA1_XTD2_MPC/ADC/1', nsample=0, channels=(
...         'channel_3.output/data',
...         'channel_4.output/data',
...         'channel_5.output/data'))
...     ], ntrains=500, chunksize=50)

This only creates the structure of the files; the data will all be zeros.

Public Data from EuXFEL in CXIDB

The following entries at https://cxidb.org/ stem from user experiments done at our facility:

| CXIDB ID | Instrument | Authors | Sample | Wavelength | Deposition Date | Publication DOI |
|---|---|---|---|---|---|---|
| id80 | SPB/SFX | Wiedorn et al. | Lysozyme | 1.33 Å (9.30 keV) | 2018-08-13 | 10.1038/s41467-018-06156-7 |
| id83 | SPB/SFX | Wiedorn et al. | β-lactamase | 1.33 Å (9.30 keV) | 2018-08-13 | 10.1038/s41467-018-06156-7 |
| id87 | SPB/SFX | Grünbein et al. | Urease, Concanavalin A/B | 1.66 Å (7.47 keV) | 2018-09-12 | 10.1038/s41597-019-0010-0 |
| id98 | SPB/SFX | Yefanov et al. | Lysozyme | 1.33 Å (9.30 keV) | 2020-02-07 | 10.1063/1.5124387 |
| id100 | SPB/SFX | Pandey et al. | Photoactive Yellow Protein | 1.33 Å (9.30 keV) | 2019-08-12 | 10.11577/1577287 |
| id111 | SPB/SFX | Gisriel et al. | Photosystem I | 1.33 Å (9.30 keV) | 2020-11-21 | 10.1038/s41467-019-12955-3 |
| id152 | SPB/SFX | Echelmeier et al. | KDO8PS | 1.33 Å (9.30 keV) | 2021-07-22 | 10.1038/s41467-020-18156-7 |

Linking Publications to Data

A Digital Object Identifier (DOI) will be generated for each successful proposal. Each publication should reference the DOI of the data.

Downloading Experiment Data

Experiments at European XFEL typically generate large amounts of data - from around 10 TB up to petabytes from one beamtime. Because of this, we recommend that you analyse data on the Maxwell cluster rather than downloading it.

If you do need to download experimental data, there are two options:

  • Using Globus (see Globus How To). To log in to the EuXFEL endpoint, choose the organizational login ("Use your existing organizational login"), search for "DESY" and select "Deutsches Elektronen-Synchrotron DESY", click continue, then enter your EuXFEL credentials followed by your OTP token to authenticate. To view the data, click the "Collection" search box, enter "EuXFEL", and select the "EuXFEL Data Collection".
  • Using FTP from ftp.xfel.eu. You can use this with lftp (command line), or FileZilla (GUI). TLS encryption ('explicit FTPS') is required. The FTP server is not considered a critical service, so it may be unavailable at times.

  1. Lithium titanate, spinel; nanopowder, <200 nm particle size (BET), >99%; CAS Number 12031-95-7; Empirical formula Li4Ti5O12; https://www.sigmaaldrich.com/catalog/product/aldrich/702277