1. Overview of European XFEL data

1.1. Trains and pulses

European XFEL generates X-ray pulses in “pulse trains” of up to 2700 pulses each, at a rate of 10 trains per second. Within a train, pulses arrive at a maximum frequency of 4.5 MHz (about 220 ns between pulses).


Fig. 1.1 The structure of trains and pulses at European XFEL.
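
The timing figures quoted above follow from simple arithmetic. The short sketch below works them out; the variable names are chosen here for illustration, and the numbers are the maxima stated in the text rather than the settings of any particular experiment.

```python
# Timing arithmetic for the train structure, using the maxima from the text.
MAX_PULSES_PER_TRAIN = 2700
TRAINS_PER_SECOND = 10
PULSE_RATE_HZ = 4.5e6  # maximum repetition rate within a train

# Spacing between consecutive pulses inside a train.
pulse_spacing_ns = 1e9 / PULSE_RATE_HZ

# Time from the start of one train to the start of the next.
train_period_ms = 1e3 / TRAINS_PER_SECOND

# Time from the first to the last pulse of a completely filled train.
train_duration_us = (MAX_PULSES_PER_TRAIN - 1) * pulse_spacing_ns / 1e3

# Upper limit on the number of pulses delivered per second.
max_pulses_per_second = MAX_PULSES_PER_TRAIN * TRAINS_PER_SECOND

print(f"pulse spacing:  {pulse_spacing_ns:.0f} ns")
print(f"train period:   {train_period_ms:.0f} ms")
print(f"train duration: {train_duration_us:.0f} us")
print(f"pulses per second (max): {max_pulses_per_second}")
```

A full train therefore lasts only about 0.6 ms out of each 100 ms period, so the pulses are concentrated in short bursts rather than spread evenly in time.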

Each train receives a unique integer train ID, which is used to find and match up data. Some kinds of data are recorded once per train, while others may be recorded per pulse, or at even higher sampling rates.
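
As a rough illustration of matching data by train ID, the sketch below aligns per-train readings from two sources using plain Python dictionaries. The train IDs, values and the helper `match_by_train_id` are hypothetical; this is not the facility's software, only a minimal model of the idea.

```python
# Hypothetical per-train readings from two sources, keyed by train ID.
# Each source may be missing some trains.
xgm_intensity = {10001: 0.82, 10002: 0.79, 10004: 0.91}
motor_position = {10001: 1.50, 10003: 1.55, 10004: 1.60}


def match_by_train_id(a, b):
    """Return {train_id: (a_value, b_value)} for trains present in both."""
    common = sorted(a.keys() & b.keys())
    return {train_id: (a[train_id], b[train_id]) for train_id in common}


matched = match_by_train_id(xgm_intensity, motor_position)
print(matched)
```

Only trains 10001 and 10004 appear in both sources, so only those two are kept. Dropping unmatched trains is one possible choice; an analysis could instead fill gaps with a placeholder value.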

Information about the pulses generated in each SASE can be recorded with a BUNCH_DECODER device. This is a subset of the raw ‘bunch pattern’ data available through a TIMESERVER device. An example notebook illustrates how to read this information.

1.2. Sources and keys

Data is recorded from various sources. These include the X-ray detectors, which are the main data sources for many experiments, as well as separate sensors such as temperature sensors or devices that measure parameters of the beam. Controllable devices can also be data sources, recording things like motor positions.

Data within each source is organised by keys. For instance, an XGM is an apparatus for measuring the beam (see [Maltezopoulos2019] for details). SA1_XTD2_XGM/XGM/DOOCS is the source name of a specific XGM, and beamPosition.ixPos is one of its keys, recording the X position of the beam.


Multi-module detectors typically have a separate source for each module. Various other devices (such as XGMs) are split for technical reasons into one ‘control’ and one ‘instrument’ source, although not all keys of a ‘control’ source can be controlled.
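
One way to picture this organisation is as a two-level mapping from source name to key to recorded values. The sketch below is a toy model with made-up values. The ‘control’ source name and key are taken from the text; the ‘instrument’ source name with the `:output` suffix and the key `data.intensityTD` are assumptions added for illustration, and `get_values` is a hypothetical helper.

```python
# Toy model: {source name: {key: values, one entry per train}}.
# All numerical values are made up.
run_data = {
    # 'Control' source and key named in the text.
    "SA1_XTD2_XGM/XGM/DOOCS": {
        "beamPosition.ixPos": [0.12, 0.11, 0.13],
    },
    # Assumed name of the matching 'instrument' source and one of its keys,
    # holding several values per train.
    "SA1_XTD2_XGM/XGM/DOOCS:output": {
        "data.intensityTD": [[810.0, 795.0], [802.0, 790.0], [799.0, 805.0]],
    },
}


def get_values(data, source, key):
    """Look up the values recorded for one key of one source."""
    try:
        return data[source][key]
    except KeyError:
        raise KeyError(f"no data for source {source!r}, key {key!r}") from None


x_pos = get_values(run_data, "SA1_XTD2_XGM/XGM/DOOCS", "beamPosition.ixPos")
print(x_pos)
```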

1.3. Tools and services provided by European XFEL

Broadly speaking, data analysis at EuXFEL is separated into two categories: online and offline. ‘Online analysis’ refers to real-time analysis of data being streamed during an experiment; the programs for this run on the Online cluster. ‘Offline analysis’ refers to analysis of data that has been saved to files and migrated to the Offline cluster.

The facility provides:

[Fangohr2017] provides some context about the data analysis provisions.

1.3.1. Detector calibration

The fast X-ray detectors at European XFEL have some unusual features which pose challenges for processing their output into meaningful scientific data, including on-sensor memory cells and multi-gain-stage architectures. They are also capable of producing on the order of 10 GB per second, so calibrating the data is computationally intensive.
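
To give a feeling for where a data rate of this order comes from, the following back-of-the-envelope estimate multiplies out the size of a detector stream. The detector parameters (one megapixel, 2 bytes per pixel, 352 frames stored per train) are illustrative assumptions and do not come from the text above; only the rate of 10 trains per second does.

```python
# Rough data-rate estimate for a hypothetical megapixel detector.
PIXELS = 1024 * 1024     # assumed: one megapixel
BYTES_PER_PIXEL = 2      # assumed: 16-bit raw values
FRAMES_PER_TRAIN = 352   # assumed: one frame per on-sensor memory cell
TRAINS_PER_SECOND = 10   # from the train structure described earlier

bytes_per_train = PIXELS * BYTES_PER_PIXEL * FRAMES_PER_TRAIN
bytes_per_second = bytes_per_train * TRAINS_PER_SECOND

print(f"{bytes_per_train / 1e6:.0f} MB per train")
print(f"{bytes_per_second / 1e9:.1f} GB per second")
```

Under these assumptions the stream comes to a little over 7 GB per second for the image data alone, which is consistent with the order of magnitude quoted above.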

European XFEL aims to provide facility users with a fully corrected and calibrated dataset as the primary data product [KusterCal2014], so the burden of dealing with this calibration falls on the facility, not the users. This concept has been successfully deployed in other scientific communities such as astronomy, space science, and high-energy physics for more than a decade.

Users therefore need neither large amounts of computing resources nor in-depth expertise in detector physics to obtain state-of-the-art corrected and calibrated datasets for their experiments, and can focus on their scientific analysis. Comparing and aggregating data from different experiments and instruments also becomes simpler, because calibration no longer depends on the user.

Within the proposal data folder, the proc/ subfolder contains calibrated data, and raw/ contains uncalibrated data. The Karabo Bridge data streams can also offer either calibrated or raw data - see Streaming from Karabo bridge for details.
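
The folder layout can be expressed as a small path helper. This is a sketch only: the function `run_folders`, the placeholder proposal path and the assumption that each run has its own subfolder named like `r0012` are introduced here for illustration and are not stated in the text above.

```python
from pathlib import Path


def run_folders(proposal_dir, run_number):
    """Return the raw and calibrated (proc) folders for one run.

    Assumes one subfolder per run, named 'r' plus a zero-padded
    four-digit run number (an assumption made for this sketch).
    """
    proposal_dir = Path(proposal_dir)
    run_name = f"r{run_number:04d}"
    return {
        "raw": proposal_dir / "raw" / run_name,
        "proc": proposal_dir / "proc" / run_name,
    }


# '/path/to/proposal' is a placeholder, not a real location.
folders = run_folders("/path/to/proposal", 12)
print(folders["raw"])
print(folders["proc"])
```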

1.3.2. Data policies

The following definitions summarise the policies for certain kinds of data.

Raw data: Raw data represents the digitized detector signal, not altered by detector-specific corrections or calibrations; e.g., it is in the form of detector units such as analogue-to-digital units (ADU). Vetoing, either by hardware or software triggers, and zero-value suppression (e.g., by transferral to event lists) may have been performed and are irreversible. Raw data is the main archival data product at the European XFEL. It is not foreseen to be exported outside the facility [KusterCal2014].

Calib. data: Calibrated data is generated from raw data by applying detector-specific corrections and transformation to physical units (calibration), e.g., photons per pixel. Calibrated data is the standard data product with which users will be provided. It is not archived; instead, if a calibrated dataset is requested but is no longer accessible through the online cache or user space, it will be reprocessed on the fly from the raw data repository using the appropriate calibration parameters provided by the calibration database.

Alignment data: Alignment data is generated from dedicated alignment measurements, providing the position of each detector pixel and detector module in three-dimensional space. It is stored in the detector coordinate system (i.e., as pixel coordinates) and no additional interpolation or coordinate transformation will be applied. Alignment data is part of the standard data products with which users will be provided.
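
The step from raw to calibrated data described above can be illustrated with a deliberately simplified sketch: subtract a dark offset that depends on the memory cell and gain stage, then divide by a gain factor to convert ADU to photons. This is a toy model under stated assumptions, not the facility's calibration pipeline, which applies further corrections. The function `calibrate_pixel` and all constants are hypothetical.

```python
def calibrate_pixel(adu, cell, gain_stage, offsets, gains):
    """Convert one raw ADU value to photons (toy model).

    offsets[gain_stage][cell] is the dark offset in ADU for that memory
    cell; gains[gain_stage] is the number of ADU per photon.
    """
    return (adu - offsets[gain_stage][cell]) / gains[gain_stage]


# Made-up calibration constants for a single pixel with two memory cells
# and two gain stages.
offsets = {"high": [100.0, 104.0], "low": [40.0, 42.0]}
gains = {"high": 8.0, "low": 0.5}

# Weak signal recorded in high gain: (116 - 100) / 8 photons.
photons_high = calibrate_pixel(116.0, 0, "high", offsets, gains)

# Strong signal recorded in low gain: (92 - 42) / 0.5 photons.
photons_low = calibrate_pixel(92.0, 1, "low", offsets, gains)

print(photons_high, photons_low)
```

The same raw number means very different photon counts depending on the gain stage and memory cell, which is why the appropriate calibration parameters are needed to interpret raw data.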

1.4. Citations

[Fangohr2017] Fangohr, Hans, et al. “Data Analysis Support in Karabo at European XFEL.” ICALEPCS 2017. Available online: http://accelconf.web.cern.ch/AccelConf/icalepcs2017/doi/JACoW-ICALEPCS2017-TUCPA01.html
[Hauf2019] Hauf, Steffen, et al. “The Karabo distributed control system.” J. Synchrotron Rad. 26 (2019): 1448-1461. Available online: https://doi.org/10.1107/S1600577519006696
[KusterCal2014] Kuster, Markus, et al. “Detectors and calibration concept for the European XFEL.” Synchrotron Radiation News 27.4 (2014): 35-38. Available online: https://www.tandfonline.com/doi/abs/10.1080/08940886.2014.930809
[Maltezopoulos2019] Maltezopoulos, Theophilos, et al. “Operation of X-ray gas monitors at the European XFEL.” J. Synchrotron Rad. 26 (2019): 1045-1051. Available online: https://doi.org/10.1107/S1600577519003795