Getting started with EXtra-xwiz
===============================

*EXtra-xwiz* is a command-line tool for automated processing of calibrated
serial crystallography data. It is a semi-automatic pipeline wrapped around
CrystFEL, and it uses SLURM for parallel computing.

.. figure:: _images/xwiz_schema_02.png

The basic functionality of the tool is explained here.

Installation
------------

EXtra-xwiz is pre-installed in the EuXFEL software folder on Maxwell, currently
in a virtual environment. To activate it, type ::

    module load exfel EXtra-xwiz/beta

Getting and setting a configuration
-----------------------------------

The workflow configuration determines how the steps of a typical SFX workflow
are run. The first call of the tool::

    xwiz-workflow

does nothing except write a template configuration file to the current folder.
Only a few configuration items have to be adapted by the user, namely those
for which no reasonable default exists. Open the freshly created file
``xwiz_conf.toml`` with a text editor and change the right-hand sides of the
definitions as required, e.g. proposal number, run and so on.

If a configuration file with more parameters is desired, the
``-adv / --advance-config`` option can be used::

    xwiz-workflow --advance-config

The configuration file explained
--------------------------------

Generally, the configuration is structured into groups that address different
categories or aspects of the pipeline usage, such as input file retrieval,
CrystFEL program parameters or SLURM (distributed computation) set-up.

A few specific aspects of the configuration values may not be obvious, so let's
look at the "data", "crystfel", "geom" and "slurm" groups.

.. image:: _images/xwiz_data_01.png
   :width: 150
   :class: float-left

.. code-block:: toml

   [data]
   proposal = 900145
   runs = 752
   n_frames_total = 10000
   vds_names = ["jf_752_vds.cxi"]
   list_prefix = "p900145_r0752"

proposal
    Proposal number which identifies your beam-time.
    When accessing calibrated/corrected detector data, Xwiz expands this to the
    path of the proposal "proc" folder containing the runs you will work with.

runs
    Single data collection run or a list of run numbers to be processed within
    one Xwiz session. To add another run to the above example, change the
    value to e.g. ``[752,753]``.

vds_names
    List of one or more virtual data set (VDS) file names. Any valid/existing
    path - absolute or relative - may be used. If the strings are only names,
    the path is interpreted as the current working directory (i.e. the folder
    from where Xwiz was started).

    There are two ways to work with VDS files:

    #. Create them from the original data in runs, typically at the first
       processing session.

       - the **runs** parameter is essential
       - **vds_names** point to target file names of VDS files to be created
       - there must be one name for each run
       - e.g. in case of two runs: ``["jf_752_vds.cxi", "jf_753_vds.cxi"]``

    #. Use existing VDS files.

       - the software first checks whether the given files are already present
       - if so, the **runs** setting is ignored and the VDS files are re-used

n_frames_total
    The total number of frames to be processed, over all runs. Setting
    ``n_frames_total < actual sum of frames`` is a way to truncate processing.
    If ``n_frames_total > actual sum of frames``, the actual sum is used.

list_prefix
    All output files of Xwiz start with this tag; it thus identifies the
    session.

.. image:: _images/xwiz_ucrystfel_cut.png
   :width: 115
   :class: float-left

.. code-block:: toml

   [crystfel]
   # Available versions: '0.8.0', '0.9.1', '0.10.0', 'cfel_dev'
   version = 'cfel_dev'

version
    Version number of the CrystFEL release to use. Switching to an older
    release may be necessary for reproduction or comparison purposes.
    ``cfel_dev`` points to the latest builds, prior to the upcoming release.

.. image:: _images/xwiz_geom_01.png
   :width: 115
   :class: float-left

.. code-block:: toml

   [geom]
   file_path = "jungfrau8_p900145_v1_vds.geom"

|

file_path
    Relative or absolute path to an existing CrystFEL-format geometry file
    appropriate for the detector geometry at the time of data collection.

    Geometry files come in two flavours, with either Cheetah-compatible or
    VDS-compatible pixel data references. The main difference concerns fused
    vs. stacked pixel data from multi-module detectors (AGIPD, JF4M). In the
    Cheetah layout, all pixels of an image frame are in one set with just the
    dimensions (ss, fs), whereas in the VDS layout there is an additional
    dimension containing the module index.

    If you have the correct geometry in terms of metrics, but the file refers
    to the wrong pixel layout, Xwiz will auto-correct these layout references
    to match the data file in use. For instance, a geometry file stemming from
    usage with Cheetah will be auto-converted to VDS layout if Xwiz is run with
    a VDS file (default).

.. image:: _images/xwiz_slurm_01.png
   :width: 115
   :class: float-left

.. code-block:: toml

   [slurm]
   # Available partitions: 'all', 'upex', 'exfel'
   partition = "all"
   duration_all = "2:00:00"
   n_nodes_all = 10
   duration_hits = "0:30:00"
   n_nodes_hits = 4

partition
    Points to the corresponding Maxwell partition of HPC nodes to be used. The
    default "all" should suit most needs. External users could also choose
    "upex" - or, in case of dedicated beam-time reservations, something like
    "upex_002697" (example).

duration_all (resp. **duration_hits**)
    The format reads "H:MM:SS" and defines for how long the HPC nodes shall be
    allocated. *In practice, processing that takes longer to finish than the
    allocated time will be aborted!*

n_nodes_all (resp. **n_nodes_hits**)
    Specifies how many HPC nodes to employ. Hard limits are set by the
    partition resp. reservation.

The ``_all`` vs. ``_hits`` distinction allows separate values for the
processing of all frames in the first phase versus the re-processing of a
subset of crystal "hit" frames (indexable due to the Bragg diffraction
patterns they contain) in the second phase of processing.

Running the fully automatic mode
--------------------------------

Provided that a good starting estimate for the crystal unit cell is available,
one can create a CrystFEL unit cell file in the current folder, like this::

    CrystFEL unit cell file version 1.0

    lattice_type = tetragonal
    centering = P
    unique_axis = c
    a = 79.1 A
    b = 79.1 A
    c = 37.9 A
    al = 90.00 deg
    be = 90.00 deg
    ga = 90.00 deg

and specify the saved file name in the configuration. To process the
diffraction data, one can then enter ::

    xwiz-workflow -a

and let everything run based on the configuration, ending up with the merged
structure factor intensities and their crystallographic FOM vs.
resolution-shell tables.

Running the interactive mode
----------------------------

Essentially, interactive mode means that all items of the configuration file
(including advanced configuration parameters) are displayed for confirmation
at a prompt at run-time. This allows quick re-confirmation or modification of
the workflow parameters without having to change the configuration
persistently by editing the file.

The interactive mode is the default, meaning it is chosen when the tool is
called without a command-line argument. Hence, type ::

    xwiz-workflow

and you will enter the interactive mode.
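
The unit cell file used for the automatic mode can also be generated from a
short script, for instance when the cell parameters come out of another
program. The following is a minimal sketch that assumes the keys and units
shown in the unit cell example are all that is required. The function name
``cell_file_text`` and the file name ``hewl.cell`` are illustrative only and
are not part of Xwiz or CrystFEL.

.. code-block:: python

   from pathlib import Path


   def cell_file_text(lattice_type, centering, unique_axis,
                      a, b, c, al, be, ga):
       """Return unit cell file content laid out like the example above.

       Lengths are given in Angstrom and angles in degrees.
       """
       lines = [
           "CrystFEL unit cell file version 1.0",
           "",
           f"lattice_type = {lattice_type}",
           f"centering = {centering}",
           f"unique_axis = {unique_axis}",
           f"a = {a:.1f} A",
           f"b = {b:.1f} A",
           f"c = {c:.1f} A",
           f"al = {al:.2f} deg",
           f"be = {be:.2f} deg",
           f"ga = {ga:.2f} deg",
       ]
       return "\n".join(lines) + "\n"


   if __name__ == "__main__":
       # Hypothetical file name; use whatever name the configuration refers to.
       text = cell_file_text("tetragonal", "P", "c",
                             79.1, 79.1, 37.9, 90.0, 90.0, 90.0)
       Path("hewl.cell").write_text(text)

The file is written to the current folder, so the script should be run from
the folder in which ``xwiz-workflow`` will be started.
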

**Output files**

The following files can be found in the current folder, from where the tool
was run, after the workflow has finished:

- a virtual data set for indexamajig (only if the config referred to a
  non-existing file)
- stream files from indexamajig
- a file ``_hits.lst`` containing the numbers of detector frames that could be
  indexed (a sub-set of all frames)
- the unique set of structure factor intensities after scaling and averaging
  symmetry-equivalent observations with ``partialator``, in three versions:
  fully merged as well as merged from the two complementary half-sets of
  equivalents
- text files ``*.dat`` with crystallographic FOMs (S/N, CC_1/2, ``CC*``,
  R_split) in resolution shells
- a summary file written by the workflow tool, wrapping up indexing rates,
  unit cell refinement and FOMs

The summary file looks like this::

    SUMMARY OF XWIZ WORKFLOW

    Session time-stamp: 2020-07-22T12:18:48.711769
    Operation mode: interactive (run-time parameter confirm/override)
    Input type: virtual data set referring to EuXFEL-corrected HDF5

    BASE CONFIGURATION USED
    Group: data
      path      : /gpfs/exfel/exp/XMPL/201750/p700000/proc/
      runs      : 29
      n_frames  : 630000
    ...
    ...

    Step #   d_lim   source        N(crystals)   N(frames)   Indexing rate [%%]
     1        3.5    indexamajig         28553      630000     4.53
                     cell_check          28553      630000     4.53
     2        2.0    indexamajig         27581       28553    96.60
                     OVERALL             27581      630000     4.38

    Crystal unit cells used:
    File               Symmetry/axis, a, b, c, alpha, beta, gamma
    hewl.cell          tetragonal P c  79.1  79.1  37.9  90.00  90.00  90.00
    hewl.cell_refined  tetragonal P c  79.7  79.7  38.0  90.00  90.00  90.00

    Crystallographic FOMs:
                            overall    outer shell
    Completeness            100.00     100.0
    Signal-over-noise       5.84       2.54
    CC_1/2                  0.6338     0.1394
    CC*                     0.8363     0.4948
    R_split                 33.68      72.59
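
The numbers in the summary can be cross-checked or collected with a few lines
of Python. The sketch below assumes that the indexing rate is simply
``100 * N(crystals) / N(frames)``, which reproduces the values in the example,
and that each FOM row consists of a name followed by two numbers. The helper
names ``indexing_rate`` and ``parse_foms`` are illustrative and not part of
Xwiz.

.. code-block:: python

   def indexing_rate(n_crystals, n_frames):
       """Return the indexing rate in percent."""
       return 100.0 * n_crystals / n_frames


   def parse_foms(text):
       """Return {name: (overall, outer_shell)} from the FOM table rows."""
       foms = {}
       for line in text.splitlines():
           parts = line.split()
           if len(parts) != 3:
               continue  # not a "name value value" row
           name, overall, outer = parts
           try:
               foms[name] = (float(overall), float(outer))
           except ValueError:
               continue  # e.g. the "overall  outer shell" header row
       return foms


   # FOM section copied from the example summary above
   FOM_SECTION = """\
   Crystallographic FOMs:
                           overall    outer shell
   Completeness            100.00     100.0
   Signal-over-noise       5.84       2.54
   CC_1/2                  0.6338     0.1394
   CC*                     0.8363     0.4948
   R_split                 33.68      72.59
   """

   if __name__ == "__main__":
       # Step 1, step 2 and overall rates from the example summary
       print(f"{indexing_rate(28553, 630000):.2f}")
       print(f"{indexing_rate(27581, 28553):.2f}")
       print(f"{indexing_rate(27581, 630000):.2f}")
       print(parse_foms(FOM_SECTION)["R_split"])

Splitting rows on whitespace keeps the parser independent of the exact column
widths, at the price of skipping any row whose name contains a space.
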