Tutorial Calculation

Author: Astrid Muennich

Version: 0.1

A small example of how to adapt a notebook to run with the offline calibration package “pycalibration”.

The first cell contains all parameters that should be exposed to the command line.

To run this notebook with several different input parameters in parallel by submitting multiple SLURM jobs, for example for various random seeds, we can do the following:

xfel-calibrate TUTORIAL TEST --random-seed 1,2,3,4

or

xfel-calibrate TUTORIAL TEST --random-seed 1-5

Either will produce 4 jobs:

Parsed input 1,2,3,4 to [1, 2, 3, 4]

Submitted job: 1169340

Submitted job: 1169341

Submitted job: 1169342

Submitted job: 1169343

Submitted the following SLURM jobs: 1169340,1169341,1169342,1169343
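Under the hood the launcher expands such an argument into a list of seeds and submits one job per value. Below is a minimal sketch of that expansion, assuming Python-style ranges that exclude the upper bound (the helper parse_seeds is hypothetical, not part of pycalibration):

def parse_seeds(spec):
    # hypothetical sketch: "1,2,3,4" -> [1, 2, 3, 4]; "1-5" -> [1, 2, 3, 4]
    if "-" in spec:
        start, stop = spec.split("-")
        return list(range(int(start), int(stop)))
    return [int(s) for s in spec.split(",")]

print("Parsed input 1,2,3,4 to {}".format(parse_seeds("1,2,3,4")))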

out_folder = "/gpfs/exfel/data/scratch/amunnich/tutorial" # output folder
sensor_size = [10, 30] # defining the picture size
random_seed = [2345] # random seed for filling the fake data array; change it to produce different results, a range is allowed
runs = 500 # how many iterations to fill histograms
cluster_profile = "tutorial"
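Each variable in this first cell is exposed as a command-line option (random_seed becomes --random-seed). As a rough, purely illustrative sketch of how such defaults could be discovered, one can read the first cell with nbformat and execute it; this is not the actual xfel-calibrate implementation, and the notebook filename is assumed:

import nbformat

# read the notebook and execute only its first cell to collect the defaults
nb = nbformat.read("Tutorial_Calculation.ipynb", as_version=4)  # assumed filename
params = {}
exec(nb.cells[0].source, params)

for name, value in sorted(params.items()):
    if not name.startswith("__"):
        print("--{}".format(name.replace("_", "-")), "default:", value)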

First, import what we need and set up the cluster profile for parallel processing on one node, utilising more than one core. Everything that produces a written response in a cell will show up in the report, e.g. prints, but also return values or errors.

import matplotlib

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

# if not using slurm: make sure a cluster is running with
# ipcluster start --n=4 --profile=tutorial
# give it a while to start
from ipyparallel import Client

print("Connecting to profile {}".format(cluster_profile))
view = Client(profile=cluster_profile)[:]
view.use_dill()
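As an optional sanity check, each engine can report its process id to confirm the cluster is up (os.getpid is executed remotely on every engine):

import os

# one pid per engine confirms that all engines are alive
print(view.apply_sync(os.getpid))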

Create some random data

def data_creation(random_seed):
    # seed the generator so each image is reproducible
    np.random.seed(random_seed)
    return np.random.random(sensor_size)

# In order to run several random seeds in parallel the parameter has to be
# a list. To use the current single value in this notebook we take the first
# entry of the list.
random_seed_single = random_seed[0]
fake_data = []
for i in range(runs):
    # offset the seed for each image so the data actually varies
    fake_data.append(data_creation(random_seed_single + 10 * i))
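A quick check that the data has the expected dimensions:

# expect 500 images of shape (10, 30)
print(len(fake_data), fake_data[0].shape)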

Create some random images and plot them. Everything we write here in the markdown cells will show up as text in the report.

plt.subplot(211)
plt.imshow(fake_data[0], interpolation="nearest")
plt.title('Random Image')
plt.ylabel('sensor height')

plt.subplot(212)
plt.imshow(fake_data[5], interpolation="nearest")
plt.xlabel('sensor width')
plt.ylabel('sensor height')

# make room on the right for a shared colorbar axis
plt.subplots_adjust(bottom=0.1, right=0.8, top=0.9)
cax = plt.axes([0.85, 0.1, 0.075, 0.9])
plt.colorbar(cax=cax).ax.set_ylabel("# counts")
plt.show()

These plots show two randomly filled sensor images. We can also use markdown cells as captions for images.

Simple Analysis

# sequential reference calculation: loop over all images
mean = []
std = []
for im in fake_data:
    mean.append(im.mean())
    std.append(im.std())
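For comparison, the same statistics can be computed in a single vectorised call by stacking the images into one array first; a sketch that avoids the Python loop:

stack = np.stack(fake_data)          # shape (runs, 10, 30)
mean_vec = stack.mean(axis=(1, 2))   # one mean per image
std_vec = stack.std(axis=(1, 2))     # one standard deviation per image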

To parallelise jobs we use the ipyparallel client. This runs an ipcluster on one node with the number of cores specified in xfel_calibrate/notebooks.py.

from functools import partial


def parallel_stats(data):
    # executed on the engines: compute per-image statistics
    return data.mean(), data.std()

# partial is not needed for a single argument, but shows how additional
# fixed arguments would be bound before mapping over the data
p = partial(parallel_stats)
results = view.map_sync(p, fake_data)

p_mean = [x[0] for x in results]
p_std = [x[1] for x in results]
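Both code paths operate on the same data, so the parallel statistics must agree with the sequential ones; a one-line check:

# sequential and parallel results should be identical up to float tolerance
assert np.allclose(mean, p_mean) and np.allclose(std, p_std)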

We plot histograms of the mean and the standard deviation of all images, comparing the sequential and the parallel results.

plt.subplot(221)
plt.hist(mean, 50)
plt.xlabel('mean')
plt.ylabel('counts')
plt.title('Mean value')

plt.subplot(222)
plt.hist(p_mean, 50)
plt.xlabel('mean parallel')
plt.ylabel('counts')
plt.title('Parallel Mean value')

plt.subplot(223)
plt.hist(std, 50)
plt.xlabel('std')
plt.ylabel('counts')
plt.title('Std value')

plt.subplot(224)
plt.hist(p_std, 50)
plt.xlabel('std parallel')
plt.ylabel('counts')
plt.title('Parallel Std value')

plt.subplots_adjust(top=0.99, bottom=0.01, left=0.01, right=0.99, hspace=0.7, wspace=0.35)
plt.show()