.. _calversion:

Tutorial Calculation
====================

Author: Astrid Muennich

Version: 0.1

A small example of how to adapt a notebook to run with the offline
calibration package "pycalibration".

The first cell contains all parameters that should be exposed to the
command line.

To run this notebook with several different input parameters in parallel,
by submitting multiple SLURM jobs, for example for various random seeds,
we can do the following::

    xfel-calibrate TUTORIAL TEST --random-seed 1,2,3,4

or::

    xfel-calibrate TUTORIAL TEST --random-seed 1-5

Either form will produce 4 jobs (the upper bound of a range is exclusive)::

    Parsed input 1,2,3,4 to [1, 2, 3, 4]
    Submitted job: 1169340
    Submitted job: 1169341
    Submitted job: 1169342
    Submitted job: 1169343
    Submitted the following SLURM jobs:
    1169340,1169341,1169342,1169343

.. code-block:: python

    out_folder = "/gpfs/exfel/data/scratch/amunnich/tutorial"  # output folder
    sensor_size = [10, 30]  # defining the picture size
    random_seed = [2345]  # random seed for filling the fake data array; change it
                          # to produce different results (a range is allowed)
    runs = 500  # how many iterations to fill histograms
    cluster_profile = "tutorial"

First include what we need and set up the cluster profile for parallel
processing on one node, utilising more than one core. Everything that
produces output in a cell will show up in the report, e.g. prints, but
also return values and errors.

.. code-block:: python

    import matplotlib
    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np

    # if not using SLURM: make sure a cluster is running with
    # ipcluster start --n=4 --profile=tutorial
    # give it a while to start
    from ipyparallel import Client

    print("Connecting to profile {}".format(cluster_profile))
    view = Client(profile=cluster_profile)[:]
    view.use_dill()


Create some random data
-----------------------

.. code-block:: python

    def data_creation(random_seed):
        # seed the generator by calling np.random.seed, not assigning to it
        np.random.seed(random_seed)
        return np.random.random(sensor_size)

.. code-block:: python

    # In order to run several random seeds in parallel, the parameter has to
    # be a list. To use the current single value in this notebook we use the
    # first entry in the list.
    random_seed_single = random_seed[0]

    fake_data = []
    for i in range(runs):
        fake_data.append(data_creation(random_seed_single + 10 * i))

Create some random images and plot them. Everything we write here in the
markup cells will show up as text in the report.

.. code-block:: python

    plt.subplot(211)
    plt.imshow(fake_data[0], interpolation="nearest")
    plt.title('Random Image')
    plt.ylabel('sensor height')
    plt.subplot(212)
    plt.imshow(fake_data[5], interpolation="nearest")
    plt.xlabel('sensor width')
    plt.ylabel('sensor height')
    plt.subplots_adjust(bottom=0.1, right=0.8, top=0.9)
    cax = plt.axes([0.85, 0.1, 0.075, 0.9])
    plt.colorbar(cax=cax).ax.set_ylabel("# counts")
    plt.show()

These plots show two randomly filled sensor images. We can also use markup
cells as captions for images.


Simple Analysis
---------------

.. code-block:: python

    mean = []
    std = []
    for im in fake_data:
        mean.append(im.mean())
        std.append(im.std())

To parallelise jobs we use the ipyparallel client. This will run an
ipcluster on one node, with the number of cores specified in
``xfel_calibrate/notebooks.py``.

.. code-block:: python

    from functools import partial

    def parallel_stats(input):
        return input.mean(), input.std()

    # partial is a no-op here, but becomes useful once parallel_stats
    # takes additional fixed arguments
    p = partial(parallel_stats)
    results = view.map_sync(p, fake_data)
    p_mean = [x[0] for x in results]
    p_std = [x[1] for x in results]

We calculate the mean value of all images, as well as the standard
deviation.
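As a quick sanity check, which is not part of the original notebook, the
serial and parallel results can be compared directly: since both variants
call ``mean()`` and ``std()`` on the same arrays, they should agree to
within floating-point tolerance.

.. code-block:: python

    # Illustrative addition: the serial loop and the parallel map should
    # yield identical statistics, since they run the same NumPy calls.
    assert np.allclose(mean, p_mean)
    assert np.allclose(std, p_std)
    print("Serial and parallel statistics agree for all {} images"
          .format(len(fake_data)))

The distributions of these statistics are histogrammed below, for both the
serial and the parallel computation.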
.. code-block:: python

    plt.subplot(221)
    plt.hist(mean, 50)
    plt.xlabel('mean')
    plt.ylabel('counts')
    plt.title('Mean value')
    plt.subplot(222)
    plt.hist(p_mean, 50)
    plt.xlabel('mean parallel')
    plt.ylabel('counts')
    plt.title('Parallel Mean value')
    plt.subplot(223)
    plt.hist(std, 50)
    plt.xlabel('std')
    plt.ylabel('counts')
    plt.title('Std value')
    plt.subplot(224)
    plt.hist(p_std, 50)
    plt.xlabel('std parallel')
    plt.ylabel('counts')
    plt.title('Parallel Std value')
    plt.subplots_adjust(top=0.99, bottom=0.01, left=0.01,
                        right=0.99, hspace=0.7, wspace=0.35)
    plt.show()
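For reference, the ``--random-seed 1-5`` notation used at the top expands
with an exclusive upper bound, just like Python's built-in ``range``. A
minimal sketch of such a parser (a hypothetical helper for illustration,
not the actual ``xfel-calibrate`` implementation) could look like this:

.. code-block:: python

    def parse_values(spec):
        """Hypothetical sketch: expand "1-5" or "1,2,3,4" to [1, 2, 3, 4]."""
        if "-" in spec:
            start, end = spec.split("-")
            return list(range(int(start), int(end)))  # upper bound is exclusive
        return [int(v) for v in spec.split(",")]

    print(parse_values("1-5"))      # [1, 2, 3, 4]
    print(parse_values("1,2,3,4"))  # [1, 2, 3, 4]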