Restart Offline Calibration

The following describes the procedures for verifying the operation of the offline calibration service and for restarting it.

Preface

The offline calibration runs on max-exfl016 under the xcal user account. To perform any of the procedures below, you need to be able to kinit into that account. Please ask ITDM for access.

You can log in directly from within the DESY network:

kinit <username>
ssh xcal@max-exfl016

Remotely, when not on a VPN, you need to go via Bastion:

ssh <username>@bastion.desy.de
ssh xcal@max-exfl016

Warning

You are working on the production offline calibration service. Please proceed with care.

Checking Status

The following status checks should be performed to verify if the service is working correctly:

Is the Service Running

The service runs as a Python webservice. To check if it is running, type:

ps aux | grep webservice

This should result in output similar to

xcal     15440  0.0  0.0 401188 36464 ?        Sl   Jun10   0:20 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml
xcal     17569  0.0  0.0 112812  1020 pts/1    S+   13:49   0:00 grep --color=auto webservice
xcal     21500  0.0  0.0 222752 87788 ?        S    Jun02   2:23 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml

It is important that the first line, showing the webservice.py process, is present.
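
The same check can be scripted. The following is a minimal sketch and not part of the calibration tooling: webservice_running is a hypothetical helper name, and it reads ps aux output on standard input so that it can also be tried on saved output.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the webservice.py process appears in `ps aux`-style input.
# `webservice_running` is a hypothetical helper, not an existing command.
webservice_running() {
    # [.] demands a literal dot, so neither serve_overview.py nor the
    # grep process itself (whose command line contains "[.]") is matched.
    grep -q '/webservice[.]py'
}
```

A possible use is `ps aux | webservice_running && echo "webservice is running"`.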

As the offline calibration webservice is running using a virtual environment, you will need to activate this environment before you go further.

The path to the bin folder of the Python environment is shown in the output above. To activate this Python environment, run:

source /home/xcal/calibration_webservice_venv/bin/activate

Checking the Logs

The logs are available on the max-exfl016:8008 web page:

[Screenshot of the webservice logs page: webservice_logs.png]

or they can be accessed in the webservice run subdirectory:

tail -n 500 -f calibration_webservice_run/web.log

You should see entries similar to:

2019-07-04 12:20:48,988 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:48,996 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:49,004 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:49,013 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-09 16:11:29,966 - root - INFO - python -m xfel_calibrate.calibrate JUNGFRAU CORRECT ...
2019-07-09 16:11:36,987 - root - INFO - SUCCESS: Started correction: proposal 900063, run 1051

These entries show the most recently started processes. Especially useful are the lines containing python -m xfel_calibrate.calibrate …, as these tell you which calibration jobs were launched and with which parameters.
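
To pick out only the job launch lines, the log can be filtered. This is a minimal sketch: recent_calibrate_jobs is a hypothetical helper name, and the default of 10 lines is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent calibration launch commands found in a log file.
# `recent_calibrate_jobs` is a hypothetical helper, not an existing command.
recent_calibrate_jobs() {
    local logfile=$1
    local count=${2:-10}   # number of launch lines to show (arbitrary default)
    # -F treats the pattern as a fixed string, so the dots are literal.
    grep -F 'python -m xfel_calibrate.calibrate' "$logfile" | tail -n "$count"
}
```

For example, `recent_calibrate_jobs /home/xcal/calibration_webservice_run/web.log 5` would print the last five launch lines.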

Checking for Running Jobs

The following command will display a list of currently running jobs on Maxwell:

squeue  -u xcal -o "%.18i %60j %.3t %.10M %R"

Check that jobs are not piling up, e.g. that none have been queued for a long time. If they are, contact the affected instrument.
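
A quick way to see whether jobs are piling up is to count those in the pending state (PD in Slurm's compact state codes). The sketch below assumes the column layout of the squeue command above, with the state in the third column and job names without whitespace. count_pending is a hypothetical helper name, and what count is "too many" is not defined by this procedure.

```shell
#!/usr/bin/env bash
# Sketch: count pending (PD) jobs in squeue output read from standard input.
# Assumes the state is the third whitespace-separated column, as produced by
# the format string "%.18i %60j %.3t %.10M %R" when job names have no spaces.
# `count_pending` is a hypothetical helper, not an existing command.
count_pending() {
    awk '$3 == "PD" { n++ } END { print n + 0 }'
}
```

A possible use is `squeue -u xcal -o "%.18i %60j %.3t %.10M %R" | count_pending`.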

Restarting the Service

Restarting the calibration webservice should rarely be necessary, and if so, should only be done with great care as it can affect the entire facility. If you are unsure, first consult with other colleagues.

  1. Check if the service process is still running and kill it:

    ps aux | grep webservice.py
    

    Note down the PID and kill that process if it is running:

    kill -9 <PID>
    
  2. Restart the service:

    cd calibration_webservice_run
    nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml > nohup_webservice.out&
    
  3. Verify it is running by checking for the process:

    ps aux | grep webservice.py
    

    and the logs:

    tail -n 500 -f /home/xcal/calibration_webservice_run/web.log
    
  4. Notify the instruments that the service was down:

    send-to: <all instrument emails>
    subject: Restart of Calibration Webservice
    
    The calibration webservice required a restart due to....
    
    The restart was performed in the time from XXX to YYY. During this time some jobs submitted through the
    MDC or calibrate_dark script may not have been handled. If you submitted jobs during this time, please consider
    resubmitting them. If you have questions, please contact det-support@xfel.eu. Please also forward this information
    to your users.
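
Step 1 stops the process with kill -9, which gives it no chance to shut down cleanly. A gentler variant, sketched below, sends SIGTERM first and only falls back to SIGKILL after a timeout. This is an assumption and not the documented procedure: whether the webservice handles SIGTERM cleanly is not stated here, and stop_process and its default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ask a process to exit with SIGTERM, fall back to SIGKILL after a timeout.
# `stop_process` and the 10 s default are hypothetical, not part of the service.
stop_process() {
    local pid=$1
    local remaining=${2:-10}   # seconds to wait before escalating to SIGKILL
    # If the signal cannot be sent, the process is gone (or not ours to signal).
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$remaining" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    kill -KILL "$pid" 2>/dev/null
}
```

With the PID noted in step 1, this would be called as `stop_process <PID>`.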
    

Restarting the Overview Webpage

First perform step 1 of the webservice restart procedure above, applied to the serve_overview.py process rather than webservice.py, then:

  1. Restart the overview page:

    cd calibration_webservice_run
    nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml > nohup_serve_overview.out &
    
  2. Verify it is running by checking for the process:

    ps aux | grep serve_overview.py
    

    and the webpage at max-exfl016:8008 from a browser with access to the Maxwell network.
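
From a shell on the Maxwell network, a quick check that something is listening on the overview port can complement the browser test. This is a minimal sketch: port_open is a hypothetical helper name, it relies on bash's /dev/tcp feature and the coreutils timeout command, and it only confirms that the port accepts connections, not that the page content is correct.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a TCP connection to host:port can be opened within 3 seconds.
# `port_open` is a hypothetical helper, not an existing command.
port_open() {
    local host=$1
    local port=$2
    # In the child bash, $0 is the host and $1 the port; opening the
    # /dev/tcp pseudo-path attempts the connection and fails if refused.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A possible use is `port_open max-exfl016 8008 && echo "overview page is listening"`.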