Restart Offline Calibration¶
This page describes the procedures for verifying that the offline calibration service is operating correctly and for restarting it.
Preface¶
The offline calibration runs on max-exfl016 under the xcal user account. To perform any of the procedures below, you need to be able to kinit into that account. Please ask ITDM for access.
You can log in directly from within the DESY network:
kinit <username>
ssh xcal@max-exfl016
Remotely, when not on a VPN, you need to go via Bastion:
ssh <username>@bastion.desy.de
ssh xcal@max-exfl016
Warning
You are working on the production offline calibration service. Please proceed with care.
Checking Status¶
The following status checks should be performed to verify if the service is working correctly:
Is the Service Running¶
The service runs as a Python webservice. To check if it is running, type:
ps aux | grep webservice
This should result in output similar to:
xcal 15440 0.0 0.0 401188 36464 ? Sl Jun10 0:20 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml
xcal 17569 0.0 0.0 112812 1020 pts/1 S+ 13:49 0:00 grep --color=auto webservice
xcal 21500 0.0 0.0 222752 87788 ? S Jun02 2:23 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml
It is important that the first line, the webservice.py process, is present.
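The same check can be scripted. The following is a minimal sketch, not part of the service itself: the function name `webservice_running` is hypothetical, and it assumes the process command line contains `webservice/webservice.py`, as in the sample output above. It reads `ps aux` output from standard input and ignores lines that belong to a `grep` process.

```shell
# Hypothetical helper: succeeds if the `ps aux` output on stdin lists the
# webservice.py process. Assumes the command line contains
# "webservice/webservice.py"; serve_overview.py does not match this string.
webservice_running() {
    grep -F 'webservice/webservice.py' | grep -v -w 'grep' > /dev/null
}

# Usage on max-exfl016:
#   ps aux | webservice_running && echo "running" || echo "NOT running"
```

Matching the full `webservice/webservice.py` path fragment rather than just `webservice` keeps the overview page process from being counted as the service.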
As the offline calibration webservice runs in a virtual environment, you need to activate this environment before going further.
The path to the bin folder of the Python environment is shown in the output above. To activate this Python environment, run:
source /home/xcal/calibration_webservice_venv/bin/activate
Checking the Logs¶
The logs are available on the max-exfl016:8008 web page, or they can be accessed in the webservice run subdirectory:
tail -500f calibration_webservice_run/web.log
You should see entries similar to:
2019-07-04 12:20:48,988 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:48,996 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:49,004 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-04 12:20:49,013 - root - INFO - Copying /gpfs/exfel/exp/SQS...
2019-07-09 16:11:29,966 - root - INFO - python -m xfel_calibrate.calibrate JUNGFRAU CORRECT ...
2019-07-09 16:11:36,987 - root - INFO - SUCCESS: Started correction: proposal 900063, run 1051
These tell you which processes were started most recently. Especially useful are the lines containing python -m xfel_calibrate.calibrate … as these tell you which calibration jobs were launched and with which parameters.
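To pick those lines out of a busy log, a filter along the following lines may help. This is a sketch under assumptions: the function names are hypothetical, and the match strings are taken from the sample entries above rather than from the service's source code.

```shell
# Hypothetical helper: print only the log lines that record a launched
# calibration job, i.e. those containing "python -m xfel_calibrate.calibrate".
launched_jobs() {
    grep -F 'python -m xfel_calibrate.calibrate'
}

# Hypothetical helper: count the "SUCCESS: Started correction" entries.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_started_corrections() {
    grep -c -F 'SUCCESS: Started correction' || true
}

# Usage, assuming the log path used elsewhere on this page:
#   tail -n 500 /home/xcal/calibration_webservice_run/web.log | launched_jobs
```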
Checking for Running Jobs¶
The following command will display a list of currently running jobs on Maxwell:
squeue -u xcal -o "%.18i %60j %.3t %.10M %R"
Check that jobs are not piling up, e.g. that none have been queued for a long time. If they are, contact the affected instrument.
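A quick way to see whether jobs are accumulating is to count the pending ones. The sketch below is an illustration only: `count_pending` is a hypothetical name, and what counts as "too many" pending jobs is a judgment call that this page does not define.

```shell
# Hypothetical helper: count pending jobs from a list of Slurm state codes,
# one per line ("PD" is the compact state code for a pending job).
count_pending() {
    awk '$1 == "PD" { n++ } END { print n + 0 }'
}

# Usage on Maxwell (-h suppresses the header, %t prints the compact state):
#   squeue -u xcal -h -o "%t" | count_pending
```

Running this a few minutes apart gives a rough indication of whether the queue is draining or growing.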
Restarting the Service¶
Restarting the calibration webservice should rarely be necessary, and if so, should only be done with great care as it can affect the entire facility. If you are unsure, first consult with other colleagues.
Check if the service process is still running and kill it:
ps aux | grep webservice.py
Note down the PID and kill that process if it is running:
kill -9 <PID>
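To avoid misreading the PID from a long `ps aux` listing, it can be extracted with a small filter. This is a sketch under assumptions: `webservice_pids` is a hypothetical name, and it assumes the script path is the first argument to the Python interpreter (field 12 of `ps aux`), as in the sample output on this page. It only prints PIDs and does not kill anything.

```shell
# Hypothetical helper: print the PID (second column of `ps aux`) of every
# process whose first interpreter argument ends in "/webservice.py".
# grep lines and the serve_overview.py process do not match.
webservice_pids() {
    awk '$12 ~ /\/webservice\.py$/ { print $2 }'
}

# Usage on max-exfl016:
#   ps aux | webservice_pids
```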
Restart the service:
cd calibration_webservice_run
nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml > nohup_webservice.out &
Verify it is running by checking for the process:
ps aux | grep webservice.py
and the logs:
tail -500f /home/xcal/calibration_webservice_run/web.log
Notify the instruments that the service was down:
send-to: <all instrument emails>
subject: Restart of Calibration Webservice

The calibration webservice required a restart due to.... The restart was performed in the time from XXX to YYY. During this time some jobs submitted through the MDC or calibrate_dark script may not have been handled. If you submitted jobs during this time, please consider resubmitting them.

If you have questions, please contact det-support@xfel.eu. Please also forward this information to your users.
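If the notification is sent often, the template can be filled in by a small function. This is a sketch only: `restart_notice` is a hypothetical name, the wording simply mirrors the template above, and addressing and sending the mail are left to the operator.

```shell
# Hypothetical helper: print the notification text with the reason ($1) and
# the start ($2) and end ($3) of the downtime filled in.
restart_notice() {
    cat <<EOF
subject: Restart of Calibration Webservice

The calibration webservice required a restart due to $1. The restart was performed in the time from $2 to $3. During this time some jobs submitted through the MDC or calibrate_dark script may not have been handled. If you submitted jobs during this time, please consider resubmitting them.

If you have questions, please contact det-support@xfel.eu. Please also forward this information to your users.
EOF
}

# Example with placeholder values:
#   restart_notice "<reason>" "<start time>" "<end time>"
```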
Restarting the Overview Webpage¶
First, find and kill the running serve_overview.py process, following the first step of the calibration webservice restart procedure above.
Restart the overview page:
cd calibration_webservice_run
nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml > nohup_serve_overview.out &
Verify it is running by checking for the process:
ps aux | grep serve_overview.py
and by opening the webpage at max-exfl016:8008 in a browser with access to the Maxwell network.