.. _cal_offline_restart:

Restart Offline Calibration
===========================

This section describes the procedures for verifying the operation of the offline calibration service and for restarting it.

Preface
~~~~~~~

The offline calibration runs on `max-exfl016` under the `xcal` user account. To perform any of the procedures below, you need to be able to `kinit` yourself into that account. Please ask ITDM for access.

You can log in directly from within the DESY network.

.. code-block:: console

    kinit
    ssh xcal@max-exfl016

Remotely, when not on a VPN, you need to go via Bastion:

.. code-block:: console

    ssh <username>@bastion.desy.de
    ssh xcal@max-exfl016

.. warning::

    You are working on the production offline calibration service. Please proceed with care.

Checking Status
~~~~~~~~~~~~~~~

The following status checks should be performed to verify that the service is working correctly:

Is the Service Running
----------------------

The service runs as a Python webservice. To check whether it is running, type:

.. code-block:: console

    ps aux | grep webservice

This should result in output similar to:

.. code-block:: console

    xcal 15440 0.0 0.0 401188 36464 ? Sl Jun10 0:20 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml
    xcal 17569 0.0 0.0 112812 1020 pts/1 S+ 13:49 0:00 grep --color=auto webservice
    xcal 21500 0.0 0.0 222752 87788 ? S Jun02 2:23 /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml

It is important that the first line (the `webservice.py` process) is present.

As the offline calibration webservice runs in a virtual environment, you need to activate this environment before going further. The path to the `bin` folder of the Python environment is shown in the output above.
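The process check in this section can also be scripted. The following is a minimal sketch, not part of the calibration webservice itself: `find_service_pids` and `running_service_pids` are hypothetical helper names, and the parser assumes the usual 11-column `ps aux` layout in which the command line is the last column.

```python
import subprocess
from typing import List


def find_service_pids(ps_output: str, script_name: str = "webservice.py") -> List[int]:
    """Return the PIDs of processes whose command line runs `script_name`."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        # Compare file names exactly, so that `grep webservice` itself and
        # serve_overview.py are not mistaken for the webservice.
        if any(arg.rsplit("/", 1)[-1] == script_name for arg in fields[10].split()):
            pids.append(int(fields[1]))
    return pids


def running_service_pids(script_name: str = "webservice.py") -> List[int]:
    """Run `ps aux` on the current host and return the matching PIDs."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return find_service_pids(result.stdout, script_name)
```

Calling `running_service_pids()` on the service host returns an empty list when the webservice is not running; passing `"serve_overview.py"` checks the overview page process instead.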
To activate the virtual environment, run `source /home/xcal/calibration_webservice_venv/bin/activate`.

Checking the Logs
-----------------

The logs are available on the `max-exfl016:8008` web page:

.. figure:: webservice_logs.png

or they can be accessed in the webservice run subdirectory:

.. code-block:: console

    tail -500f calibration_webservice_run/web.log

You should see entries similar to:

.. code-block:: console

    2019-07-04 12:20:48,988 - root - INFO - Copying /gpfs/exfel/exp/SQS...
    2019-07-04 12:20:48,996 - root - INFO - Copying /gpfs/exfel/exp/SQS...
    2019-07-04 12:20:49,004 - root - INFO - Copying /gpfs/exfel/exp/SQS...
    2019-07-04 12:20:49,013 - root - INFO - Copying /gpfs/exfel/exp/SQS...
    2019-07-09 16:11:29,966 - root - INFO - python -m xfel_calibrate.calibrate JUNGFRAU CORRECT ...
    2019-07-09 16:11:36,987 - root - INFO - SUCCESS: Started correction: proposal 900063, run 1051

These entries show the most recently started processes. The lines starting with `python -m xfel_calibrate.calibrate ...` are especially useful, as they tell you which calibration jobs were launched and with which parameters.

Checking for running Jobs
-------------------------

The following command displays a list of the `xcal` user's jobs currently queued or running on Maxwell:

.. code-block:: console

    squeue -u xcal -o "%.18i %60j %.3t %.10M %R"

You should check that jobs are not piling up, e.g. that none have been queued for a long time. If jobs are piling up, the instrument should be contacted.

Restarting the Service
~~~~~~~~~~~~~~~~~~~~~~

Restarting the calibration webservice should rarely be necessary. If it is, do so with great care, as it can affect the entire facility. If you are unsure, consult with colleagues first.

1. Check whether the service process is still running:

   .. code-block:: console

       ps aux | grep webservice.py

   Note down the PID and kill that process if it is running:

   .. code-block:: console

       kill -9 <PID>

2. Restart the service:

   .. code-block:: console

       cd calibration_webservice_run
       nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/webservice.py --mode prod --logging INFO --config-file /home/xcal/calibration_webservice/webservice/webservice.yaml > nohup_webservice.out &

3. Verify it is running by checking for the process:

   .. code-block:: console

       ps aux | grep webservice.py

   and the logs:

   .. code-block:: console

       tail -500f /home/xcal/calibration_webservice_run/web.log

4. Notify the instruments that the service was down:

   .. code-block:: console

       send-to:
       subject: Restart of Calibration Webservice

       The calibration webservice required a restart due to....
       The restart was performed in the time from XXX to YYY.
       During this time some jobs submitted through the MDC or
       calibrate_dark script may not have been handled. If you
       submitted jobs during this time, please consider resubmitting them.

       If you have questions, please contact det-support@xfel.eu.

       Please also forward this information to your users.

Restarting the overview webpage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Perform step 1 of the webservice restart procedure above, this time for the `serve_overview.py` process.

2. Restart the overview page:

   .. code-block:: console

       cd calibration_webservice_run
       nohup /home/xcal/calibration_webservice_venv/bin/python /home/xcal/calibration_webservice/webservice/serve_overview.py --config /home/xcal/calibration_webservice/webservice/serve_overview.yaml > nohup_serve_overview.out &

3. Verify it is running by checking for the process:

   .. code-block:: console

       ps aux | grep serve_overview.py

   and by opening the web page at `max-exfl016:8008` in a browser with access to the Maxwell network.
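The final check of the overview page can also be done without a browser. The following is a minimal sketch using only the Python standard library. `overview_page_is_up` is a hypothetical helper name, and the example URL assumes that the page is served over plain HTTP on port 8008 of `max-exfl016`.

```python
import http.client
import urllib.error
import urllib.request


def overview_page_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers an HTTP GET request with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

From a host on the Maxwell network, `overview_page_is_up("http://max-exfl016:8008/")` should then return `True` once `serve_overview.py` has started. A `False` result only says that the page did not answer; the process list and `nohup_serve_overview.out` remain the places to look for the cause.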