/**
\page exceptionHandlingDesign Exception Handling Design
\section gen_idea General Idea


Exceptions must be handled by ApplicationCore in such a way that the application developer does not have to care much about them.

In case of a ChimeraTK::runtime_error exception the Application must catch the exception and report it to the DeviceModule. The DeviceModule should handle this exception and block the device until the device can be opened again. As there can be many devices, only the faulty device must be blocked.
Even if a device is faulty it should not block the server from starting.

Once in error state, set the DataValidity flag for that module to faulty and propagate this to all of its output variables. After the exception is cleared and the operation returns without a data fault flag, set the DataValidity flag to ok. Furthermore, the device must be reinitialised automatically and the values of the process variables must be recovered, as the device might have rebooted and the variables might have been reset.


<b>Genesis</b>

- When a DeviceModule is created it is registered with the Application. (It is added to a list in Application::registerDeviceModule.)
- An initialisation handler can be added to the device through the constructor. Initialisation handlers are callback functions which are executed after a device recovers from an exception.
- A list of shared pointers to TransferElements is created as writeAfterOpen, which is used to write constants after the device is opened.
- A list of shared pointers to TransferElements is created as writeRecoveryOpen, which is populated in the function addRecoveryAccessor in the DeviceModule.
- ChimeraTK::NDRegisterAccessor is used to access the device variables inside class Application.
- Class ExceptionHandlingDecorator decorates the ChimeraTK::NDRegisterAccessor and handles its exceptions.
- A recovery accessor is added for each writeable register when its ChimeraTK::NDRegisterAccessor is obtained. These recovery accessors are used to restore the values of the variables after the device has recovered.
- setOwner() is used to set the application module or variable group as owner of the (feeding) device which is decorated with an ExceptionHandlingDecorator.

<b>The Flow</b>

- Application has started but the device is not opened
  - All the writes will be delayed until the device is opened
  - Constants too will be written only after the device is opened.

  - The device is opened for the first time inside DeviceModule::handleException().
  - If there is no exception
    - deviceError.status is set to 0.
    - The device is initialised by iterating over the initialisationHandlers list.
    - Constant feeders are written to the device using writeAfterOpen().

- When a read / write operation on the device (ExceptionHandlingDecorator<UserType>::genericTransfer on a ChimeraTK::NDRegisterAccessor) causes a ChimeraTK::runtime_error exception, the exception is caught.
  - Inside ExceptionHandlingDecorator
    - The dataValidity of the DeviceModule is set to faulty using setOwnerValidityFunction(DataValidity::faulty).
    - incrementDataFaultCounter is set to true.
    - The error is reported to the DeviceModule by passing the exception message: DeviceModule::reportException(e.what()).
  - incrementDataFaultCounter is picked up by MetaDataPropagatingDecorator and all the outputs are set faulty.

  - In DeviceModule::reportException
    - The error is pushed into an error queue and deviceError.status is set to 1.
    - The device is blocked until the error state is resolved, i.e. until the device can be opened again.

  - The exception is handled by DeviceModule::handleException() in a separate thread.
  - It keeps trying to open the device until it succeeds.
  - Once the device is opened,
    - deviceError.status is set to 0.
    - the device is reinitialised through the initialisationHandlers.
    - the process variables are written again through writeRecoveryOpen().
    - the device thread is notified and is no longer blocked.


-<b>Add an exception handling and reporting mechanism to the device module (DeviceModule).</b>

Description.

Add two error state variables.
- "status" (boolean flag if error occurred)
- "message" (string with error message)

These variables are automatically connected to the control system in this format:
- /Devices/{AliasName}/message
- /Devices/{AliasName}/status

Add a thread safe function reportException().
A user/application can report an exception by calling reportException() of the DeviceModule with an exception string. reportException() puts the exception into a queue and then blocks the calling thread. This queue is processed by an internal function handleException(), which updates the DeviceError variables (status=1 and message="YourExceptionString") and tries to open the device. Once the device can be opened, the DeviceError variables are updated (status=0 and message="") and the blocked threads are notified to continue. Note that the operation which led to the exception, e.g. a read or write, must be repeated after the exception has been handled.
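
The interplay between reportException() and handleException() described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the ApplicationCore implementation: the class name DeviceModuleSketch, the tryOpen callback, the 10 ms retry pause and all member names are hypothetical, and the status and message variables are plain members rather than process variables.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for DeviceModule (assumption, not the real class).
class DeviceModuleSketch {
 public:
  // tryOpen is a hypothetical callback returning true once the device can be opened.
  explicit DeviceModuleSketch(std::function<bool()> tryOpen)
  : _tryOpen(std::move(tryOpen)), _handlerThread([this] { handleException(); }) {}

  ~DeviceModuleSketch() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _shutdown = true;
    }
    _cv.notify_all();
    _handlerThread.join();
  }

  // Thread safe: queue the error, then block the caller until the device has recovered.
  void reportException(const std::string& what) {
    std::unique_lock<std::mutex> lock(_mutex);
    _errorQueue.push(what);
    _cv.notify_all();
    _cv.wait(lock, [this] { return _errorQueue.empty() || _shutdown; });
  }

  int status() {
    std::lock_guard<std::mutex> lock(_mutex);
    return _status;
  }

  std::string message() {
    std::lock_guard<std::mutex> lock(_mutex);
    return _message;
  }

 private:
  // Runs in its own thread and processes the error queue.
  void handleException() {
    std::unique_lock<std::mutex> lock(_mutex);
    while(true) {
      _cv.wait(lock, [this] { return !_errorQueue.empty() || _shutdown; });
      if(_shutdown) return;
      _status = 1;
      _message = _errorQueue.front();
      bool opened = false;
      while(!opened && !_shutdown) {
        lock.unlock(); // do not hold the lock while talking to the device
        opened = _tryOpen();
        lock.lock();
        if(!opened) {
          // short pause before the next attempt, interrupted on shutdown
          _cv.wait_for(lock, std::chrono::milliseconds(10), [this] { return _shutdown; });
        }
      }
      if(_shutdown) return;
      std::queue<std::string>().swap(_errorQueue); // all queued errors are resolved
      _status = 0;
      _message.clear();
      _cv.notify_all(); // wake up every thread blocked in reportException()
    }
  }

  std::function<bool()> _tryOpen;
  std::mutex _mutex;
  std::condition_variable _cv;
  std::queue<std::string> _errorQueue;
  int _status{0};
  std::string _message;
  bool _shutdown{false};
  std::thread _handlerThread; // declared last so all other members exist when it starts
};
```

A caller would invoke reportException("...") from the thread whose transfer failed and repeat the transfer once the call returns.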

Implementation.
- DeviceModule


-<b>Catch ChimeraTK::runtime_error exceptions.</b>

Description.

Catch all the ChimeraTK::runtime_error exceptions that could be thrown in read and write operations and feed the error state into the DeviceModule through the function DeviceModule::reportException(). NDRegisterAccessors coming from a device should be used as a single central point to catch these exceptions.
Retry the failed operation after reportException() returns.

Implementation.

It is done by placing an ExceptionHandlingDecorator around all NDRegisterAccessors coming from a device.
- NDRegisterAccessors
- Application
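
The catch, report and retry pattern described above can be illustrated with a small sketch. It rests on assumptions: RuntimeErrorSketch stands in for ChimeraTK::runtime_error, and transferWithRetry as well as its two callbacks are hypothetical names which do not exist in ApplicationCore. The reportException callback is expected to block until the device has recovered.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ChimeraTK::runtime_error (assumption).
struct RuntimeErrorSketch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Repeats the transfer until it completes without a runtime error.
// Each error is passed to reportException, which is assumed to block
// until the device is usable again. Returns the number of reported errors.
inline int transferWithRetry(const std::function<void()>& transfer,
    const std::function<void(const std::string&)>& reportException) {
  int reported = 0;
  while(true) {
    try {
      transfer();
      return reported;
    }
    catch(const RuntimeErrorSketch& e) {
      ++reported;
      reportException(e.what());
      // fall through to the next loop iteration: retry the failed operation
    }
  }
}
```

Only the runtime error type is caught here, so any other exception type would still propagate to the caller.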

-<b>Faulty device should not block any other device.</b>

Description.

Each TriggerFanOut deals with several variable networks at the same time, which are triggered by the same trigger. Each variable network has its own feeder and one or more consumers. You do not need to change anything about the variable networks.
On the other hand, the trigger itself is a variable network, too. The TriggerFanOut has a consumer of this trigger network. This is the accessor on which the blocking read() is called in the loop. You will need to create additional consumers in the trigger network, one for each TriggerFanOut.

Implementation.

- Application (Application::typedMakeConnection)

-<b>The Server must always start even if a device is in error state.</b>

Description.

To make sure that the server always starts, the initial opening of the device should take place in the DeviceModule itself, inside the exception handling loop, so that the device can go to the error state right at the beginning and the server can start even if not all of its devices are available.

Implementation.

- DeviceModule ( DeviceModule::handleException() ).


-<b>Set/clear fault flag of module in case of exception.</b>

Background.

A DataValidity flag of a module is set to faulty if any input variable returns with a set data fault flag after a read operation, and is cleared once none of the inputs has the data fault flag set any longer. In a write operation, the module's data fault flag status is attached to the variable to write.
More detail ...(Martin's doc)

Description.

In case of a ChimeraTK::runtime_error exception this DataValidity flag should also be set to faulty and propagated to all outputs of the module. When the operation completes after clearing the exception state, the flag should be cleared as well.

Implementation.
- ExceptionHandlingDecorator
- TriggerFanOut

Additional note from code author.
Note that if the data is distributed through a triggered FanOut (i.e. variables from the device are connected to other variables through a trigger, the usual way for poll-type variables), the data read from the receiving end of the variable cannot be considered valid if the DataValidity is faulty.
Additionally, a change to a faulty validity state will signal the availability of new data on those variables, which is to be considered invalid.
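
The rule from the background paragraph (faulty while at least one input is faulty, ok again once all inputs are cleared) can be modelled with a counter. The following is a sketch under assumptions: ModuleValiditySketch and DataValiditySketch are hypothetical types, and the method names are chosen for illustration only.

```cpp
#include <cstddef>

// Hypothetical stand-in for the DataValidity flag (assumption).
enum class DataValiditySketch { ok, faulty };

// Hypothetical model of a module which tracks how many of its inputs are faulty.
class ModuleValiditySketch {
 public:
  // Called when an input returns from a read with the data fault flag newly set,
  // or when a runtime error is caught on one of the module's device accessors.
  void incrementDataFaultCounter() { ++_faultyInputs; }

  // Called when a previously faulty input returns without the data fault flag.
  void decrementDataFaultCounter() {
    if(_faultyInputs > 0) --_faultyInputs;
  }

  // The validity which is attached to every variable the module writes.
  DataValiditySketch getDataValidity() const {
    return _faultyInputs == 0 ? DataValiditySketch::ok : DataValiditySketch::faulty;
  }

 private:
  std::size_t _faultyInputs{0};
};
```

With a counter instead of a single boolean, the flag stays faulty until the last faulty input has recovered.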


Bahnhof. Variables which are Constants or outputs of the ConfigReader and are connected to a DeviceModule should be written in an initialisation handler. Currently they are written in ConfigReader::prepare() etc., which might block the application initialisation if an exception occurs in the process of writing these variables.

-<b>Initialise the device after recovery.</b>

Description.

If a device is recovered after an exception, it might need to be reinitialised (e.g. because it was power cycled). The device should be automatically reinitialised after recovery.

Implementation.

A list of std::function objects is added to the DeviceModule. Initialisation handlers can be added through the constructor and the addInitialisationHandler() function. When the device recovers, all the initialisationHandlers in the list are executed.
- DeviceModule
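
A handler list of this kind might look like the following sketch. It is an illustration under assumptions: InitialisationHandlersSketch and runAfterRecovery() are hypothetical names, and the handlers take no arguments here, which need not match the handler signature used by DeviceModule.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical container for initialisation handlers (assumption, not DeviceModule).
class InitialisationHandlersSketch {
 public:
  using Handler = std::function<void()>;

  // An optional first handler can be passed to the constructor.
  explicit InitialisationHandlersSketch(Handler initial = nullptr) {
    if(initial) _handlers.push_back(std::move(initial));
  }

  // Further handlers are appended and run in the order in which they were added.
  void addInitialisationHandler(Handler handler) { _handlers.push_back(std::move(handler)); }

  // To be called each time the device has been (re)opened successfully.
  void runAfterRecovery() const {
    for(const auto& handler : _handlers) handler();
  }

 private:
  std::vector<Handler> _handlers;
};
```

Since the handlers are executed after every recovery, each handler should be safe to run more than once.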


-<b>Recover process variables after exception.</b>

Background.

After a device has failed and recovered, it might have re-booted and lost the values of the process variables that live in the server and are written to the device. Hence these values have to be re-written after the device has recovered.

Description.
Technically the issue is that the original value that has been written is not safely accessible when recovering. Inside the accessor the user buffer must not be touched because the recovery is taking place in a different thread. In addition we don't know where the data is (it might or might not have been swapped away, depending on whether write() or writeDestructively() has been called by the user).
The only race condition free way is to create a copy when writing the data to the device, so that it is available when recovering.

Implementation.

- DeviceModule
- ExceptionHandlingDecorator

A list of shared pointers to TransferElements is created as writeRecoveryOpen, which is populated in the function addRecoveryAccessor in the DeviceModule.
ExceptionHandlingDecorator is extended by adding a second accessor to the same register as the target accessor it is decorating, and the data is copied in doPreWrite().
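
The copy-on-write idea can be sketched as follows. All names are hypothetical (FakeRegisterSketch, RecoveringWriterSketch, writeRecoveryValue()), and the "device" is just a vector in memory. The sketch only shows why a private copy makes recovery independent of the user buffer; the mutex protects the copy, not the fake hardware.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical device register which loses its content on reboot (assumption).
struct FakeRegisterSketch {
  std::vector<int> hardware;
  void reboot() { hardware.clear(); }
};

// Hypothetical writer which keeps a recovery copy of the last written value.
class RecoveringWriterSketch {
 public:
  explicit RecoveringWriterSketch(std::shared_ptr<FakeRegisterSketch> reg) : _register(std::move(reg)) {}

  std::vector<int> userBuffer; // owned by the application thread

  void write() {
    storeRecoveryCopy(); // corresponds to the copy made before the write
    _register->hardware = userBuffer;
  }

  void writeDestructively() {
    storeRecoveryCopy();
    _register->hardware = std::move(userBuffer); // user buffer content is given away
    userBuffer.clear();
  }

  // Called from the recovery thread; never touches userBuffer.
  void writeRecoveryValue() {
    std::lock_guard<std::mutex> lock(_recoveryMutex);
    if(_hasRecoveryValue) _register->hardware = _recoveryCopy;
  }

 private:
  void storeRecoveryCopy() {
    std::lock_guard<std::mutex> lock(_recoveryMutex);
    _recoveryCopy = userBuffer;
    _hasRecoveryValue = true;
  }

  std::shared_ptr<FakeRegisterSketch> _register;
  std::mutex _recoveryMutex;
  std::vector<int> _recoveryCopy;
  bool _hasRecoveryValue{false};
};
```

The price of this approach is one extra copy per write, which is what makes the recovery independent of whether write() or writeDestructively() was used.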





*/