Exceptions must be handled by ApplicationCore in a way that the application developer does not have to care much about it.
In case of a ChimeraTK::runtime exception the Application must catch the expection and report it to the DeviceModule. The DeviceModule should handle this exception and block the device until the device can be opened again. As there could many devices make sure only the faulty device is blocked.
Even if a device is faulty it should not block the server from starting.
Once in error state, set the DataValidity flag for that module to faulty and propogate this to all of it‘s output variables. After the exception is cleared and operation returns without a data fault flag, set DataValidity flag to ok. Furthermore, the device must be reinitialised automatically and also recover the values of process variables as the device might have rebooted and the variables have been re-set.
-<b>Add an exception handling and reporting machinsm to the device module (DeviceModule).</b>
Description.
Add two error state variables.
- "state" (boolean flag if error occurred)
- "message" (string with error message)
These variables are automatically connected to the control systen in this format
- /Devices/{AliasName}/message
- /Devices/{AliasName}/status
Add a thread safe function reportException().
A user/application can report an exception by calling reportException of DeviceModule with an exception string. The reportException packs the exception in a queue and the blocks the thread. This queue is processed by an internal function handleException which updates the DeviceError variables (status=1 and message="YourExceptionString") and tries to open the device. Once device can be opened the DeviceError variables are updated (status=0 and message="") and blocking threads are notified to continue. It must be noted that whatever operation which lead to exception e.g., read or write, should be repeated after the exception is handled.
Catch all the ChimeraTK::runtime_error exceptions that could be thrown in read and write operations and feed the error state into the DeviceModule through the function DeviceModule::reportException() . NDRegisterAccessors coming from device should be used as a singal central point to catch these excpetions.
Retry the failed operation after reportException() returns.
Implmentation.
It is done by placing a ExceptionHandlingDecorator around all NDRegisterAccessors coming from a device.
- NDRegisterAccessors
- Application
-<b>Faulty device should not block any other device.</b>
Description.
Each TriggerFanOut deals with several variable networks at the same time, which are triggered by the same trigger. Each variable network has its own feeder and one or more consumers. You do not need to change anything about the variable networks.
On the other hand, the trigger itself is a variable network, too. The TriggerFanOut has a consumer of this trigger network. This is the accessor on which the blocking read() is called in the loop. You will need to create additional consumers in the trigger network, one for each TriggerFanOut.
Implementation.
- Application (Application::typedMakeConnection)
-<b>The Server must always start even if a device is in error state.</b>
Description.
To make sure that the server should always start, the initial opening of the device should take place in the DeviceModule itself, inside the exception handling loop so that device can go to the error state right at the beginning and the server can start despite not all its devices are available.
-<b>Set/clear fault flag of module in case of exception.</b>
Background.
A DataValidity flag of a module is set to faulty if any input variables returns with a set data fault flag after a read operation and is cleared once all inputs have data fault no longer set. In a write operation, the module's data fault flag status is attached to the variable to write.
More detail ...(Martin‘s doc)
Description.
In case of an ChimeraTK:runtime_error exception this DataValidity flag should also be set to faulty and propogated to all outputs of the module. When the operation completes after clearing the exception state, the flag should be cleared as well.
Implmentation.
- ExceptionHandlingDecorator
- TriggerFanOut
Additional note from code author.
Note that if the data is distributed through a triggered FanOut (i.e. variables from device is connected to other variables through a trigger, the usual way for poll-type variables) the data read from the receiving end of the variable cannot be considered valid if the DataValidity is faulty.
Additionaly, a change of to a faulty validity state will signal the availability of new data on those variables, which is to be considered invalid.
Bahnhof.Variables which are Constants or outputs of the ConfigReader and are connected to a DeviceModule should be written in an initialisation handler. Currently they are written in ConfigReader::pepare() etc., which might block the application initialisation if an exception occurs in the process of writing these variables.
-<b>Initialise the device after recovey.</b>
Description.
If a device is recovered after an exception, it might need to be reinitialised (e.g. because it was power cycled). The device should be automatically reinitialised after recovery.
Implementation.
A list of DeviceModule std::function is added. InitialisationHandlers can be added through construtor and addInitialisationHandler() function. When the device recovers all the initialisationHandlers in the list are executed.
- DeviceModule
-<b>Recover process variables after exception.</b>
Background.
After a device has failed and recovered, it might have re-booted and lost the values of the process variables that live in the server and are written to the device. Hence these values have to be re-written after the device has recovered.
Description.
Technically the issue is that the original value that has been written is not safely accessible when recovering. Inside the accessor the user buffer must not be touched because the recovery is taking place in a different thread. In addition we don't know where the data is (might or might not have been swapped away, depending whether write() or writeDestructively() has been call by the user).
The only race condition free way is to create a copy when writing the data to the device, so they are available when recovering.
Implementation.
- DeviceModule
- ExceptionHandlingDecorator
A list of TransferElements shared pointers is created with as writeRecoveryOpen which is populated in function addRecoveryAccessor in the DeviceModule.
ExceptionHandlingDecorator is extended by adding second accessor to the same register as the target accessor it is decorating and data is copied in doPreWrite().