- 1.1 All ChimeraTK::runtime_error exceptions thrown by device register accessors are handled by the framework and are never exposed to user code in ApplicationModules.
- 1.2 After an exception has been received, read operations will propagate the DataValidity::faulty. Blocking read operations will block after the flag has been read (i.e. on the second read of the same accessor), while non-blocking read operations will never block. Write operations will always block. [TBD: is this really a good idea? <b>COMMENT</b>: The order of write operations is still not guaranteed through the recovery accessors (which maybe should be changed), and blocking writes have some severe drawbacks. Not only in fan outs but also in normal ApplicationModules blocking writes will prevent propagation of DataValidity flags! Blocking writes might help if a sequence of values is written to the same register - this is not handled by the recovery accessor. But if a handshake register is read back in between the writes, the situation can already be handled properly (check DataValidity flag, restart sequence after recovery). Maybe blocking writes create more problems than they solve!? On the other hand, how does the application then know that a write() has no effect yet? E.g. a PI controller might wind up if actuator and sensor are on different devices and the actuator fails. Then again, how is this different from failing actuator hardware without breaking the communication? Some form of a status readback of the actuator again cures the situation. I think I am in favour of "fire-and-forget" writes.].
- 1.2.1 Write should not block in case of an exception for the outputs of ThreadedFanOut / TriggerFanOut.
- 1.2.2 According to \link spec_initialValuePropagation \endlink, writes in ApplicationModules do not block before the first successful read in the main loop.
- 1.3 The framework will try to resolve the exception state by periodically re-opening the faulty device.
- 1.4 After successfully re-opening the device, a recovery procedure is executed before allowing any read/write operations from the ApplicationModules and FanOuts again. This recovery procedure involves:
- 1.4.1 the execution of initialisation handlers, and
- 1.4.2 restoring all registers that have been written since the start of the application with their latest values. The register values are restored in the same order they were written. [<b>NEW REQUIREMENT!</b>] (*)
- 1.5 An initialisation handler can be added to the DeviceModule in the user code. Initialisation handlers are callback functions which will be executed when a device is opened for the first time and after a device recovers from an exception, before any process variables are written. See DeviceModule::addInitialisationHandler().
- 1.6 The behaviour at application start (when all devices are still closed at first) is similar to the case of a later received exception. The only differences are mentioned in 1.2.2 and 1.8.
- 1.7 Even if some devices are initially in a persisting error state, the part of the application which does not interact with the faulty devices will start and work normally.
- 1.8 Initial values are correctly propagated after a device is opened. See \link spec_initialValuePropagation \endlink. In particular, no read function (not even readNonBlocking/readLatest) will return before an initial value has been received.
- 1.9 Exception handling and DataValidity flag propagation are implemented such that it is transparent to a module whether it is directly connected to a device, or whether a fanout or another application module is in between.
- 1.10 ChimeraTK::logic_error exceptions are left unhandled and will terminate the application. These errors may only occur in the initialisation phase (up to the point where all devices are opened and initialised) and point to a severe configuration error which is not recoverable. (*)
- 1.2 When an exception has been received (thrown by a device register accessor in an ApplicationModule, FanOut etc.):
- 1.2.1 The exception status is published as a process variable together with an error message.
- 1.2.1.1 The variable Devices/<alias>/
- 1.2.1 Read operations will propagate the DataValidity::faulty to the owning module / fan out (without changing the actual value).
- 1.2.2 The normal module algorithm code will be continued, to allow this flag to propagate to the outputs in the same way as if it had been received through the process variable itself (c.f. 1.9).
- 1.2.3 Blocking read operations will block after the flag has been read and propagated once (i.e. on the second blocking read of the same accessor).
- 1.2.4 Non-blocking read operations (incl. readLatest) never block.
- 1.2.5 Asynchronous read operations behave analogously to 1.2.3: The TransferFuture, which was valid while the exception was received, is fulfilled once, the DataValidity::faulty is propagated to the owning module and the value is left unchanged. The TransferFuture will only be fulfilled again after the device has been recovered.
- 1.2.6 [TBD: proposed replacement for the next bullet point including sub-points, see discussion there] Write operations never block. In case of an exception (new or persisting), the actual write operation will be delayed until the device is functional and recovered again. The same mechanism as used for 1.3.1.2 is used here, hence the order of write operations is guaranteed across accessors, but only the latest written value of each accessor prevails. (*)
- <strike>1.2.6 Write operations will block immediately until the device has been recovered and the write operation has been completed. [TBD: is this really a good idea? <b>COMMENT</b>: The order of write operations is still not guaranteed through the recovery accessors (which maybe should be changed), and blocking writes have some severe drawbacks. Not only in fan outs but also in normal ApplicationModules blocking writes will prevent propagation of DataValidity flags! Blocking writes might help if a sequence of values is written to the same register - this is not handled by the recovery accessor. But if a handshake register is read back in between the writes, the situation can already be handled properly (check DataValidity flag, restart sequence after recovery). Maybe blocking writes create more problems than they solve!? On the other hand, how does the application then know that a write() has no effect yet? E.g. a PI controller might wind up if actuator and sensor are on different devices and the actuator fails. Then again, how is this different from failing actuator hardware without breaking the communication? Some form of a status readback of the actuator again cures the situation. I think I am in favour of "fire-and-forget" writes.].</strike>
- <strike>1.2.6.1 Write should not block in case of an exception for the outputs of ThreadedFanOut / TriggerFanOut.</strike>
- <strike>1.2.6.2 According to \link spec_initialValuePropagation \endlink, writes in ApplicationModules do not block before the first successful read in the main loop.</strike>
- 1.2.7 In case of exceptions, there is no guaranteed realtime behaviour, not even for "non-blocking" transfers. (*)
- 1.3 The framework tries to resolve an exception state by periodically re-opening the faulty device.
- 1.3.1 After successfully re-opening the device, a recovery procedure is executed before allowing any read/write operations from the ApplicationModules and FanOuts again. This recovery procedure involves:
- 1.3.1.1 the execution of so-called initialisation handlers (cf. 1.3.2), and
- 1.3.1.2 restoring all registers that have been written since the start of the application with their latest values. The register values are restored in the same order they were written. [<b>NEW REQUIREMENT!</b>] (*)
- 1.3.1.3 Finally, Devices/<alias>/deviceBecameFunctional is written to inform any module subscribing to this variable about the finished recovery. (*)
- 1.3.2 Any number of initialisation handlers can be added to the DeviceModule in the user code. Initialisation handlers are callback functions which will be executed when a device is opened for the first time and after a device recovers from an exception, before any process variables are written. See DeviceModule::addInitialisationHandler().
- 1.4 The behaviour at application start (when all devices are still closed at first) is similar to the case of a later received exception. The only differences are mentioned in <strike>1.2.6.2 and</strike> 1.4.2.
- 1.4.1 Even if some devices are initially in a persisting error state, the part of the application which does not interact with the faulty devices starts and works normally.
- 1.4.2 Initial values are correctly propagated after a device is opened. See \link spec_initialValuePropagation \endlink. In particular, no read function (not even readNonBlocking/readLatest) will return before an initial value has been received. (*)
- 1.5 Exception handling and DataValidity flag propagation are implemented such that it is transparent to a module whether it is directly connected to a device, or whether a fanout or another application module is in between.
- 1.6 ChimeraTK::logic_error exceptions are left unhandled and will terminate the application. These errors may only occur in the initialisation phase (up to the point where all devices are opened and initialised) and point to a severe configuration error which is not recoverable. (*)
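
The read behaviour required in 1.2.1 to 1.2.4 can be pictured as a small state machine. The following is only a minimal, self-contained sketch of that behaviour and not the framework's implementation: the namespace `sketch`, the class `ModelledInput`, the enum `ReadResult` and all member names are hypothetical, and "blocking" is modelled by returning `wouldBlock` instead of actually suspending a thread. It also assumes that a non-blocking read does not count as the first blocking read of 1.2.3, which the specification does not state explicitly.

```cpp
#include <cassert>

namespace sketch {

  // Illustrative stand-ins for the concepts named in the specification (not the ChimeraTK types).
  enum class DataValidity { ok, faulty };
  enum class ReadResult { returned, wouldBlock };

  // Toy model of one device input as seen by its owning module (hypothetical class).
  class ModelledInput {
   public:
    int deviceValue{0};                      // what the register currently holds
    int value{0};                            // what the module sees in its accessor
    DataValidity validity{DataValidity::ok}; // flag seen by the owning module

    // The framework has caught a runtime_error for this accessor.
    void exceptionReceived() {
      _deviceFaulty = true;
      _faultyFlagDelivered = false;
    }

    // The recovery procedure has finished.
    void deviceRecovered() { _deviceFaulty = false; }

    // Blocking read: delivers the faulty flag once, afterwards it would block until recovery.
    ReadResult read() {
      if(_deviceFaulty) {
        if(_faultyFlagDelivered) return ReadResult::wouldBlock;
        validity = DataValidity::faulty; // the value itself is left unchanged
        _faultyFlagDelivered = true;
        return ReadResult::returned;
      }
      value = deviceValue;
      validity = DataValidity::ok;
      return ReadResult::returned;
    }

    // Non-blocking read: propagates the flag in the same way, but always returns.
    void readNonBlocking() {
      if(_deviceFaulty) {
        validity = DataValidity::faulty;
        return;
      }
      value = deviceValue;
      validity = DataValidity::ok;
    }

   private:
    bool _deviceFaulty{false};
    bool _faultyFlagDelivered{false};
  };

} // namespace sketch
```

In this model the owning module sees the last good value together with DataValidity::faulty exactly once per fault on the blocking path, which lets the module run one more loop iteration and push the flag to its outputs (cf. 1.2.2).
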
- 1.2.1 If writes in ThreadedFanOut/TriggerFanOut would block, the other receivers would no longer receive updates. The exact behaviour would not even be well-defined, since the order of writes in the fanouts is random.
- <strike>1.2.5.1 If writes in ThreadedFanOut/TriggerFanOut would block, the other receivers would no longer receive updates. The exact behaviour would not even be well-defined, since the order of writes in the fanouts is random.</strike>
- 1.4.2 It may be important to guarantee the order of writes. Please note that the VersionNumber is insufficient as a sorting criterion, since many writes may have been done with the same VersionNumber (in an ApplicationModule, the VersionNumber used for the writes is determined by the largest VersionNumber of the inputs).
- 1.2.6 / 1.3.1.3 If timing is important for write operations (e.g. must not write a sequence of registers too fast), or if multiple values need to be written to the same register in sequence, the application cannot fully rely on the framework's recovery procedure. The framework hence provides the process variable Devices/<alias>/deviceBecameFunctional for each device, which will be written each time the recovery procedure is completed (cf. 1.3.1.3). ApplicationModules which implement such a timed sequence need to receive this variable and restart the entire sequence after the recovery.
- 1.8 DataValidity::faulty is set at first by default, so there is no need to propagate this flag initially. To prevent race conditions and undefined behaviour, it must even be ensured that the flag is not propagated unnecessarily. See also \link spec_initialValuePropagation \endlink.
- 1.2.7 Even non-blocking read and write operations are not truly non-blocking, since they are still synchronous. The "non-blocking" guarantee only means that the operation does not block for an extended period of time until the fault state has been cleared. For the duration of the recovery procedure and of course for timeout periods these operations may still block.
- 1.10 In the future, logic_errors may also be handled, so configuration errors can nicely be presented in the control system. This may be important especially since logic_errors may depend on the configuration of external components (devices). If e.g. a device is changed (e.g. device is another control system application which has been modified), logic_errors may be thrown in the recovery phase, even though the device had been successfully initialised previously.
- 1.3.1.2 For some applications, the order of writes may be important, e.g. if firmware expects this. Please note that the VersionNumber is insufficient as a sorting criterion, since many writes may have been done with the same VersionNumber (in an ApplicationModule, the VersionNumber used for the writes is determined by the largest VersionNumber of the inputs).
- 1.4.2 DataValidity::faulty is set at first by default, so there is no need to propagate this flag initially. To prevent race conditions and undefined behaviour, it must even be ensured that the flag is not propagated unnecessarily. See also \link spec_initialValuePropagation \endlink.
- 1.6 In the future, logic_errors may also be handled, so configuration errors can nicely be presented to the control system. This may be important especially since logic_errors may depend also on the configuration of external components (devices). If e.g. a device is changed (e.g. device is another control system application which has been modified), logic_errors may be thrown in the recovery phase, even though the device had been successfully initialised previously.
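
The ordering requirement of 1.3.1.2 and the remark that the VersionNumber cannot serve as a sorting criterion suggest a separate, strictly increasing write counter. The following self-contained sketch illustrates that idea only; it is not taken from the framework. The class `RecoveryStore`, its methods, the plain `int` register values and the string register names are all hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for the values to restore after a device recovery.
class RecoveryStore {
 public:
  // Remember the latest value written to a register and stamp it with a global write order.
  // A later write to the same register replaces both the value and the stamp.
  void recordWrite(const std::string& reg, int value) { _latest[reg] = Entry{value, ++_writeCounter}; }

  // Return the (register, value) pairs to restore, ordered by the time of their latest write.
  std::vector<std::pair<std::string, int>> restoreSequence() const {
    std::vector<std::pair<std::string, Entry>> sorted(_latest.begin(), _latest.end());
    std::sort(sorted.begin(), sorted.end(),
        [](const auto& a, const auto& b) { return a.second.order < b.second.order; });

    std::vector<std::pair<std::string, int>> result;
    for(const auto& [reg, entry] : sorted) {
      result.emplace_back(reg, entry.value);
    }
    return result;
  }

 private:
  struct Entry {
    int value;
    std::uint64_t order; // unique per write, unlike a VersionNumber shared by several writes
  };
  std::map<std::string, Entry> _latest;
  std::uint64_t _writeCounter{0};
};
```

Writing "B", then "A", then "B" again yields the restore sequence "A" followed by "B" with the second value of "B": only the latest value of each register prevails, and the order follows the latest writes (cf. 1.2.6). A sequence of several values written to the same register is not reproduced, which is why such sequences need to be restarted by the application after deviceBecameFunctional has been written.
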
\section spec_execptionHandling_high_level_implmentation 2. High-level description of the implementation