Commit cc26af55 authored by Martin Christoph Hierholzer

exception handling spec: reorder section B for better readability

parent e2e8f8ef
@@ -88,88 +88,97 @@ When the device is functional, it be (re)initialised by using application-define
\section spec_execptionHandling_high_level_implmentation B. Implementation
- 1. A so-called ExceptionHandlingDecorator is placed around all device register accessors (used in ApplicationModules and FanOuts). It is responsible for catching the exceptions and implementing most of the behavior described in A.2.
- 1.1 The ExceptionHandlingDecorator will catch any runtime_error exception thrown in postRead/postWrite (exceptions from other stages are delayed there by the TransferElement base class). The following actions are executed in doPostRead/doPostWrite when an exception has been caught (note that certain actions have been executed already in preRead/preWrite, see 1.2.4, 1.3 and 1.4):
- 1.1.1 The error is reported to the DeviceModule via DeviceModule::reportException().
- 1.1.2 For readable accessors: override DataValidity returned by the accessor to faulty until next successful read operation
- 1.1.2.1 The code instantiating the decorator (Application::createDeviceVariable) has to make sure that the ExceptionHandlingDecorator is "inside" the MetaDataPropagatingRegisterDecorator, so the overridden DataValidity flag in case of an exception is properly propagated to the owning module/fan out.
- 1.1.3 Action depending on the calling operation:
- 1.1.3.1 All read operations: The ExceptionHandlingDecorator remembers that it is in an exception state by setting previousReadFailed = true
- 1.1.3.1 read (push-type inputs): return immediately (*)
- 1.1.3.2 readNonBlocking / readLatest / read (poll-type inputs): Just return false (no new data). The calling module thread will continue and propagate the DataValidity::faulty flag (cf. 1.1.2).
- 1.1.3.3 write: Do not block. Write will be later executed by the DeviceModule (cf. 1.2)
- 1.2 A second, undecorated copy of each writeable device register accessor (*) is used as a so-called recoveryAccessor by the ExceptionHandlingDecorator and the DeviceModule. These recoveryAccessors are used to set the initial values of registers when the device is opened for the first time and to recover the last written values during the recovery procedure.
- 1.2.1 The recoveryAccessor is stored with additional meta data in a so-called RecoveryHelper data structure, which contains:
- the recoveryAccessor itself,
- the VersionNumber of the (potentially unwritten) data stored in the accessor,
- an ordering parameter which determines the order of write operations during recovery,
- a flag which indicates whether the value in the recoveryAccessor has already been written to the device. (*)
- 1.2.2 Ordering can be done per device (*), hence each DeviceModule has one 64-bit atomic counter which is incremented for each write operation, and the value is stored in the ordering parameter for the recoveryAccessor.
- 1.2.3 The RecoveryHelper object may be accessed only under a lock to prevent concurrent access during recovery. The lock shall be shared to allow concurrent write operations of different registers - only the DeviceModule needs to obtain an exclusive lock during recovery. The lock is obtained by the ExceptionHandlingDecorators via DeviceModule::getRecoverySharedLock().
- 1.2.4 In doPreWrite() the recoveryAccessor with the version number and ordering parameter is updated, and the written flag is cleared.
- 1.2.4.1 If the written flag was previously not set, the return value of doWriteTransfer() must be forced to true (data lost).
- 1.2.5 In doPostWrite() the recoveryAccessor's written flag is set if the write was successful (no exception thrown; data lost flag does not matter here). (*)
- 1.3 As described in A.2.1, the ExceptionHandlingDecorator freezes certain read operations in case of exceptions. This is done as follows in doPreRead():
- 1.3.1 Obtain the recovery lock through DeviceModule::getRecoverySharedLock(), to prevent interference with an ongoing recovery procedure.
- 1.3.2 Decide whether freezing is done (don't freeze yet). Freezing is done if one of the following conditions is met:
- read type is blocking and AccessMode::wait_for_new_data is set, previousReadFailed == true, and DeviceModule::deviceHasError == true (cf. A.2.1.2), or
- no initial value has been read yet (getCurretVersion() == {nullptr}) and DeviceModule::deviceHasError == true (cf. A.4.2).
- 1.3.3 Obtain the DeviceModule::errorLock. Only then release the recovery lock. (*)
- 1.3.4 Wait on DeviceModule::errorIsReportedCondVar.
- 1.4 In doPreRead/doPreWrite, check if fault state already prevails (check DeviceModule::deviceHasError while holding the recovery shared lock).
- 1.4.1 If yes, the actual transfer will be skipped. (cf. 2.2 or 2.3.13)
- 1.4.2 If the transfer will not be skipped, atomically increment DeviceModule::activeTransfers while still (!) holding the recovery shared lock.
- 1.4.3 write: The check for a prevailing fault state has to be done without releasing the lock between the write to the recoveryAccessor and the check. (*)
- 1.4.4 For skipped transfers, none of the pre/transfer/post functions must be delegated to the target accessor.
- 1.5 In doPostRead/doPostWrite:
- 1.5.1 If there was no exception, set previousReadFailed = false.
- 1.5.2 If in 1.4.2 the DeviceModule::activeTransfers counter was incremented, atomically decrement it.
- 2. DeviceModule:
- 2.1 The application always starts with all devices as closed. For each device, the initial value for Devices/<alias>/status is set to 1 and the initial value for Devices/<alias>/message is set to an error that the device has not been opened yet (the message will be overwritten with the real error message if the first attempt to open fails, see 2.3.1).
- 2.2 The DeviceModule takes care that ExceptionHandlingDecorators initially do not perform any read or write operations, but block (cf. 1.4). This happens before running any prepare() of an ApplicationModule, where the first write calls to ExceptionHandlingDecorators might be done.
- 2.3 In the DeviceModule thread, the following procedure is executed (in a loop until termination):
- 2.3.1 The DeviceModule tries to open the device until it succeeds and isFunctional() returns true.
- 2.3.1.1 If the very first attempt to open the device after the application start fails, the error message of the exception is used to overwrite the content of Devices/<alias>/message. Otherwise error messages of exceptions thrown by Device::open() are not visible.
- 2.3.2 Obtain lock for accessing recoveryAccessors.
- 2.3.3 Device is initialised by iterating initialisationHandlers list.
- 2.3.3.1 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1.
- 2.3.4 All valid recoveryAccessors are written in the same order they were originally written.
- 2.3.4.1 A recoveryAccessor is considered "valid", if it has already received a value, i.e. its current version number is not {nullptr} any more.
- 2.3.4.2 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1.
- 2.3.5 The queue of reported exceptions is cleared. (*)
- 2.3.6 Devices/<alias>/status is set to 0 and Devices/<alias>/message is set to an empty string.
- 2.3.7 DeviceModule allows ExceptionHandlingDecorators to execute reads and writes again (cf. 2.3.13)
- 2.3.8 All blocked read operations (cf. 1.1.3) are notified.
- 2.3.9 Release lock for recoveryAccessors.
- 2.3.10 The DeviceModuleThread waits for the next reported exception.
- 2.3.11 An exception is received.
- 2.3.12 Devices/<alias>/status is set to 1 and Devices/<alias>/message is set to the first received exception message.
- 2.3.13 Set DeviceModule::deviceHasError = true under exclusive recovery lock (cf. 1.4). From this point on, no new transfers will be started.
- 2.3.14 The device module waits until all running read and write operations of ExceptionHandlingDecorators have ended (wait until DeviceModule::activeTransfers == 0). (*)
- 2.3.15 The thread goes back to 2.3.1 and tries to re-open the device.
\subsection spec_execptionHandling_high_level_implmentation_decorator B.1 ExceptionHandlingDecorator
A so-called ExceptionHandlingDecorator is placed around all device register accessors (used in ApplicationModules and FanOuts). It is responsible for catching the exceptions and implementing most of the behavior described in A.2.
- 1.1 A second, undecorated copy of each writeable device register accessor (*) is used as a so-called recoveryAccessor by the ExceptionHandlingDecorator and the DeviceModule. These recoveryAccessors are used to set the initial values of registers when the device is opened for the first time and to recover the last written values during the recovery procedure.
- 1.1.1 The recoveryAccessor is stored by the DeviceModule with additional meta data in a so-called RecoveryHelper data structure, which contains:
- the recoveryAccessor itself,
- the VersionNumber of the (potentially unwritten) data stored in the accessor,
- an ordering parameter which determines the order of write operations during recovery,
- a flag which indicates whether the value in the recoveryAccessor has already been written to the device. (*)
- 1.1.2 Ordering can be done per device (*), hence each DeviceModule has one 64-bit atomic counter which is incremented for each write operation, and the value is stored in the ordering parameter for the recoveryAccessor.
- 1.1.3 The RecoveryHelper object may be accessed only under a lock to prevent concurrent access during recovery. The lock shall be shared to allow concurrent write operations of different registers - only the DeviceModule needs to obtain an exclusive lock during recovery. The lock is obtained by the ExceptionHandlingDecorators via DeviceModule::getRecoverySharedLock().
- 1.2 In doPreRead()/doPreWrite(), it is checked whether the fault state already prevails (check DeviceModule::deviceHasError while holding the recovery shared lock).
- 1.2.1 If yes, the actual transfer will be skipped. (cf. 2.2 or 2.3.13)
- 1.2.2 If the transfer will not be skipped, atomically increment DeviceModule::activeTransfers while still (!) holding the recovery shared lock.
- 1.2.3 write: The check for a prevailing fault state has to be done without releasing the lock between the write to the recoveryAccessor and the check. (*)
- 1.2.4 For skipped transfers, none of the pre/transfer/post functions must be delegated to the target accessor.
- 1.3 In doPreWrite() the recoveryAccessor with the version number and ordering parameter is updated, and the written flag is cleared.
- 1.3.1 If the written flag was previously not set, the return value of doWriteTransfer() must be forced to true (data lost).
- 1.4 In doPreRead() certain read operations are frozen in case of a fault state (see A.2.1):
- 1.4.1 Obtain the recovery lock through DeviceModule::getRecoverySharedLock(), to prevent interference with an ongoing recovery procedure.
- 1.4.2 Decide whether freezing is done (don't freeze yet). Freezing is done if one of the following conditions is met:
- read type is blocking and AccessMode::wait_for_new_data is set, previousReadFailed == true, and DeviceModule::deviceHasError == true (cf. A.2.1.2), or
- no initial value has been read yet (getCurretVersion() == {nullptr}) and DeviceModule::deviceHasError == true (cf. A.4.2).
- 1.4.3 Obtain the DeviceModule::errorLock. Only then release the recovery lock. (*)
- 1.4.4 Wait on DeviceModule::errorIsReportedCondVar.
- 1.5 In doPostRead()/doPostWrite():
- 1.5.1 If there was no exception, set previousReadFailed = false.
- 1.5.2 If in 1.2.2 the DeviceModule::activeTransfers counter was incremented, atomically decrement it.
- 1.5.3 In doPostWrite() the recoveryAccessor's written flag is set if the write was successful (no exception thrown; data lost flag does not matter here). (*)
- 1.5.4 In doPostRead(), if no exception was thrown, end overriding the DataValidity returned by the accessor (cf. 1.6.2).
- 1.6 In doPostRead()/doPostWrite(), any runtime_error exception thrown by the delegated postRead()/postWrite() is caught (*). The following actions are executed in case of an exception:
- 1.6.1 The error is reported to the DeviceModule via DeviceModule::reportException().
- 1.6.2 For readable accessors: the DataValidity returned by the accessor is overridden to faulty until next successful read operation (cf. 1.5.4).
- 1.6.2.1 The code instantiating the decorator (Application::createDeviceVariable()) has to make sure that the ExceptionHandlingDecorator is "inside" the MetaDataPropagatingRegisterDecorator, so the overridden DataValidity flag in case of an exception is properly propagated to the owning module/fan out.
- 1.6.3 Action depending on the calling operation:
- 1.6.3.1 All read operations: The ExceptionHandlingDecorator remembers that it is in an exception state by setting previousReadFailed = true
- 1.6.3.1 read (push-type inputs): return immediately (*)
- 1.6.3.2 readNonBlocking / readLatest / read (poll-type inputs): Just return false (no new data). The calling module thread will continue and propagate the DataValidity::faulty flag (cf. 1.6.2).
- 1.6.3.3 write: Do not block. Write will be later executed by the DeviceModule (see 1.1)
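
The bookkeeping of the written flag described in 1.1, 1.3 and 1.5.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the ApplicationCore implementation: RecoveryHelperSketch, DeviceModuleSketch, preWriteSketch() and postWriteSketch() are hypothetical names, the recoveryAccessor is reduced to a plain int, an empty std::optional stands in for a {nullptr} VersionNumber, and the locking required by 1.1.3 is left out. The sketch also assumes that no data loss is reported for the very first write.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the RecoveryHelper of 1.1.1.
struct RecoveryHelperSketch {
  int value{0};                         // stands in for the recoveryAccessor's data
  std::optional<std::uint64_t> version; // empty plays the role of VersionNumber{nullptr}
  std::uint64_t writeOrder{0};          // ordering parameter used during recovery
  bool wasWritten{false};               // written flag
};

// Hypothetical stand-in for the per-device write counter of 1.1.2.
struct DeviceModuleSketch {
  std::atomic<std::uint64_t> writeOrderCounter{0};
};

// 1.3: store the new value in the recovery data and clear the written flag.
// Returns true if the previously stored value never reached the device
// (1.3.1, "data lost"). Assumption: nothing is reported for the first write,
// because there is no older value that could have been lost.
bool preWriteSketch(DeviceModuleSketch& dm, RecoveryHelperSketch& rh, int newValue,
    std::uint64_t newVersion) {
  const bool dataLost = rh.version.has_value() && !rh.wasWritten;
  rh.value = newValue;
  rh.version = newVersion;
  rh.writeOrder = ++dm.writeOrderCounter; // one counter per device
  rh.wasWritten = false;
  return dataLost;
}

// 1.5.3: set the written flag only if the transfer did not throw.
void postWriteSketch(RecoveryHelperSketch& rh, bool exceptionThrown) {
  if(!exceptionThrown) rh.wasWritten = true;
}

// Usage: a failed write followed by a second write reports data loss.
void recoveryHelperUsage() {
  DeviceModuleSketch dm;
  RecoveryHelperSketch rh;
  bool lost = preWriteSketch(dm, rh, 1, 100);
  assert(!lost);
  postWriteSketch(rh, true); // the transfer threw, the flag stays cleared
  lost = preWriteSketch(dm, rh, 2, 101);
  assert(lost);
  postWriteSketch(rh, false);
  assert(rh.wasWritten && rh.writeOrder == 2);
}
```

In this sketch the ordering parameter is simply the counter value after the increment, so sorting all RecoveryHelpers of one device by writeOrder reproduces the original write order during recovery.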
\subsection spec_execptionHandling_high_level_implmentation_deviceModule B.2 DeviceModule
- 2.1 The application always starts with all devices as closed. For each device, the initial value for Devices/<alias>/status is set to 1 and the initial value for Devices/<alias>/message is set to an error that the device has not been opened yet (the message will be overwritten with the real error message if the first attempt to open fails, see 2.3.1).
- 2.2 The DeviceModule takes care that ExceptionHandlingDecorators initially do not perform any read or write operations, but freeze (cf. 1.4). This happens before running any prepare() of an ApplicationModule, where the first write calls to ExceptionHandlingDecorators might be done.
- 2.3 In the DeviceModule thread, the following procedure is executed (in a loop until termination):
- 2.3.1 The DeviceModule tries to open the device until it succeeds and isFunctional() returns true.
- 2.3.1.1 If the very first attempt to open the device after the application start fails, the error message of the exception is used to overwrite the content of Devices/<alias>/message. Otherwise error messages of exceptions thrown by Device::open() are not visible.
- 2.3.2 Obtain lock for accessing recoveryAccessors.
- 2.3.3 Device is initialised by iterating initialisationHandlers list.
- 2.3.3.1 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1.
- 2.3.4 All valid recoveryAccessors are written in the same order they were originally written.
- 2.3.4.1 A recoveryAccessor is considered "valid", if it has already received a value, i.e. its current version number is not {nullptr} any more.
- 2.3.4.2 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1.
- 2.3.5 The queue of reported exceptions is cleared. (*)
- 2.3.6 Devices/<alias>/status is set to 0 and Devices/<alias>/message is set to an empty string.
- 2.3.7 DeviceModule allows ExceptionHandlingDecorators to execute reads and writes again (cf. 2.3.13)
- 2.3.8 All frozen read operations (cf. 1.4.4) are notified via DeviceModule::errorIsReportedCondVar.
- 2.3.9 Release lock for recoveryAccessors.
- 2.3.10 The DeviceModuleThread waits for the next reported exception.
- 2.3.11 An exception is received.
- 2.3.12 Devices/<alias>/status is set to 1 and Devices/<alias>/message is set to the first received exception message.
- 2.3.13 Set DeviceModule::deviceHasError = true under exclusive recovery lock (cf. 1.2). From this point on, no new transfers will be started.
- 2.3.14 The device module waits until all running read and write operations of ExceptionHandlingDecorators have ended (wait until DeviceModule::activeTransfers == 0). (*)
- 2.3.15 The thread goes back to 2.3.1 and tries to re-open the device.
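
The interplay between the transfer check in 1.2 and the steps 2.3.7, 2.3.13 and 2.3.14 can be sketched as follows. This is only an illustration under stated assumptions: DeviceModuleGateSketch and the free functions are hypothetical names, deviceHasError is assumed to start as true (cf. 2.2), and leaveFaultState() takes the exclusive lock locally, whereas the specification performs 2.3.7 inside the locked section 2.3.2 to 2.3.9.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the parts of the DeviceModule used in 1.2.
struct DeviceModuleGateSketch {
  std::shared_mutex recoveryMutex;
  bool deviceHasError{true};           // assumption: devices start in fault state (2.2)
  std::atomic<int> activeTransfers{0}; // counter of running transfers

  std::shared_lock<std::shared_mutex> getRecoverySharedLock() {
    return std::shared_lock<std::shared_mutex>(recoveryMutex);
  }
};

// 1.2: returns true if the transfer may be executed. The counter is
// incremented while the shared lock is still held (1.2.2).
bool beginTransfer(DeviceModuleGateSketch& dm) {
  auto lock = dm.getRecoverySharedLock();
  if(dm.deviceHasError) return false; // 1.2.1: skip the transfer
  ++dm.activeTransfers;
  return true;
}

// 1.5.2: decrement only if beginTransfer() incremented the counter.
void endTransfer(DeviceModuleGateSketch& dm, bool wasStarted) {
  if(wasStarted) --dm.activeTransfers;
}

// 2.3.13: from this point on, no new transfers are started.
void enterFaultState(DeviceModuleGateSketch& dm) {
  std::unique_lock<std::shared_mutex> lock(dm.recoveryMutex);
  dm.deviceHasError = true;
}

// 2.3.7: allow transfers again.
void leaveFaultState(DeviceModuleGateSketch& dm) {
  std::unique_lock<std::shared_mutex> lock(dm.recoveryMutex);
  dm.deviceHasError = false;
}

// 2.3.14: condition the DeviceModule waits for before re-opening the device.
bool allTransfersFinished(const DeviceModuleGateSketch& dm) {
  return dm.activeTransfers.load() == 0;
}

// Usage: a transfer started before the fault is still counted afterwards.
void gateUsage() {
  DeviceModuleGateSketch dm;
  assert(!beginTransfer(dm)); // skipped, device not yet opened
  leaveFaultState(dm);
  const bool started = beginTransfer(dm);
  assert(started);
  enterFaultState(dm);
  assert(!allTransfersFinished(dm)); // the running transfer must end first
  endTransfer(dm, started);
  assert(allTransfersFinished(dm));
}
```

Because the increment happens under the shared lock and the flag is set under the exclusive lock, a transfer is either counted before the flag changes or sees the flag and is skipped.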
\subsection spec_execptionHandling_high_level_implmentation_comments (*) Comments
- 1.1.3.1 The freezing is done in doPreRead(), see 1.3.
- 1.6 Remember: exceptions from other phases are redirected to the post phase by the TransferElement base class.
- 1.2 Possible future change: Output accessors can have the option not to have a recovery accessor. This is needed for instance for "trigger registers" which start an operation on the hardware. Also void registers don't have recovery accessors (once the void data type is supported).
- 1.6.3.1 The freezing is done in doPreRead(), see 1.4.
- 1.2.1 The written flag cannot be replaced by comparing the version number of the recoveryAccessor and the version number stored in the RecoveryHelper, because normal writes (without exceptions) would not update the version number of the recoveryAccessor.
- 1.1 Possible future change: Output accessors can have the option not to have a recovery accessor. This is needed for instance for "trigger registers" which start an operation on the hardware. Also void registers don't have recovery accessors (once the void data type is supported).
- 1.2.2 The ordering guarantee cannot work across DeviceModules anyway. Different devices may go offline and recover at different times. Even in case of two DeviceModules which actually refer to the same hardware device there is no synchronisation mechanism which ensures the recovering procedure is done in a defined order.
- 1.1.1 The written flag cannot be replaced by comparing the version number of the recoveryAccessor and the version number stored in the RecoveryHelper, because normal writes (without exceptions) would not update the version number of the recoveryAccessor.
- 1.2.5 The written flag for the recoveryAccessor is used to report loss of data. If the loss of data is already reported directly, it should not later be reported again. Hence the written flag is set even if there was a loss of data in this context.
- 1.1.2 The ordering guarantee cannot work across DeviceModules anyway. Different devices may go offline and recover at different times. Even in case of two DeviceModules which actually refer to the same hardware device there is no synchronisation mechanism which ensures the recovering procedure is done in a defined order.
- 1.3.3 The order of locks is important here. The recovery lock prevents the DeviceModule from entering the section 2.3.2 to 2.3.9, which includes the notification through the DeviceModule::errorIsReportedCondVar at 2.3.8. The mutex DeviceModule::errorLock is the mutex used for the condition variable. Since the ExceptionHandlingDecorator obtains it before the DeviceModule can start the notification, it is guaranteed that the decorator does not miss the notification. Note that the DeviceModule::errorLock is not a shared lock, so concurrent ExceptionHandlingDecorator::preRead() will mutually exclude, but the mutex is held only for a short time until errorIsReportedCondVar.wait() is called.
- 1.5.3 The written flag for the recoveryAccessor is used to report loss of data. If the loss of data is already reported directly, it should not later be reported again. Hence the written flag is set even if there was a loss of data in this context.
- 1.4.3 The lock excludes that the DeviceModule is between 2.3.2 and 2.3.9. If it is right before, the device is still in fault state and the value written to the recoveryAccessor is guaranteed to be written in 2.3.4. If it is right after, the exception state has already been resolved and the real write transfer will be attempted by the ExceptionHandlingDecorator.
- 1.4.3 The order of locks is important here. The recovery lock prevents the DeviceModule from entering the section 2.3.2 to 2.3.9, which includes the notification through the DeviceModule::errorIsReportedCondVar at 2.3.8. The mutex DeviceModule::errorLock is the mutex used for the condition variable. Since the ExceptionHandlingDecorator obtains it before the DeviceModule can start the notification, it is guaranteed that the decorator does not miss the notification. Note that the DeviceModule::errorLock is not a shared lock, so concurrent ExceptionHandlingDecorator::preRead() will mutually exclude, but the mutex is held only for a short time until errorIsReportedCondVar.wait() is called.
- 1.6.3 The lock excludes that the DeviceModule is between 2.3.2 and 2.3.9. If it is right before, the device is still in fault state and the value written to the recoveryAccessor is guaranteed to be written in 2.3.4. If it is right after, the exception state has already been resolved and the real write transfer will be attempted by the ExceptionHandlingDecorator.
- 2.3.5 The exact place when this is done does not matter, as long as it is done under the lock for the recoveryAccessors.
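
The lock handover explained in the comment on 1.4.3 can be sketched as follows. This is a simplified illustration, not the actual implementation: FreezeSketch, freezeUntilRecovered() and recoverAndNotify() are hypothetical names, the freezing condition of 1.4.2 is reduced to deviceHasError alone, and two details are assumptions of this sketch: the wait uses a predicate to guard against spurious wake-ups, and the DeviceModule side takes errorLock for the notification.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for the DeviceModule members used in 1.4.
struct FreezeSketch {
  std::shared_mutex recoveryMutex;
  std::mutex errorLock;
  std::condition_variable errorIsReportedCondVar;
  std::atomic<bool> deviceHasError{true};
};

// Decorator side (1.4.1 to 1.4.4).
void freezeUntilRecovered(FreezeSketch& dm) {
  std::shared_lock<std::shared_mutex> recoveryLock(dm.recoveryMutex); // 1.4.1
  if(!dm.deviceHasError.load()) return;                               // 1.4.2 (simplified)
  std::unique_lock<std::mutex> errorLock(dm.errorLock);               // 1.4.3: obtain errorLock ...
  recoveryLock.unlock();                                              // ... only then release
  dm.errorIsReportedCondVar.wait(                                     // 1.4.4
      errorLock, [&dm] { return !dm.deviceHasError.load(); });
}

// DeviceModule side (2.3.2, 2.3.7, 2.3.8, 2.3.9).
void recoverAndNotify(FreezeSketch& dm) {
  std::unique_lock<std::shared_mutex> recoveryLock(dm.recoveryMutex); // 2.3.2
  dm.deviceHasError = false;                                          // 2.3.7
  std::lock_guard<std::mutex> errorLock(dm.errorLock);
  dm.errorIsReportedCondVar.notify_all();                             // 2.3.8
} // both locks are released here (2.3.9)

// Usage: a frozen reader thread resumes after the recovery.
void freezeUsage() {
  FreezeSketch dm;
  std::atomic<bool> resumed{false};
  std::thread reader([&] {
    freezeUntilRecovered(dm);
    resumed = true;
  });
  recoverAndNotify(dm);
  reader.join();
  assert(resumed.load() && !dm.deviceHasError.load());
}
```

The recovering side cannot obtain the exclusive recovery lock until the reader has released its shared lock, and by then the reader already holds errorLock. The notification therefore cannot happen before the reader is actually waiting.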
@@ -178,6 +187,7 @@ When the device is functional, it be (re)initialised by using application-define
\section spec_execptionHandling_known_issues Known issues - OUTDATED (numbers don't even match)
<strike>
- 11.1 In step 2.1: The initial value of deviceError is not set to 1.
- 11.2 In step 2.2.3: is not correctly fulfilled as we are only waiting for device to be opened and don't wait for it to be correctly initialised. The lock 4.2.3 is not implemented at all.
@@ -212,7 +222,7 @@ When the device is functional, it be (re)initialised by using application-define
- 11.22 In 3.4: The TransferType is not known. Needs to be implemented in TransferElement
- 11.23 In 3.5: PostRead is currently skipped if readNonBlocking or readLatest does not have new data
- 11.24 In 3.6: The waitForNewData calls in the DoocsBackend (using zmq) are currently not interruptible
</strike>
*/