Commit 49d195ce authored by Martin Christoph Hierholzer

[wip] Exception handling spec: mostly minor corrections

parent b4abc140
...@@ -90,7 +90,7 @@ FIXME: NUMBERING ...@@ -90,7 +90,7 @@ FIXME: NUMBERING
Note: This section defines the internal interface on a low level. Helper functions, like getters and setters, are intentionally not mentioned here, since those are (in this context) unimportant details which can be chosen at will to structure the code conveniently. The entire interface between the ExceptionHandlingDecorator and the DeviceModule should be protected and the two classes should be friends, to prevent interference with the interface from other entities. Only DeviceModule::reportException() is public, see A.5. Note: This section defines the internal interface on a low level. Helper functions, like getters and setters, are intentionally not mentioned here, since those are (in this context) unimportant details which can be chosen at will to structure the code conveniently. The entire interface between the ExceptionHandlingDecorator and the DeviceModule should be protected and the two classes should be friends, to prevent interference with the interface from other entities. Only DeviceModule::reportException() is public, see A.5.
- 4.1 The boolean flag DeviceModule::deviceHasError - 4.1 The boolean flag DeviceModule::deviceHasError
- 4.1.1 is used by the RecoveryAccessor to detect prevailing error conditions, to know when transfers have to be skipped, frozen or delayed (cf. 1.2 and 1.4). - 4.1.1 is used by the ExceptionHandlingDecorator to detect prevailing error conditions, to know when transfers have to be skipped, frozen or delayed (cf. 1.2 and 1.4).
- 4.1.2 The access is protected by the DeviceModule::errorMutex: - 4.1.2 The access is protected by the DeviceModule::errorMutex:
- shared lock allows to read - shared lock allows to read
- unique lock allows to read and write - unique lock allows to read and write
...@@ -106,18 +106,18 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -106,18 +106,18 @@ Note: This section defines the internal interface on a low level. Helper functio
- unique lock allows to call RecoveryHelper::accessor.write() and to read the RecoveryHelper::versionNumber - unique lock allows to call RecoveryHelper::accessor.write() and to read the RecoveryHelper::versionNumber
- 4.4 The cppext::future_queue DeviceModule::errorQueue - 4.4 The cppext::future_queue DeviceModule::errorQueue
- 4.4.1 is used by the RecoveryAccessor to inform the DeviceModule about new exceptions. - 4.4.1 is used by the ExceptionHandlingDecorator to inform the DeviceModule about new exceptions.
- 4.6 The following mutexes govern critical sections (besides variable access listed above): - 4.5 The following mutexes govern critical sections (besides variable access listed above):
- 4.6.1 DeviceModule::errorMutex protects (*) - 4.5.1 DeviceModule::errorMutex protects (*)
- the (positive) decision to start a transfer followed by incrementing the DeviceModule::transferCounter in 1.2.1 to 1.2.3, against - the (positive) decision to start a transfer followed by incrementing the DeviceModule::transferCounter in 1.2.1 to 1.2.3, against
- setting DeviceModule::deviceHasError flag in 1.6.1. - setting DeviceModule::deviceHasError flag in 1.6.1.
- 4.6.2 DeviceModule::recoveryMutex protects (*) - 4.5.2 DeviceModule::recoveryMutex protects (*)
- writing the DeviceModule::recoveryHelpers to the device and clearing the DeviceModule::deviceHasError flag in 2.3.5 to 2.3.8, against - writing the DeviceModule::recoveryHelpers to the device and clearing the DeviceModule::deviceHasError flag in 2.3.5 to 2.3.8, against
- updating the DeviceModule::recoveryHelpers in 1.3. - updating the DeviceModule::recoveryHelpers in 1.3.
- 4.6.3 DeviceModule::initialValueMutex protects (*) - 4.5.3 DeviceModule::initialValueMutex protects (*)
- the start of a read operation in 1.4.4, against - the start of a read operation in 1.4.4, against
- the setup phase of a device until it has been opened and recovered for the very first time in 2.1 to 2.9. - the setup phase of a device until it has been opened and recovered for the very first time in 2.1 to 2.9.
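The critical section in 4.5.1 can be illustrated with a minimal C++17 sketch. It is not the ChimeraTK implementation: `DeviceModuleSketch`, `tryStartTransfer()` and `reportError()` are hypothetical names chosen for illustration, and the decision whether to start a transfer is reduced to a single flag check (the real conditions in 1.2.1 are more involved). The point shown is only that the check and the counter increment happen under one shared lock, while setting the flag requires the unique lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the relevant DeviceModule members (not the real class).
struct DeviceModuleSketch {
  std::shared_mutex errorMutex;        // cf. 4.1.2 and 4.5.1
  bool deviceHasError{false};          // cf. 4.1
  std::atomic<int> transferCounter{0}; // number of transfers currently in flight

  // Decorator side: decide and count while holding a shared lock, so the flag
  // cannot be set between the check and the increment.
  bool tryStartTransfer() {
    std::shared_lock<std::shared_mutex> lock(errorMutex);
    if(deviceHasError) return false; // simplified decision, assumption of this sketch
    ++transferCounter;
    return true;
  }

  // Called after the transfer, whether it failed or not (cf. 1.5.2).
  void stopTransfer() { --transferCounter; }

  // Error side: setting the flag needs the unique lock (cf. 1.6.1).
  void reportError() {
    std::unique_lock<std::shared_mutex> lock(errorMutex);
    deviceHasError = true;
  }
};
```

Because several decorators may hold the shared lock at the same time, concurrent transfers on different registers do not block each other in this sketch; only the transition into the error state is serialised against them.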
...@@ -128,28 +128,29 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -128,28 +128,29 @@ Note: This section defines the internal interface on a low level. Helper functio
- 4.3.2 A shared lock (in contrast to an exclusive lock) is used for the same reasons as in 4.2. - 4.3.2 A shared lock (in contrast to an exclusive lock) is used for the same reasons as in 4.2.
- 4.6.1 This prevents a race condition in 2.3.15. If a (synchronous) transfer might be started after DeviceModule::deviceHasError has been set, the barrier for new transfers in 2.3.15 would not be effective and the transfer might be even executed only after the device has been re-opened (2.3.1) but before the recovery is complete. - 4.5.1 This prevents a race condition in 2.3.15. If a (synchronous) transfer might be started after DeviceModule::deviceHasError has been set, the barrier for new transfers in 2.3.15 would not be effective and the transfer might be even executed only after the device has been re-opened (2.3.1) but before the recovery is complete.
- 4.6.2 This prevents data loss due to a race condition. If the ExceptionHandlingDecorator would update the corresponding DeviceModule::RecoveryHelpers list entry only after it has been written to the device in 2.3.5, but the ExceptionHandlingDecorator would decide not to execute the write operation (1.2) because the DeviceModule thread is still before 2.3.8, the data would not be written to the device at all. - 4.5.2 This prevents data loss due to a race condition. If the ExceptionHandlingDecorator would update the corresponding DeviceModule::RecoveryHelpers list entry only after it has been written to the device in 2.3.5, but the ExceptionHandlingDecorator would decide not to execute the write operation (1.2) because the DeviceModule thread is still before 2.3.8, the data would not be written to the device at all.
- 4.6.3 This implements freezing reads until the initial value can be read, cf. 4.2. - 4.5.3 This implements freezing reads until the initial value can be read, cf. 4.2.
\subsection spec_execptionHandling_high_level_implmentation_decorator B.1 ExceptionHandlingDecorator \subsection spec_execptionHandling_high_level_implmentation_decorator B.1 ExceptionHandlingDecorator
- 1.1 A second, undecorated copy of each writeable device register accessor (*) is used as a so-called recoveryAccessor by the ExceptionHandlingDecorator and the DeviceModule. These recoveryAccessor are used to set the initial values of registers when the device is opened for the first time and to recover the last written values during the recovery procedure. - 1.1 A second, undecorated copy of each writeable device register accessor (*), the so-called recovery accessor, is stored in the DeviceModule::recoveryHelpers. These recoveryHelpers are used to set the initial values of registers when the device is opened for the first time and to recover the last written values during the recovery procedure.
- 1.1.1 The recoveryAccessor is stored by the DeviceModule with additional meta data in a so-called RecoveryHelper data structure, which contains: - 1.1.1 The DeviceModule::recoveryHelpers is a list of RecoveryHelper objects, which each contain:
- the recoveryAccessor itself, - RecoveryHelper::accessor, the recovery accessor itself,
- the VersionNumber of the (potentially unwritten) data stored in the accessor, - RecoveryHelper::versionNumber, the VersionNumber of the (potentially unwritten) data stored in the value buffer of the accessor,
- an ordering parameter which determines the order of write operations during recovery. - RecoveryHelper::writeOrder, an ordering parameter which determines the order of write operations during recovery.
- an atomic flag which indicates whether the value in the recoveryAccessor has already been written to data. (*) - RecoveryHelper::wasWritten, an atomic flag which indicates whether the data in the value buffer of the RecoveryHelper::accessor has already been written to the device. (*)
- 1.1.2 Ordering can be done per device (*), hence each DeviceModule has one 64-bit atomic counter which is incremented for each write operation and the value is stored in the ordering parameter for the recoveryAccessor. - 1.1.2 Ordering can be done per device (*), hence each DeviceModule has one 64-bit atomic counter which is incremented for each write operation and the value is stored in RecoveryHelper::writeOrder.
- 1.1.3 The RecoveryHelper object may be accessed only under a lock to prevent concurrent access during recovery. The lock shall be shared to allow concurrent write operations of different registers - only the DeviceModule needs to obtain an exclusive lock during recovery. The lock is obained by the ExceptionHandlingDecorators via DeviceModule::getRecoverySharedLock(). - 1.1.3 The RecoveryHelper objects may be accessed only under a lock, see 4.3.
- 1.3 In doPreWrite() the recoveryAccessor with the version number and ordering parameter is updated, and the written flag is cleared. This has to happen while holding the shared recovery lock. - 1.3 In doPreWrite() the RecoveryHelper is updated. This has to happen while holding the shared recovery lock.
- 1.3.0 This step needs to be done unconditionally at the very beginning of doPreWrite(), before 1.2 and before delegating preWrite(). (*) - 1.3.0 This step needs to be done unconditionally at the very beginning of doPreWrite(), before 1.2 and before delegating preWrite(). (*)
- 1.3.1 If the written flag was previously not set, the return value of doWriteTransfer() must be forced to true (data lost). - 1.3.1 If the written flag was previously not set, the return value of doWriteTransfer() must be forced to true (data lost).
- 1.3.2 The check wheterh to skip the transfer (cf. 1.2) has to be done without releasing the lock between the write to the recoveryAccessor and the check. (*) - 1.3.x Update the value buffer of the RecoveryHelper::accessor
- 1.3.2 The check whether to skip the transfer (cf. 1.2) has to be done without releasing the lock between the update of the RecoveryHelper and the check. (*)
- 1.2 In doPreRead()/doPreWrite(), it must be decided whether to execute xxxTransferYyy(). This part requires a shared lock on the DeviceModule::errorMutex. - 1.2 In doPreRead()/doPreWrite(), it must be decided whether to execute xxxTransferYyy(). This part requires a shared lock on the DeviceModule::errorMutex.
- 1.2.1 xxxTransferYyy() is <i>not</i> executed, if DeviceModule::deviceHasError == true and either: - 1.2.1 xxxTransferYyy() is <i>not</i> executed, if DeviceModule::deviceHasError == true and either:
...@@ -169,9 +170,9 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -169,9 +170,9 @@ Note: This section defines the internal interface on a low level. Helper functio
- 1.5 In doPostRead()/doPostWrite(): - 1.5 In doPostRead()/doPostWrite():
- 1.5.0 Delegate postRead() / postWrite() (see 1.6) - 1.5.0 Delegate postRead() / postWrite() (see 1.6)
- 1.5.1 If there was no exception, set ExceptionHandlingDecorator::previousReadFailed = false (cf. 1.2.1 and 1.6.3.1). - 1.5.1 If there was no exception, set ExceptionHandlingDecorator::previousReadFailed = false (cf. 1.2.1 and 1.6.3.1).
- 1.5.3 In doPostWrite() the recoveryAccessor's written flag is set if the write was successful (no exception thrown; data lost flag does not matter here). (*)
- 1.5.4 In doPostRead(), if no exception was thrown, end overriding the DataValidity returned by the accessor (cf. 1.6.2).
- 1.5.2 If the DeviceModule::transferCounter was incremented in 1.2.3, decrement it. (*) - 1.5.2 If the DeviceModule::transferCounter was incremented in 1.2.3, decrement it. (*)
- 1.5.3 In doPostWrite() the RecoveryHelper::wasWritten flag is set if the write was successful (no exception thrown; data lost flag does not matter here). (*)
- 1.5.4 In doPostRead(), if no exception was thrown, end overriding the DataValidity returned by the accessor (cf. 1.6.2).
- 1.6 In doPostRead()/doPostWrite(), any runtime_error exception thrown by the delegated postRead()/postWrite() is caught (*). The following actions are in case of an exception: - 1.6 In doPostRead()/doPostWrite(), any runtime_error exception thrown by the delegated postRead()/postWrite() is caught (*). The following actions are in case of an exception:
- 1.6.1 The error is reported to the DeviceModule via DeviceModule::reportException(). This automatically sets DeviceModule::deviceHasError to true. From this point on, no new transfers will be started.(*) - 1.6.1 The error is reported to the DeviceModule via DeviceModule::reportException(). This automatically sets DeviceModule::deviceHasError to true. From this point on, no new transfers will be started.(*)
...@@ -187,15 +188,13 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -187,15 +188,13 @@ Note: This section defines the internal interface on a low level. Helper functio
\subsubsection spec_execptionHandling_high_level_implmentation_decorator_comments (*) Comments \subsubsection spec_execptionHandling_high_level_implmentation_decorator_comments (*) Comments
- 1.1 Possible future change: Output accessors can have the option not to have a recovery accessor. This is needed for instance for "trigger registers" which start an operation on the hardware. Also void registers don't have recovery accessors (once the void data type is supported). - 1.1 Possible future change: Output accessors can have the option not to have a RecoveryHelper. This is needed for instance for "trigger registers" which start an operation on the hardware. Also void registers don't have a RecoveryHelper (once the void data type is supported by ChimeraTK).
- 1.1.1 The written flag cannot be replaced by comparing the version number of the recoveryAccessor and the version number stored in the RecoveryHelper, because normal writes (without exceptions) would not update the version number of the recoveryAccessor. - 1.1.1 The written flag cannot be replaced by comparing RecoveryHelper::accessor.getCurrentVersion() and RecoveryHelper::versionNumber, because normal writes (without exceptions) would not update the version number of the RecoveryHelper::accessor. The written flag is atomic so it can be set without getting the recoveryLock again in doPostWrite(). This has to happen before calling DeviceModule::stopTransfer() to ensure the DeviceModule does not start the recovery yet. When clearing it in doPreWrite(), and setting it in the DeviceModule during recovery, the recoveryLock must be held (see 4.5.2).
- 1.1.1 The flag is atomic so it can be set without getting the recoveryLock again in doPostRead(). This has to happen before calling DeviceModule::stopTransfer() to ensure the DeviceModule() does not start the recovery yet.
When clearing it in doPreRead(), and setting it in the DeviceModule during recovery, the recoveryLock must be held.
- 1.1.2 The ordering guarantee cannot work across DeviceModules anyway. Different devices may go offline and recover at different times. Even in case of two DeviceModules which actually refer to the same hardware device there is no synchronisation mechanism which ensures the recovering procedure is done in a defined order. - 1.1.2 The ordering guarantee cannot work across DeviceModules anyway. Different devices may go offline and recover at different times. Even in case of two DeviceModules which actually refer to the same hardware device there is no synchronisation mechanism which ensures the recovering procedure is done in a defined order.
- 1.3.0 Updating the recoveryHelper first ensures that no data is lost, even if the write operation attempt is concurrent with a recovery. See 4.6.2. - 1.3.0 Updating the recoveryHelper first ensures that no data is lost, even if the write operation attempt is concurrent with a recovery. See 4.5.2.
- 1.3.2 Extending the duration of the lock until the decision whether to skip the transfer will prevent unnecessary duplicate writes, which otherwise could occur if the DeviceModule went through the whole critical section 2.3.2 to 2.3.10 in between. - 1.3.2 Extending the duration of the lock until the decision whether to skip the transfer will prevent unnecessary duplicate writes, which otherwise could occur if the DeviceModule went through the whole critical section 2.3.2 to 2.3.10 in between.
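The RecoveryHelper from 1.1.1 and its update in doPreWrite() (1.3, 1.3.1) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual ChimeraTK code: the accessor's value buffer is reduced to an `int`, the VersionNumber to a plain counter, and the names `RecoveryHelperSketch`, `DeviceModuleSketch`, `writeOrderCounter` and `updateRecoveryHelper()` are hypothetical. The initial state `wasWritten == true` is also an assumption, made so that the very first write does not report lost data.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical, simplified RecoveryHelper (cf. 1.1.1).
struct RecoveryHelperSketch {
  int value{0};                       // stands in for the buffer of RecoveryHelper::accessor
  std::uint64_t versionNumber{0};     // stands in for RecoveryHelper::versionNumber
  std::uint64_t writeOrder{0};        // RecoveryHelper::writeOrder
  std::atomic<bool> wasWritten{true}; // RecoveryHelper::wasWritten (initial value assumed)
};

// Hypothetical stand-in for the relevant DeviceModule members.
struct DeviceModuleSketch {
  std::shared_mutex recoveryMutex;                 // cf. 4.5.2
  std::atomic<std::uint64_t> writeOrderCounter{0}; // per-device counter, cf. 1.1.2
};

// Step 1.3: update the helper while holding the shared recovery lock.
// Returns true if the previous value was never written (data lost, cf. 1.3.1).
bool updateRecoveryHelper(DeviceModuleSketch& dm, RecoveryHelperSketch& rh, int newValue,
    std::uint64_t newVersion) {
  std::shared_lock<std::shared_mutex> lock(dm.recoveryMutex);
  // Clear the written flag and remember whether it was set before.
  bool dataLost = !rh.wasWritten.exchange(false);
  rh.value = newValue;
  rh.versionNumber = newVersion;
  rh.writeOrder = ++dm.writeOrderCounter; // pre-increment yields the new counter value
  return dataLost;
}
```

In this sketch the lock is released when the function returns. According to 1.3.2, a real implementation would keep holding it until the decision whether to skip the transfer (1.2) has been made.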
...@@ -205,9 +204,9 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -205,9 +204,9 @@ Note: This section defines the internal interface on a low level. Helper functio
- 1.4.4 The transferCounter is already incremented at this point. It is acceptable to freeze anyway in this case by waiting on the initialValueMutex, because the DeviceModule releases the mutex after the first successful recovery and never obtains it again, and this happens before it waits for the transferCounter to become 0 in 2.3.15. - 1.4.4 The transferCounter is already incremented at this point. It is acceptable to freeze anyway in this case by waiting on the initialValueMutex, because the DeviceModule releases the mutex after the first successful recovery and never obtains it again, and this happens before it waits for the transferCounter to become 0 in 2.3.15.
- 1.5.2 The state of DeviceModule::deviceHasError does not matter here. The counter always MUST be decreased after a transfer (if it has been incremented in the corresponding preXxx()), whether the transfer failed or not. Also, this must happen after 1.5.3 ===> why? DeviceModule::transferCounter > 0 prevents the DeviceModule from starting the recovery, but during the recovery the written flag will also just be set and not read. The written flag is merely used to determine in the next write whether data has been lost (which is the case if the written flag is not set). - 1.5.2 The state of DeviceModule::deviceHasError does not matter here. The counter always MUST be decreased after a transfer (if it has been incremented in the corresponding preXxx()), whether the transfer failed or not. Note: the exact place of decrementing the counter within doPostXxx does not matter, it just has to be done after delegating to postXxx(). The other actions on doPostXxx() have no influence on the behaviour of the DeviceModule.
- 1.5.3 The written flag for the recoveryAccessor is used to report loss of data. If the loss of data is already reported directly, it should not later be reported again. Hence the written flag is set even if there was a loss of data in this context. - 1.5.3 The RecoveryHelper::wasWritten flag is used to report loss of data. If the loss of data is already reported directly, it should not later be reported again. Hence the written flag is set even if there was a loss of data in this context.
- 1.6 Remember: exceptions from other phases are redirected to the post phase by the TransferElement base class. - 1.6 Remember: exceptions from other phases are redirected to the post phase by the TransferElement base class.
...@@ -219,25 +218,33 @@ Note: This section defines the internal interface on a low level. Helper functio ...@@ -219,25 +218,33 @@ Note: This section defines the internal interface on a low level. Helper functio
\subsection spec_execptionHandling_high_level_implmentation_deviceModule B.2 DeviceModule \subsection spec_execptionHandling_high_level_implmentation_deviceModule B.2 DeviceModule
- 2.1 The application always starts with all devices as closed. For each device, the initial value for Devices/<alias>/status is set to 1 and the initial value for Devices/<alias>/message is set to an error that the device has not been opened yet (the message will be overwritten with the real error message if the first attempt to open fails, see 2.3.1). - 2.1 The application always starts with all devices as closed. For each device, the initial value for Devices/<alias>/status is set to 1 and the initial value for Devices/<alias>/message is set to an error that the device has not been opened yet (the message will be overwritten with the real error message if the first attempt to open fails, see 2.3.1).
- 2.2 The DeviceModule takes care that ExceptionHandlingDecorators initially do not perform any read or write operations, but freeze (cf. 1.4). This happens before running any prepare() of an ApplicationModule, where the first write calls to ExceptionHandlingDecorators might be done. - 2.2 The DeviceModule takes care that ExceptionHandlingDecorators initially do not perform any read or write operations, but freeze (cf. 1.4). This happens before running any prepare() of an ApplicationModule, where the first write calls to ExceptionHandlingDecorators might be done.
- 2.3 In the DeviceModule thread, the following procedure is executed (in a loop until termination): - 2.3 In the DeviceModule thread, the following procedure is executed (in a loop until termination):
- 2.3.1 The DeviceModule tries to open the device until it succeeds and isFunctional() returns true. - 2.3.1 The DeviceModule tries to open the device until it succeeds and isFunctional() returns true.
- 2.3.1.1 If the very first attempt to open the device after the application start fails, the error message of the exception is used to overwrite the content of Devices/<alias>/message. Otherwise error messages of exceptions thrown by Device::open() are not visible. - 2.3.1.1 If the very first attempt to open the device after the application start fails, the error message of the exception is used to overwrite the content of Devices/<alias>/message. Otherwise error messages of exceptions thrown by Device::open() are not visible.
- New position for 2.3.6 The queue of reported exceptions is cleared. (*) - New position for 2.3.6 The queue of reported exceptions is cleared. (*)
- 2.3.3 Check that all registers on DeviceModule::listOfReadRegisters are isReadable() and all registers on DeviceModule::listOfWriteRegisters are isWriteable(). - 2.3.3 Check that all registers on DeviceModule::listOfReadRegisters are isReadable() and all registers on DeviceModule::listOfWriteRegisters are isWriteable().
- 2.3.3.1 This involves obtaining an accessor for the register first, which is discarded after the check. - 2.3.3.1 This involves obtaining an accessor for the register first, which is discarded after the check.
- 2.3.3.2 If there is an exception, update Devices/<alias>/message with the error message and go back to 2.3.1. - 2.3.3.2 If there is an exception, update Devices/<alias>/message with the error message and go back to 2.3.1.
- 2.3.3.3 If one of the accessors does not meet this condition, throw a ChimeraTK::logic_error. - 2.3.3.3 If one of the accessors does not meet this condition, throw a ChimeraTK::logic_error.
- 2.3.4 Device is initialised by iterating initialisationHandlers list. - 2.3.4 Device is initialised by iterating initialisationHandlers list.
- 2.3.4.1 If there is an exception, update Devices/<alias>/message with the error message and go back to 2.3.1. - 2.3.4.1 If there is an exception, update Devices/<alias>/message with the error message and go back to 2.3.1.
- New position of 2.3.2 Obtain lock for accessing recoveryAccessors.
- 2.3.5 All valid recoveryAccessors are written in the same order they were originally written. - New position of 2.3.2 Obtain unique lock on DeviceModule::recoveryMutex.
- 2.3.5.1 A recoveryAccessor is considered "valid", if it has already received a value, i.e. its current version number is not {nullptr} any more.
- 2.3.5 Call write() on all valid RecoveryHelper::accessor, in the ascending order of the DeviceModule::writeOrder.
- 2.3.5.1 A RecoveryHelper::accessor is considered "valid", if it has already received a value, i.e. RecoveryHelper::versionNumber != {nullptr}
- 2.3.5.2 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1. - 2.3.5.2 If there is an exception, update Devices/<alias>/message with the error message, release the lock and go back to 2.3.1.
- 2.3.7 Devices/<alias>/status is set to 0 and Devices/<alias>/message is set to an empty string. - 2.3.7 Devices/<alias>/status is set to 0 and Devices/<alias>/message is set to an empty string.
- 2.3.8 DeviceModule allows ExceptionHandlingDecorators to execute reads and writes again (cf. 2.3.14) - 2.3.8 DeviceModule allows ExceptionHandlingDecorators to execute reads and writes again (cf. 2.3.14)
- 2.3.9 All frozen read operations (cf. 1.4.4) are notified via DeviceModule::errorIsResolvedCondVar. - 2.3.9 All frozen read operations (cf. 1.4.4) are notified via DeviceModule::errorIsResolvedCondVar.
- 2.3.10 Release lock for recoveryAccessors. - 2.3.10 Release lock on DeviceModule::recoveryMutex (was obtained in 2.3.2).
- 2.3.11 The DeviceModuleThread waits for the next reported exception. The call to reportException in the other thread has already set deviceHasError to true (*). From this point on, no new transfers will be started. - 2.3.11 The DeviceModuleThread waits for the next reported exception. The call to reportException in the other thread has already set deviceHasError to true (*). From this point on, no new transfers will be started.
- 2.3.12 An exception is received. - 2.3.12 An exception is received.
- 2.3.13 Devices/<alias>/status is set to 1 and Devices/<alias>/message is set to the first received exception message. - 2.3.13 Devices/<alias>/status is set to 1 and Devices/<alias>/message is set to the first received exception message.
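The recovery write in 2.3.2, 2.3.5 and 2.3.10 can be sketched in C++17 as below. This is an illustration under assumptions, not the DeviceModule code: the device is replaced by a vector that records the written values, a version number of 0 stands in for {nullptr}, and `RecoveryHelperSketch` and `recoverSketch()` are hypothetical names. The written flag is a plain `bool` here (not atomic as in 1.1.1) so that the helpers can be sorted in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical, simplified RecoveryHelper (cf. 1.1.1).
struct RecoveryHelperSketch {
  int value{0};                   // stands in for the buffer of RecoveryHelper::accessor
  std::uint64_t versionNumber{0}; // 0 stands in for {nullptr}: no value received yet
  std::uint64_t writeOrder{0};    // RecoveryHelper::writeOrder
  bool wasWritten{false};         // plain bool in this sketch, see lead-in
};

// Returns the values "written to the device", in the order they were written.
std::vector<int> recoverSketch(std::shared_mutex& recoveryMutex,
    std::vector<RecoveryHelperSketch>& helpers) {
  std::vector<int> device;
  std::unique_lock<std::shared_mutex> lock(recoveryMutex); // 2.3.2
  // 2.3.5: ascending write order
  std::sort(helpers.begin(), helpers.end(),
      [](const RecoveryHelperSketch& a, const RecoveryHelperSketch& b) {
        return a.writeOrder < b.writeOrder;
      });
  for(auto& rh : helpers) {
    if(rh.versionNumber == 0) continue; // 2.3.5.1: skip helpers without a value
    device.push_back(rh.value);         // stands in for RecoveryHelper::accessor.write()
    rh.wasWritten = true;               // set during recovery, cf. comment on 1.1.1
  }
  return device; // the unique lock is released on return (2.3.10)
}
```

The error path of 2.3.5.2 (release the lock and go back to 2.3.1 on an exception) is left out of this sketch.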