From 9c5282dc6531a216b6e0565516af6150b417a107 Mon Sep 17 00:00:00 2001
From: Martin Hierholzer <martin.hierholzer@desy.de>
Date: Mon, 4 May 2020 13:37:41 +0200
Subject: [PATCH] exception handling spec: fix numbering and a typo

---
 doc/spec_exceptionHandling.md | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/doc/spec_exceptionHandling.md b/doc/spec_exceptionHandling.md
index 11cd6e3e..ddefda5f 100644
--- a/doc/spec_exceptionHandling.md
+++ b/doc/spec_exceptionHandling.md
@@ -46,19 +46,19 @@ When the device is functional, it be (re)initialised by using application-define
   - 2.1 The exception status is published as a process variable together with an error message.
     - 2.1.1 The variable Devices/<alias>/status contains a boolean flag whether the device is in an error state
     - 2.1.2 The variable Devices/<alias>/message contains an error message, if the device is in an error state, or an empty string otherwise.
-  - 2.1 Read operations will propagate the DataValidity::faulty to the owning module / fan out (without changing the actual value).
-    - 2.1.1 The normal module algorithm code will be continued, to allow this flag to propagate to the outputs in the same way as if it had been received through the process variable itself (c.f. 1.2).
-    - 2.1.2 Blocking read operations will be skipped, if the fault flag has not yet been read once by the same accessor. If the fault flag had already been read previously by the same accessor, the operation is frozen (regardless of the type of the first read). When the frozen operation is finally executed, another exception might be thrown, in which case the previously frozen operation is finally skipped.
-    - 2.1.3 Non-blocking read operations (incl. readLatest) will be skipped. The return value will be false (no new data), if the fault flag has been read once already by the same accessor and hence is already propagated (regardless of the type of the first read), true otherwise.
-    - 2.1.4 Asynchronous read operations behave analogous to 2.1.2:
+  - 2.2 Read operations will propagate the DataValidity::faulty flag to the owning module / fan out (without changing the actual value):
+    - 2.2.1 The normal module algorithm code will be continued, to allow this flag to propagate to the outputs in the same way as if it had been received through the process variable itself (c.f. 1.2).
+    - 2.2.2 Blocking read operations will be skipped, if the fault flag has not yet been read once by the same accessor. If the fault flag had already been read previously by the same accessor, the operation is frozen (regardless of the type of the first read). When the frozen operation is finally executed, another exception might be thrown, in which case the previously frozen operation is finally skipped.
+    - 2.2.3 Non-blocking read operations (incl. readLatest) will be skipped. The return value will be false (no new data), if the fault flag has been read once already by the same accessor and hence is already propagated (regardless of the type of the first read), true otherwise.
+    - 2.2.4 Asynchronous read operations behave analogous to 2.2.2:
       - A TransferFuture, which was valid while the exception was received, is fulfilled immediately when the exception is received, the DataValidity::faulty is propagated to the owning module and the value is left unchanged (i.e. the underlying operation is effectively skipped).
       - The TransferFuture of an asynchronous read operation that is started only after the exception was received will be fulfilled immediately (i.e. the underlying operation is effectively skipped), if no other read operation of (regardless of the type) of the same accessor has read the fault flag once already. Otherwise it will be fulfilled only after the device is recovered (i.e. the underlying operation is effectively frozen).
-    - 2.1.5 If the fault state had been resolved in between two read operations (regardless of the type) and the device had become faulty again before the second read is executed, it is not defined whether the second operation will frozen/delayed/skipped (depending on the type) or not. The second operation might behave either like it is a new exception or like the same fault state would still prevail. (*)
-  - 2.2 Write operations will be delayed. In case of a fault state (new or persisting), the actual write operation will take place asynchronously when the device is recovering. The same mechanism as used for 3.1.2 is used here, hence the order of write operations is guaranteed across accessors, but only the latest written value of each accessor prevails. (*)
-    - 2.2.1 The return value of write() indicates whether data was lost in the transfer. If the write has to be delayed due to an exception, the return value will be true, if a previously delayed and not-yet writen value is discarded in the process, false otherwise.
-    - 2.2.2 When the delayed value is finally written to the device during the recovery procedure, it is guaranteed that no data loss happens (writes with data loss will be retried).
-    - 2.2.3 It is guaranteed that the write takes place before the device is considered fully recovered again and other transfers are allowed (cf. 3.1).
-  - 2.3 In case of exceptions, there is no guaranteed realtime behavior, not even for "non-blocking" transfers. (*)
+    - 2.2.5 If the fault state had been resolved in between two read operations (regardless of the type) and the device had become faulty again before the second read is executed, it is not defined whether the second operation will frozen/delayed/skipped (depending on the type) or not. The second operation might behave either like it is a new exception or like the same fault state would still prevail. (*)
+  - 2.3 Write operations will be delayed. In case of a fault state (new or persisting), the actual write operation will take place asynchronously when the device is recovering. The same mechanism as used for 3.1.2 is used here, hence the order of write operations is guaranteed across accessors, but only the latest written value of each accessor prevails. (*)
+    - 2.3.1 The return value of write() indicates whether data was lost in the transfer. If the write has to be delayed due to an exception, the return value will be true, if a previously delayed and not-yet writen value is discarded in the process, false otherwise.
+    - 2.3.2 When the delayed value is finally written to the device during the recovery procedure, it is guaranteed that no data loss happens (writes with data loss will be retried).
+    - 2.3.3 It is guaranteed that the write takes place before the device is considered fully recovered again and other transfers are allowed (cf. 3.1).
+  - 2.4 In case of exceptions, there is no guaranteed realtime behavior, not even for "non-blocking" transfers. (*)
 
 - 3. The framework tries to resolve an exception state by periodically re-opening the faulty device.
   - 3.1 After successfully re-opening the device, a recovery procedure is executed before allowing any read/write operations from the AppliactionModules and FanOuts again. This recovery procedure involves:
@@ -76,11 +76,11 @@ When the device is functional, it be (re)initialised by using application-define
 
 - 1.1 In future, maybe logic_errors are also handled, so configuration errors can nicely be presented to the control system. This may be important especially since logic_errors may depend also on the configuration of external components (devices). If e.g. a device is changed (e.g. device is another control system application which has been modified), logic_errors may be thrown in the recovery phase, despite the device had been successfully initialsed previously.
 
-- 2.1.5 Not defining the behavior here avoids a conflict with 1.2 without requiring a complicated implementation which does not block in this case. Implementing this would not present any gain for the application. If there are many exceptions on the same device in a short period of time, the number of faulty data updates seen by the application modules will always depend on the speed the module is attempting to read data (unless we require every exception to be visible to every module, but this will have complex effects, too). It might break consistency of the number of updates sent through different paths in an application, but applications should anyway not rely on that and use a DataConsistencyGroup to synchronise instead. Hence, the implementation will block always if a blocking read sees a known exception
+- 2.2.5 Not defining the behavior here avoids a conflict with 1.2 without requiring a complicated implementation which does not block in this case. Implementing this would not present any gain for the application. If there are many exceptions on the same device in a short period of time, the number of faulty data updates seen by the application modules will always depend on the speed the module is attempting to read data (unless we require every exception to be visible to every module, but this will have complex effects, too). It might break consistency of the number of updates sent through different paths in an application, but applications should anyway not rely on that and use a DataConsistencyGroup to synchronise instead. Hence, the implementation will block always if a blocking read sees a known exception
 
-- 2.2 / 3.1.3 If timing is important for write operations (e.g. must not write a sequence of registers too fast), or if multiple values need to be written to the same register in sequence, the application cannot fully rely on the framework's recovery procedure. The framework hence provides the process variable Devices/<alias>/deviceBecameFunctional for each device, which will be written each time the recovery procedure is completed (cf. 3.1.3). ApplicationModules which implement such timed sequence need to receive this variable and restart the entire sequence after the recovery.
+- 2.3 / 3.1.3 If timing is important for write operations (e.g. must not write a sequence of registers too fast), or if multiple values need to be written to the same register in sequence, the application cannot fully rely on the framework's recovery procedure. The framework hence provides the process variable Devices/<alias>/deviceBecameFunctional for each device, which will be written each time the recovery procedure is completed (cf. 3.1.3). ApplicationModules which implement such timed sequence need to receive this variable and restart the entire sequence after the recovery.
 
-- 2.3 Even non-blocking read and write operations are not truely non-blocking, since they are still synchronous. The "non-blocking" guarantee only means that the operation does not block until new data has arrived, and that it is not frozen until the device is recovered. For the duration of the recovery procedure and of course for timeout periods these operations may still block.
+- 2.4 Even non-blocking read and write operations are not truely non-blocking, since they are still synchronous. The "non-blocking" guarantee only means that the operation does not block until new data has arrived, and that it is not frozen until the device is recovered. For the duration of the recovery procedure and of course for timeout periods these operations may still block.
 
 - 3.1.2 For some applications, the order of writes may be important, e.g. if firmware expects this. Please note that the VersionNumber is insufficient as a sorting criteria, since many writes may have been done with the same VersionNumber (in an ApplicationModule, the VersionNumber used for the writes is determined by the largest VersionNumber of the inputs).
 
@@ -124,10 +124,10 @@ A so-called ExceptionHandlingDecorator is placed around all device register acce
 - 1.3 In doPreWrite() the recoveryAccessor with the version number and ordering parameter is updated, and the written flag is cleared.
   - 1.3.1 If the written flag was previously not set, the return value of doWriteTransfer() must be forced to true (data lost).
 
-- 1.4 In doPreRead() certain read operations are frozen in case of a fault state (see A.2.1):
+- 1.4 In doPreRead() certain read operations are frozen in case of a fault state (see A.2.2):
   - 1.4.1 Obtain the recovery lock through DeviceModule::getRecoverySharedLock(), to prevent interference with an ongoing recovery procedure.
   - 1.4.2 Decide, whether freezing is done (don't freeze yet). Freezing is done if one of the following conditions is met:
-    - read type is blocking and AccessMode::wait_for_new_data is set, previousReadFailed == true, and DeviceModule::deviceHasError == true (cf. A.2.1.2), or
+    - read type is blocking and AccessMode::wait_for_new_data is set, previousReadFailed == true, and DeviceModule::deviceHasError == true (cf. A.2.2.2), or
     - no initial value has been read yet (getCurretVersion() == {nullptr}) and DeviceModule::deviceHasError == true (cf. A.4.2).
   - 1.4.3 Obtain the DeviceModule::errorLock. Only then release the recovery lock. (*)
   - 1.4.4 Wait on DeviceModule::errorIsReportedCondVar.
-- 
GitLab